Re: Who signed qemu-1.7.1.tar.bz2?
On 04/22/14 07:35, Michael Roth wrote:
> Quoting Stefan Hajnoczi (2014-04-22 08:31:08)
>> On Wed, Apr 02, 2014 at 05:40:23PM -0700, Alex Davis wrote:
>>> and where is their gpg key?
>>
>> Michael Roth is doing releases:
>>
>> http://pgp.mit.edu/pks/lookup?op=vindex&search=0x3353C9CEF108B584
>>
>> $ gpg --verify qemu-2.0.0.tar.bz2.sig
>> gpg: Signature made Thu 17 Apr 2014 03:49:55 PM CEST using RSA key ID F108B584
>> gpg: Good signature from "Michael Roth"
>> gpg:                 aka "Michael Roth"
>> gpg:                 aka "Michael Roth"
>
> Missed the context, but if this is specifically about 1.7.1:
>
> 1.7.1 was prior to me handling the release tarballs; Anthony actually
> did the signing and uploading for that one. I'm a bit confused though,
> as the key ID on that tarball is:
>
> mdroth@loki:~/Downloads$ gpg --verify qemu-1.7.1.tar.bz2.sig
> gpg: Signature made Tue 25 Mar 2014 09:03:24 AM CDT using RSA key ID ADF0D2D9
> gpg: Can't check signature: public key not found
>
> I can't seem to locate ADF0D2D9 though:
>
> http://pgp.mit.edu/pks/lookup?search=0xADF0D2D9&op=vindex
>
> Anthony's normal key (for 1.6.0 and 1.7.0 at least) was 7C18C076:
>
> http://pgp.mit.edu/pks/lookup?search=0x7C18C076&op=vindex
>
> I think maybe Anthony might've signed it with a separate local key?

Yeah, I accidentally signed it with the wrong key. Replacing the
signature doesn't seem like the right thing to do since release artifacts
should never change.

Regards,

Anthony Liguori

>> Stefan

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
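The verification workflow quoted above has two halves: checking the signature (which needs gpg and the signer's public key) and checking file integrity. The integrity half can be sketched without gpg by comparing a tarball's digest against a published checksum. A minimal Python illustration — the file name and bytes are stand-ins, and this complements rather than replaces `gpg --verify`:

```python
import hashlib

def sha256_of(path, chunk_size=1 << 16):
    """Stream a file through SHA-256 so large tarballs need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_hex):
    """Return True if the file's digest matches the published checksum."""
    return sha256_of(path) == expected_hex

if __name__ == "__main__":
    # Write a stand-in "tarball" and check it against its own digest.
    data = b"not a real tarball, just demo bytes\n"
    with open("qemu-release.tar.bz2", "wb") as f:
        f.write(data)
    good = hashlib.sha256(data).hexdigest()
    print(verify("qemu-release.tar.bz2", good))       # matching checksum
    print(verify("qemu-release.tar.bz2", "0" * 64))   # tampered/wrong checksum
```

A checksum only proves the download wasn't corrupted or swapped relative to the published digest; the signature additionally proves who published it, which is the point of the thread.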
Re: [Qemu-devel] KVM call agenda for 2014-04-01
On Mon, Mar 31, 2014 at 7:46 AM, Andreas Färber wrote:
> On 31.03.2014 16:32, Peter Maydell wrote:
>> On 31 March 2014 15:28, Paolo Bonzini wrote:
>>> I think it would be a good idea to separate the committer and release
>>> manager roles. Peter is providing the community with a wonderful
>>> service, just like you were; putting too much work on his shoulders
>>> risks getting us in the same situation if anything were to affect his
>>> ability to provide it.
>>
>> Yes, I strongly agree with this. I think we'll do much better if we
>> can manage to share out responsibilities among a wider group of people.
>
> May I propose Michael Roth, who is already experienced from the N-1
> stable releases?
>
> If we can enable him to upload the tarballs created from his tags, that
> would also streamline the stable workflow while at it.

If mdroth is willing to take this on, I am very supportive.

Regards,

Anthony Liguori

> Regards,
> Andreas
>
> --
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] KVM call agenda for 2014-04-01
On Mon, Mar 31, 2014 at 6:25 AM, Peter Maydell wrote:
> On 31 March 2014 14:21, Christian Borntraeger wrote:
>> Another thing might be the release process in general. Currently it
>> seems that everybody tries to push everything just before the hard
>> freeze. I had to debug some problems introduced _after_ soft freeze.
>> Is there some interest in having a Linux-like process (merge window +
>> stabilization)? This would require shorter release cycles of course.
>
> "merge window" has been suggested before. I think it would be a
> terrible idea for QEMU, personally. We're not the kernel in many ways,
> notably dev community size and a greater tendency to changes that have
> effects across the whole tree.
>
> Soft + hard freeze is our stabilization period currently.

Peter, are you willing to do the tagging and announcement for the 2.0
rcs? I sent instructions privately, and between stefanha and I we can get
your permissions sorted out.

Regards,

Anthony Liguori

> thanks
> -- PMM
Re: [Qemu-devel] KVM call agenda for 2013-12-10
On Tue, Dec 10, 2013 at 4:54 AM, Markus Armbruster wrote:
> Paolo Bonzini writes:
>
>> On 10/12/2013 12:42, Juan Quintela wrote:
>>>
>>> Hi
>>>
>>> Please, send any topic that you are interested in covering.
>>
>> May not need a phone call, but I'll drop it here: what happened to
>> acknowledgement emails from the patches script?
>>
>> Also, Anthony, it looks like you're still adjusting to the new job. If
>> you need help with anything, I guess today's call could be a good
>> place to discuss it.
>>
>> And someone needs to send out the email saying that 1.7.0 is out and
>> that the next version will be 2.0!
>
> Speaking of sending out e-mail: did I miss the promised followup to the
> key signing party?

I need to find the papers from KVM Forum which are somewhere among the
stacks of boxes here :-/

Regards,

Anthony Liguori
Re: KVM call agenda for 2013-12-10
On Tue, Dec 10, 2013 at 4:37 AM, Paolo Bonzini wrote:
> On 10/12/2013 12:42, Juan Quintela wrote:
>>
>> Hi
>>
>> Please, send any topic that you are interested in covering.
>
> May not need a phone call, but I'll drop it here:

Could we move the time of this phone call? 7am conflicts with my daily
commute. I could do 6am or 9am. I think it would be very useful to be
able to attend this call.

> what happened to acknowledgement emails from the patches script?

It's buggy and I haven't had a chance to rewrite it yet.

> Also, Anthony, it looks like you're still adjusting to the new job. If
> you need help with anything, I guess today's call could be a good place
> to discuss it.
>
> And someone needs to send out the email saying that 1.7.0 is out and
> that the next version will be 2.0!

Mail is out now, sorry for the delay. Pull requests should be getting
processed in a reasonable time. I am not yet spending enough time doing
patch review, but that should improve in the very near future. It's not
so much the new job as it is relocating and moving all at the same time.
I'm hoping the holiday break is a good way to catch up on things. Of
course, we should revisit again soon.

Regards,

Anthony Liguori
Re: Elvis upstreaming plan
Abel Gordon writes:
> "Michael S. Tsirkin" wrote on 27/11/2013 12:27:19 PM:
>
>> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
>>> Hi,
>>>
>>> Razya is out for a few days, so I will try to answer the questions as
>>> well as I can:
>>>
>>> "Michael S. Tsirkin" wrote on 26/11/2013 11:11:57 PM:
>>>
>>>> On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
>>>>>
>>>>> Anthony Liguori wrote on 26/11/2013 08:05:00 PM:
>>>>>
>>>>>> Razya Ladelsky writes:
>>>>>
>>>>> That's why we are proposing to implement a mechanism that will
>>>>> enable the management stack to configure 1 thread per I/O device
>>>>> (as it is today) or 1 thread for many I/O devices (belonging to the
>>>>> same VM).
>>>>>
>>>>>> Once you are scheduling multiple guests in a single vhost device,
>>>>>> you now create a whole new class of DoS attacks in the best case
>>>>>> scenario.
>>>>>
>>>>> Again, we are NOT proposing to schedule multiple guests in a single
>>>>> vhost thread. We are proposing to schedule multiple devices
>>>>> belonging to the same guest in a single (or multiple) vhost
>>>>> thread/s.
>>>>
>>>> I guess a question then becomes why have multiple devices?
>>>
>>> If you mean "why serve multiple devices from a single thread", the
>>> answer is that we cannot rely on the Linux scheduler, which has no
>>> knowledge of I/O queues, to do a decent job of scheduling I/O. The
>>> idea is to take over the I/O scheduling responsibilities from the
>>> kernel's thread scheduler with a more efficient I/O scheduler inside
>>> each vhost thread. By combining all of the I/O devices from the same
>>> guest (disks, network cards, etc) in a single I/O thread, it allows
>>> us to provide better scheduling by giving us more knowledge of the
>>> nature of the work. So now instead of relying on the Linux scheduler
>>> to perform context switches between multiple vhost threads, we have a
>>> single thread context in which we can do the I/O scheduling more
>>> efficiently. We can closely monitor the performance needs of each
>>> queue of each device inside the vhost thread, which gives us much
>>> more information than relying on the kernel's thread scheduler.
>>> This does not expose any additional opportunities for attacks (DoS or
>>> other) than are already available, since all of the I/O traffic
>>> belongs to a single guest.
>>> You can make the argument that with low I/O loads this mechanism may
>>> not make much difference. However, when you try to maximize the
>>> utilization of your hardware (such as in a commercial scenario) this
>>> technique can gain you a large benefit.
>>>
>>> Regards,
>>>
>>> Joel Nider
>>> Virtualization Research
>>> IBM Research and Development
>>> Haifa Research Lab
>>
>> So all this would sound more convincing if we had sharing between VMs.
>> When it's only a single VM it's somehow less convincing, isn't it?
>> Of course if we would bypass a scheduler like this it becomes harder
>> to enforce cgroup limits.
>
> True, but here the issue becomes isolation/cgroups. We can start to
> show the value for VMs that have multiple devices / queues and then we
> could re-consider extending the mechanism for multiple VMs (at least as
> an experimental feature).
>
>> But it might be easier to give scheduler the info it needs to do what
>> we need. Would an API that basically says "r
Re: Elvis upstreaming plan
Razya Ladelsky writes:
> Hi all,
>
> I am Razya Ladelsky, I work at the IBM Haifa virtualization team, which
> developed Elvis, presented by Abel Gordon at the last KVM Forum:
> ELVIS video: https://www.youtube.com/watch?v=9EyweibHfEs
> ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
>
> According to the discussions that took place at the forum, upstreaming
> some of the Elvis approaches seems to be a good idea, which we would
> like to pursue.
>
> Our plan for the first patches is the following:
>
> 1. Shared vhost thread between multiple devices
> This patch creates a worker thread and worker queue shared across
> multiple virtio devices.
> We would like to modify the patch posted in
> https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> to limit a vhost thread to serve multiple devices only if they belong
> to the same VM, as Paolo suggested, to avoid isolation or cgroups
> concerns.
>
> Another modification is related to the creation and removal of vhost
> threads, which will be discussed next.

I think this is an exceptionally bad idea. We shouldn't throw away
isolation without exhausting every other possibility.

We've seen very positive results from adding threads. We should also look
at scheduling.

Once you are scheduling multiple guests in a single vhost device, you now
create a whole new class of DoS attacks in the best case scenario.

> 2. Sysfs mechanism to add and remove vhost threads
> This patch allows us to add and remove vhost threads dynamically.
>
> A simpler way to control the creation of vhost threads is statically
> determining the maximum number of virtio devices per worker via a
> kernel module parameter (which is the way the previously mentioned
> patch is currently implemented).
>
> I'd like to ask for advice here about the more preferable way to go:
> Although having the sysfs mechanism provides more flexibility, it may
> be a good idea to start with a simple static parameter, and have the
> first patches as simple as possible. What do you think?
>
> 3. Add virtqueue polling mode to vhost
> Have the vhost thread poll the virtqueues with high I/O rate for new
> buffers, and avoid asking the guest to kick us.
> https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0

Ack on this.

Regards,

Anthony Liguori

> 4. vhost statistics
> This patch introduces a set of statistics to monitor different
> performance metrics of vhost and our polling and I/O scheduling
> mechanisms. The statistics are exposed using debugfs and can be easily
> displayed with a Python script (vhost_stat, based on the old kvm_stats).
> https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
>
> 5. Add heuristics to improve I/O scheduling
> This patch enhances the round-robin mechanism with a set of heuristics
> to decide when to leave a virtqueue and proceed to the next.
> https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
>
> This patch improves the handling of the requests by the vhost thread,
> but could perhaps be delayed to a later time, and not submitted as one
> of the first Elvis patches. I'd love to hear some comments about
> whether this patch needs to be part of the first submission.
>
> Any other feedback on this plan will be appreciated.
> Thank you,
> Razya
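The shared-thread design in points 1, 3, and 5 — one vhost thread round-robining over many queues, with a "when to leave a virtqueue" heuristic and per-queue statistics — can be sketched in miniature. All names and the simple per-queue budget heuristic below are invented for illustration; the real patches operate on kernel vhost worker threads and virtqueues:

```python
from collections import deque

class SharedWorker:
    """Toy model: one worker servicing several device queues round-robin."""

    def __init__(self, budget=2):
        self.queues = {}      # device name -> deque of pending requests
        self.stats = {}       # device name -> requests handled (cf. patch 4)
        self.budget = budget  # max requests per queue before moving on (patch 5)

    def add_device(self, name):
        self.queues[name] = deque()
        self.stats[name] = 0

    def submit(self, name, request):
        self.queues[name].append(request)

    def run_once(self):
        """One round-robin pass: serve up to `budget` requests per queue."""
        completed = []
        for name, q in self.queues.items():
            for _ in range(self.budget):
                if not q:
                    break  # leave-early heuristic: this queue is drained
                completed.append((name, q.popleft()))
                self.stats[name] += 1
        return completed

w = SharedWorker(budget=2)
for dev in ("disk0", "net0"):
    w.add_device(dev)
for i in range(3):
    w.submit("disk0", f"disk-req-{i}")
w.submit("net0", "net-req-0")

print(w.run_once())  # two disk requests, then the net request
print(w.run_once())  # the remaining disk request
```

The budget is the crude stand-in for the heuristics in patch 5: because the worker keeps per-queue state, it can make the leave/stay decision with knowledge the kernel's thread scheduler doesn't have, which is the argument made in the thread.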
Re: [PATCH for-1.7] target-i386: Fix build by providing stub kvm_arch_get_supported_cpuid()
On Tue, Nov 12, 2013 at 8:08 AM, Peter Maydell wrote:
> On 12 November 2013 15:58, Paolo Bonzini wrote:
>> I don't really see a reason why QEMU should give clang more weight
>> than Windows or Mac OS X.
>
> I'm not asking for more weight (and actually my main reason for caring
> about clang is exactly MacOSX). I'm just asking that when a bug is
> reported whose underlying cause is "we don't work on clang because
> we're relying on undocumented behaviour of gcc", with an attached patch
> that fixes this by not relying on the undocumented behaviour, that we
> apply the patch rather than saying "why do we care about clang"...

QEMU has always been intimately tied to GCC. Heck, it all started as a
giant GCC hack relying on entirely undocumented behavior (dyngen's
disassembly of functions).

There's nothing intrinsically bad about being tied to GCC. If you were
making the argument that we could do it a different way and the result
would be as nice or nicer, then it wouldn't be a discussion. But if
supporting clang means we have to remove useful things, then it's always
going to be an uphill battle.

In this case, the whole discussion is a bit silly. Have you actually
tried -O1 under a debugger with clang? Is it noticeably worse than -O0?

I find QEMU extremely difficult to use an interactive debugger on anyway.
I doubt the difference between -O0 and -O1 is even close to the breaking
point of usability under a debugger...

Regards,

Anthony Liguori

> This seems to me to be a win-win situation:
> * we improve our code by not relying on undocumented
>   implementation specifics
> * we work on a platform that, while not a primary
>   platform, is at least supported in the codebase and
>   has people who fix it when it breaks
>
> -- PMM
Re: [PATCH for-1.7] target-i386: Fix build by providing stub kvm_arch_get_supported_cpuid()
On Mon, Nov 11, 2013 at 3:11 PM, Paolo Bonzini wrote:
> On 11/11/2013 23:38, Peter Maydell wrote:
>> If we have other places where we're relying on dead code elimination
>> to not provide a function definition, please point them out, because
>> they're bugs we need to fix, ideally before they cause compilation
>> failures.
>
> I'm not sure, there are probably a few others. Linux also relies on the
> idiom (at least KVM does on x86).

And they are there because it's a useful tool.

>> Huh? The point of stub functions is to provide versions of functions
>> which either need to return an "always fails" code, or which will
>> never be called, but in either case this is so we can avoid peppering
>> the code with #ifdefs. The latter category is why we have stubs which
>> do nothing but call abort().
>
> There are very few stubs that call abort():
>
> int kvm_cpu_exec(CPUState *cpu)
> {
>     abort();
> }
>
> int kvm_set_signal_mask(CPUState *cpu, const sigset_t *sigset)
> {
>     abort();
> }
>
> Calling abort() would be marginally better than returning 0, but why
> defer checks to runtime when you can let the linker do them?

Exactly.

>>>> I wouldn't be surprised if this also affected debug gcc builds with
>>>> KVM disabled, but I haven't checked.
>>>
>>> No, it doesn't affect GCC. See Andreas's bug report. Is it a bug or a
>>> feature? Having some kind of -O0 dead-code elimination is definitely
>>> a feature (http://gcc.gnu.org/ml/gcc-patches/2003-03/msg02443.html).
>>
>> That patch says it is to "speed up these RTL optimizers and by
>> allocating less memory, reduce the compiler footprint and possible
>> memory fragmentation". So they might investigate it as a performance
>> regression, but it's only a "make compilation faster" feature, not
>> correctness. Code which relies on dead-code elimination is broken.
>
> There are plenty of tests in the GCC testsuite that rely on DCE to test
> that an optimization happened; some of them at -O0 too. So it's become
> a GCC feature in the end.
>
> Code which relies on dead-code elimination is not broken; it's relying
> on the full power of the toolchain to ensure bugs are detected as soon
> as possible, i.e. at build time.
>
>>> I am okay with Andreas's patch of course, but it would also be fine
>>> with me to split the "if" in two, each with its own separate break
>>> statement.
>>
>> I think Andreas's patch is a bad idea and am against it being applied.
>> It's very obviously a random tweak aimed at a specific compiler's
>> implementation of dead-code elimination, and it's the wrong way to fix
>> the problem.
>
> It's very obviously a random tweak aimed at a specific compiler's bug
> in dead-code elimination, I'm not denying that. But the same compiler
> feature is being exploited elsewhere.

We're not talking about something obscure here. It's eliminating an
if (0) block. There's no reason to leave an if (0) block around. The code
is never reachable.

>>> Since it only affects debug builds, there is no hurry to fix this in
>>> 1.7 if the approach cannot be agreed on.
>>
>> ?? Debug builds should absolutely work out of the box -- if a debug
>> build fails, that is IMHO a release-critical bug.
>
> Debug builds for qemu-system-{i386,x86_64} with clang on systems other
> than x86/Linux.

Honestly, it's hard to treat clang as a first-class target. We don't have
much infrastructure around it, so it's not getting that much testing. We
really need to figure out how we're going to do CI.

FWIW, I'd rather just add -O1 for debug builds than add more stub
functions.

Regards,

Anthony Liguori

> Paolo
Re: [PULL 0/6] VFIO updates for QEMU
Alex Williamson writes:
> The following changes since commit a684f3cf9b9b9c3cb82be87aafc463de8974610c:
>
>   Merge remote-tracking branch 'kraxel/seabios-1.7.3.2' into staging
>   (2013-09-30 17:15:27 -0500)
>
> are available in the git repository at:
>
>   git://github.com/awilliam/qemu-vfio.git tags/vfio-pci-for-qemu-20131003.0
>
> for you to fetch changes up to 1d5bf692e55ae22b59083741d521e27db704846d:
>
>   vfio: Fix debug output for int128 values (2013-10-03 09:10:09 -0600)

Judging from the review comments, I think this needs a v2.

Regards,

Anthony Liguori

> vfio-pci updates include:
>  - Forgotten MSI affinity patch posted several months ago
>  - Lazy option ROM loading to delay load until after device/bus resets
>  - Error reporting cleanups
>  - PCI hot reset support introduced with Linux v3.12 development kernels
>  - Debug build fix for int128
>
> The lazy ROM loading and hot reset should help VGA assignment as we can
> now do a bus reset when there are multiple devices on the bus, ex.
> multi-function graphics and audio cards. The known remaining part for
> VGA is the KVM-VFIO device and matching QEMU support to properly handle
> devices that make use of No-Snoop transactions, particularly on Intel
> host systems.
>
> Alex Williamson (5):
>       vfio-pci: Add support for MSI affinity
>       vfio-pci: Test device reset capabilities
>       vfio-pci: Lazy PCI option ROM loading
>       vfio-pci: Cleanup error_reports
>       vfio-pci: Implement PCI hot reset
>
> Alexey Kardashevskiy (1):
>       vfio: Fix debug output for int128 values
>
>  hw/misc/vfio.c | 621 +++--
>  1 file changed, 512 insertions(+), 109 deletions(-)
Re: Is fallback vhost_net to qemu for live migrate available?
Hi Qin,

On Mon, Aug 26, 2013 at 10:32 PM, Qin Chuanyu wrote:
> Hi all
>
> I am participating in a project which tries to port vhost_net onto Xen.

Neat!

> By changing the memory copy and notify mechanism, currently virtio-net
> with vhost_net can run on Xen with good performance.

I think the key in doing this would be to implement a proper ioeventfd
and irqfd interface in the driver domain kernel. Just hacking vhost_net
with Xen-specific knowledge would be pretty nasty IMHO.

Did you modify the front end driver to do grant table mapping, or is this
all being done by mapping the domain's memory?

> TCP receive throughput of a single vnic went from 2.77Gbps up to 6Gbps.
> On the VM receive side, I replaced grant_copy with grant_map + memcpy,
> which efficiently reduces the cost of the grant_table spin_lock in
> dom0, so whole-server TCP performance went from 5.33Gbps up to 9.5Gbps.
>
> Now I am considering live migration of vhost_net on Xen. vhost_net uses
> vhost_log for live migration on KVM, but QEMU on Xen doesn't manage the
> whole memory of the VM. So I am trying to fall back from vhost_net to
> the QEMU datapath when doing live migration, and fall back from QEMU to
> vhost_net again after the VM migrates to the new server.

KVM and Xen represent memory in a very different way. KVM can only track
when guest mode code dirties memory. It relies on QEMU to track when
guest memory is dirtied by QEMU. Since vhost is running outside of QEMU,
vhost also needs to tell QEMU when it has dirtied memory.

I don't think this is a problem with Xen though. I believe (although
could be wrong) that Xen is able to track when either the domain or dom0
dirties memory. So I think you can simply ignore the dirty logging with
vhost and it should Just Work.

> My questions are:
> Why doesn't vhost_net do the same fallback operation for live migration
> on KVM, instead of using vhost_log to mark the dirty pages?
> Is there any mechanism fault in the idea of falling back from vhost_net
> to QEMU for live migration?

No, we don't have a mechanism to fall back to QEMU for the datapath. It
would be possible, but I think it's a bad idea to mix and match the two.

Regards,

Anthony Liguori

> Any question about the details of vhost_net on Xen is welcome.
>
> Thanks
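The dirty-logging contract described here — every writer of guest memory (guest, QEMU, vhost) must record which pages it touched so migration can re-send them — can be modeled in a few lines. This is a toy model with invented names; the real vhost_log is a log buffer shared between the kernel and QEMU:

```python
PAGE_SIZE = 4096

class GuestMemory:
    """Toy guest RAM with write tracking, modeling dirty logging for migration."""

    def __init__(self, num_pages):
        self.pages = bytearray(num_pages * PAGE_SIZE)
        self.dirty = set()  # page numbers written since the last sync

    def write(self, addr, data):
        """Every write path must mark the pages it dirties (vhost_log's job)."""
        self.pages[addr:addr + len(data)] = data
        first = addr // PAGE_SIZE
        last = (addr + len(data) - 1) // PAGE_SIZE
        self.dirty.update(range(first, last + 1))

    def sync_dirty(self):
        """Migration pass: fetch and clear the dirty set, then re-send those pages."""
        dirty, self.dirty = sorted(self.dirty), set()
        return dirty

mem = GuestMemory(num_pages=8)
mem.write(100, b"guest write")           # lands in page 0
mem.write(PAGE_SIZE * 3 - 2, b"spans")   # crosses the page 2 / page 3 boundary
print(mem.sync_dirty())  # pages touched since the last pass
print(mem.sync_dirty())  # nothing new since then
```

The point Anthony makes then follows: if the hypervisor (as on Xen) can already observe all writers, no writer needs to self-report, and the vhost-side logging can simply be skipped.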
Re: Are there plans to achieve ram live Snapshot feature?
Chijianchun writes:
> Now in KVM, when taking a RAM snapshot, the vcpus need to be stopped,
> which is an unfriendly restriction for users.
>
> Are there plans to achieve a live RAM snapshot feature?

I think you mean a live version of the savevm command. You can
approximate it by live migrating to a file, creating an external disk
snapshot, then resuming the guest.

Regards,

Anthony Liguori

> In my mind, snapshots should not occupy too much additional memory, so
> when memory needs to be changed, the old memory page needs to be
> flushed to the file first. But flushing to a file is much slower than
> memory, and when flushing, the vcpu or VM needs to be paused until the
> flush finishes, so pause...resume...pause...resume, getting slower and
> slower.
>
> Is this idea feasible? Are there any other thoughts?
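The "flush the old page before it changes" idea in the question is copy-before-write snapshotting: the first time a page is modified after the snapshot point, its old contents are preserved, so the snapshot stays consistent while the VM keeps running. A toy sketch of that policy (invented names; not how QEMU's savevm actually works):

```python
class SnapshottedRAM:
    """Copy-before-write: save a page's old contents on first modification
    after a snapshot, so the snapshot view stays consistent while writes continue."""

    def __init__(self, num_pages, fill=b"\x00"):
        self.pages = [fill for _ in range(num_pages)]
        self.snapshot = None  # page number -> contents at snapshot time

    def take_snapshot(self):
        self.snapshot = {}  # filled lazily; no upfront copy of all RAM

    def write_page(self, n, data):
        if self.snapshot is not None and n not in self.snapshot:
            self.snapshot[n] = self.pages[n]  # preserve old contents first
        self.pages[n] = data

    def snapshot_page(self, n):
        """Read page n as it was when the snapshot was taken."""
        if self.snapshot is not None and n in self.snapshot:
            return self.snapshot[n]
        return self.pages[n]  # unmodified since the snapshot

ram = SnapshottedRAM(4)
ram.write_page(0, b"before")
ram.take_snapshot()
ram.write_page(0, b"after")
print(ram.snapshot_page(0))  # the pre-snapshot contents
print(ram.pages[0])          # the live contents
```

The performance concern raised in the question maps onto the `write_page` path: if preserving the old page means a synchronous flush to a slow file, every first-touch write stalls, which is exactly the pause/resume pattern the poster worries about; a practical design would buffer the saved pages in memory and write them out asynchronously.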
[ANNOUNCE] Key Signing Party at KVM Forum 2013
I will be hosting a key signing party at this year's KVM Forum.

http://wiki.qemu.org/KeySigningParty2013

Starting with the 1.7 release (which begins in December), I will only
accept signed pull requests, so please try to attend this event or make
alternative arrangements to have your key signed by someone who will
attend it.

I will also be attending LinuxCon/CloudOpen/Plumbers North America if
anyone wants to have another key signing party at that event and cannot
attend KVM Forum.

Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: VU#976534 - How to submit security bugs?
"CERT(R) Coordination Center" writes: > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > Greetings, > My name is Adam Rauf and I work for the CERT Coordination Center. We > have a report that may affect KVM/QEMU. How can we securely send it over to > you? Thanks so much! For QEMU bugs, please file a bug in Launchpad and mark it as a security bug. That will appropriately limit visibility. http://launchpad.net/qemu If you want to contact me directly, my public key is: http://www.codemonkey.ws/files/aliguori.pub You can verify that this key is what is used to sign QEMU releases at: http://wiki.qemu.org/Download Regards, Anthony Liguori > > Adam Rauf > Software Engineering Institute > CERT Vulnerability Analysis Team > > -BEGIN PGP SIGNATURE- > Version: GnuPG v1.4.5 (GNU/Linux) > > iQEVAwUBUe2DstXCAanP4MNyAQI8nwf/eTb1Qox5lmgMHifDKRjj69E37FW+o5Jp > KMIP6+IgKdWQizPctXk2Gae50a+ioaXgkCGZZ7SwNJ9iE/AX2I32QvX6pZrDCBGw > l5Ht6UiwOLUTP3sKWO9AIYcgTDABzyNE2+bCGvDz8aqwLB8NNVqQ50f46TrQNlmB > oiG+XzskRG0BAxKTwWc8f4v+1hdqMtp811I7XmxXkAdtlmTWPHZfPiFs0dS++Puh > T0uLuC4nDo83hP6Yv8seMZKZZApFGfR+q4qKx7f6riNsa5v1zGgW2if++u+zRKvg > DvjLxjtRfE9JGmCZMBcFmRJ5y4Wx/m/2wtj2+a7D/D2Hd9L5LRB0lA== > =npI1 > -END PGP SIGNATURE- > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM Forum 2013 Call for Participation - Extended to August 4th
We have received numerous requests to extend the CFP deadline and so we are happy to announce that the CFP deadline has been moved by two weeks to August 4th. = KVM Forum 2013: Call For Participation October 21-23, 2013 - Edinburgh International Conference Centre - Edinburgh, UK (All submissions must be received before midnight July 21, 2013) = KVM is an industry leading open source hypervisor that provides an ideal platform for datacenter virtualization, virtual desktop infrastructure, and cloud computing. Once again, it's time to bring together the community of developers and users that define the KVM ecosystem for our annual technical conference. We will discuss the current state of affairs and plan for the future of KVM, its surrounding infrastructure, and management tools. The oVirt Workshop will run in parallel with the KVM Forum again, bringing in a community focused on enterprise datacenter virtualization management built on KVM. For topics which overlap we will have shared sessions. So mark your calendar and join us in advancing KVM. http://events.linuxfoundation.org/events/kvm-forum/ Once again we are colocated with The Linux Foundation's LinuxCon Europe. KVM Forum attendees will be able to attend oVirt Workshop sessions and are eligible to attend LinuxCon Europe for a discounted rate. http://events.linuxfoundation.org/events/kvm-forum/register We invite you to lead part of the discussion by submitting a speaking proposal for KVM Forum 2013. 
http://events.linuxfoundation.org/cfp Suggested topics: KVM/Kernel - Scaling and performance - Nested virtualization - I/O improvements - VFIO, device assignment, SR-IOV - Driver domains - Time keeping - Resource management (cpu, memory, i/o) - Memory management (page sharing, swapping, huge pages, etc) - Network virtualization - Security - Architecture ports QEMU - Device model improvements - New devices and chipsets - Scaling and performance - Desktop virtualization - Spice - Increasing robustness and hardening - Security model - Management interfaces - QMP protocol and implementation - Image formats - Firmware (SeaBIOS, OVMF, UEFI, etc) - Live migration - Live snapshots and merging - Fault tolerance, high availability, continuous backup - Real-time guest support Virtio - Speeding up existing devices - Alternatives - Virtio on non-Linux or non-virtualized Management infrastructure - oVirt (shared track w/ oVirt Workshop) - Libvirt - KVM autotest - OpenStack - Network virtualization management - Enterprise storage management Cloud computing - Scalable storage - Virtual networking - Security - Provisioning SUBMISSION REQUIREMENTS Abstracts due: July 21, 2013 Notification: August 1, 2013 Please submit a short abstract (~150 words) describing your presentation proposal. In your submission please note how long your talk will take. Slots vary in length up to 45 minutes. Also include in your proposal the proposal type -- one of: - technical talk - end-user talk - birds of a feather (BOF) session Submit your proposal here: http://events.linuxfoundation.org/cfp You will receive a notification whether or not your presentation proposal was accepted by Aug 1st. END-USER COLLABORATION One of the big challenges as developers is to know what, where and how people actually use our software. We will reserve a few slots for end users talking about their deployment challenges and achievements. If you are using KVM in production you are encouraged submit a speaking proposal. 
Simply mark it as an end-user collaboration proposal. As an end user, this is a unique opportunity to get your input to developers.

BOF SESSION

We will reserve some slots in the evening after the main conference tracks for birds of a feather (BOF) sessions. These sessions will be less formal than the presentation tracks and targeted at people who would like to discuss specific issues with other developers and/or users. If you are interested in getting developers and/or users together to discuss a specific problem, please submit a BOF proposal.

HOTEL / TRAVEL

The KVM Forum 2013 will be held in Edinburgh, UK at the Edinburgh International Conference Centre.
http://events.linuxfoundation.org/events/kvm-forum/hotel

Thank you for your interest in KVM. We're looking forward to your submissions and seeing you at the KVM Forum 2013 in October!

Thanks,
-your KVM Forum 2013 Program Committee

Please contact us with any questions or comments.
kvm-forum-2013...@redhat.com

-- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call agenda for 2013-06-11
"Michael S. Tsirkin" writes: > On Tue, Jun 04, 2013 at 04:24:31PM +0300, Michael S. Tsirkin wrote: >> Juan is not available now, and Anthony asked for >> agenda to be sent early. >> So here comes: >> >> Agenda for the meeting Tue, June 11: >> >> - Generating acpi tables, redux > > Not so much notes as a quick summary of the call: > > There are the following reasons to generate ACPI tables in QEMU: > > - sharing code with e.g. ovmf > Anthony thinks this is not a valid argument > > - so we can make tables more dynamic and move away from iasl > Anthony thinks this is not a valid reason too, > since qemu and seabios have access to same info > MST noted several info not accessible to bios. > Anthony said they can be added, e.g. by exposing > QOM to the bios. > > - even though most tables are static, hardcoded > they are likely to change over time > Anthony sees this as justified > > To summarize, there's a concensus now that generating ACPI > tables in QEMU is a good idea. I would say best worst idea ;-) I am deeply concerned about the complexity it introduces but I don't see many other options. > > Two issues that need to be addressed: > - original patches break cross-version migration. Need to fix that. > > - Anthony requested that patchset is merged together with > some new feature. I'm not sure the reasoning is clear: > current a version intentionally generates tables > that are bug for bug compatible with seabios, > to simplify testing. I expect that there will be additional issues that need to be worked out and want to see a feature that actually uses the infrastructure before we add it. > It seems clear we have users for this such as > hotplug of devices behind pci bridges, so > why keep the infrastructure out of tree? It's hard to evaluate the infrastructure without a user. > Looking for something additional, smaller as the hotplug patch > is a bit big, so might delay merging. > > > Going forward - would we want to move > smbios as well? 
Everyone seems to think it's a > good idea. Yes, independent of ACPI, I think QEMU should be generating the SMBIOS tables. Regards, Anthony Liguori > -- > MST
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Gleb Natapov writes: > On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote: >> "H. Peter Anvin" writes: >> >> > On 06/05/2013 03:08 PM, Anthony Liguori wrote: >> >>> >> >>> Definitely an option. However, we want to be able to boot from native >> >>> devices, too, so having an I/O BAR (which would not be used by the OS >> >>> driver) should still at the very least be an option. >> >> >> >> What makes it so difficult to work with an MMIO bar for PCI-e? >> >> >> >> With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight >> >> forward. Is there something special about PCI-e here? >> >> >> > >> > It's not tracking allocation. It is that accessing memory above 1 MiB >> > is incredibly painful in the BIOS environment, which basically means >> > MMIO is inaccessible. >> >> Oh, you mean in real mode. >> >> SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. >> There are loads of ASSERT32FLAT()s in the code to make sure of this. >> > Well, not exactly. Initialization is done in 32bit, but disk > reads/writes are done in 16bit mode since it should work from int13 > interrupt handler. The only way I know to access MMIO bars from 16 bit > is to use SMM which we do not have in KVM. Ah, if it's just the dataplane operations then there's another solution. We can introduce a virtqueue flag that asks the backend to poll for new requests. Then SeaBIOS can add the request to the queue and not worry about kicking or reading the ISR. SeaBIOS is polling for completion anyway. Regards, Anthony Liguori > > -- > Gleb.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Hi Rusty, Rusty Russell writes: > Anthony Liguori writes: >> 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give >> it a new device/vendor ID. Continue to use virtio-pci for existing >> devices potentially adding virtio-{net,blk,...}-pcie variants for >> people that care to use them. > > Now you have a different compatibility problem; how do you know the > guest supports the new virtio-pcie net? We don't care. We would still use virtio-pci for existing devices. Only new devices would use virtio-pcie. > If you put a virtio-pci card behind a PCI-e bridge today, it's not > compliant, but AFAICT it will Just Work. (Modulo the 16-dev limit). I believe you can put it in legacy mode and then there isn't the 16-dev limit. I believe the only advantage of putting it in native mode is that then you can do native hotplug (as opposed to ACPI hotplug). So sticking with virtio-pci seems reasonable to me. > I've been assuming we'd avoid a "flag day" change; that devices would > look like existing virtio-pci with capabilities indicating the new > config layout. I don't think that's feasible. Maybe 5 or 10 years from now, we switch the default adapter to virtio-pcie. >> I think 4 is the best path forward. It's better for users (guests >> continue to work as they always have). There's less confusion about >> enabling PCI-e support--you must ask for the virtio-pcie variant and you >> must have a virtio-pcie driver. It's easy to explain. > > Removing both forward and backward compatibility is easy to explain, but > I think it'll be harder to deploy. This is your area though, so perhaps > I'm wrong. My concern is that it's not real backwards compatibility. >> It also maps to what regular hardware does. I highly doubt that there >> are any real PCI cards that made the shift from PCI to PCI-e without >> bumping at least a revision ID. > > No one expected the new cards to Just Work with old OSes: a new machine > meant a new OS and new drivers.
Hardware vendors like that. Yup. > Since virtualization often involves legacy, our priorities might be > different. So realistically, I think if we introduce virtio-pcie with a different vendor ID, it will be adopted fairly quickly. The drivers will show up in distros quickly and get backported. New devices can be limited to supporting virtio-pcie and we'll certainly provide a way to use old devices with virtio-pcie too. But for practical reasons, I think we have to continue using virtio-pci by default. Regards, Anthony Liguori > > Cheers, > Rusty.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"H. Peter Anvin" writes: > On 06/05/2013 03:08 PM, Anthony Liguori wrote: >>> >>> Definitely an option. However, we want to be able to boot from native >>> devices, too, so having an I/O BAR (which would not be used by the OS >>> driver) should still at the very least be an option. >> >> What makes it so difficult to work with an MMIO bar for PCI-e? >> >> With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight >> forward. Is there something special about PCI-e here? >> > > It's not tracking allocation. It is that accessing memory above 1 MiB > is incredibly painful in the BIOS environment, which basically means > MMIO is inaccessible. Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Regards, Anthony Liguori > > -hpa > > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Benjamin Herrenschmidt writes: > On Wed, 2013-06-05 at 16:53 -0500, Anthony Liguori wrote: > >> A smart BIOS can also use MMIO to program virtio. > > Indeed :-) > > I see no reason why not providing both access paths though. Have the PIO > BAR there for compatibility/legacy/BIOS/x86 purposes and *also* have the > MMIO window which I'd be happy to favor on power. > > We could even put somewhere in there a feature bit set by qemu to > indicate whether it thinks PIO or MMIO is faster on a given platform if > you really think that's worth it (I don't). That's okay, but what I'm most concerned about is compatibility. A virtio PCI device that's a "native endpoint" needs to have a different device ID than one that is a "legacy endpoint". The current drivers have no hope of working (well) with virtio PCI devices exposed as native endpoints. I don't care if the native PCI endpoint also has a PIO bar. But it seems silly (and confusing) to me to make that layout be the "legacy" layout versus a straight mirror of the new layout if we're already changing the device ID. In addition, it doesn't seem at all necessary to have an MMIO bar to the legacy device. If the reason you want MMIO is to avoid using PIO, then you break existing drivers because they assume PIO. If you are breaking existing drivers then you should change the device ID. If strictly speaking it's just that MMIO is a bit faster, I'm not sure that complexity is worth it without seeing performance numbers first. Regards, Anthony Liguori > > Cheers, > Ben.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"H. Peter Anvin" writes: > On 06/05/2013 02:50 PM, Anthony Liguori wrote: >> "H. Peter Anvin" writes: >> >>> On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote: >>>> >>>> Spec says IO and memory can be enabled/disabled, separately. >>>> PCI Express spec says devices should work without IO. >>>> >>> >>> For "native endpoints". Currently virtio would be a "legacy endpoint" >>> which is quite correct -- it is compatible with a legacy interface. >> >> Do legacy endpoints also use 4k for BARs? > > There are no 4K BARs. In fact, I/O BARs are restricted by spec (there > is no technical enforcement, however) to 256 bytes. > > The 4K come from the upstream bridge windows, which are only 4K granular > (historic stuff from when bridges were assumed rare.) However, there > can be multiple devices, functions, and BARs inside that window. Got it. > > The issue with PCIe is that each PCIe port is a bridge, so in reality > there is only one real device per bus number. > >> If not, can't we use a new device id for native endpoints and call it a >> day? Legacy endpoints would continue using the existing BAR layout. > > Definitely an option. However, we want to be able to boot from native > devices, too, so having an I/O BAR (which would not be used by the OS > driver) should still at the very least be an option. What makes it so difficult to work with an MMIO bar for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight forward. Is there something special about PCI-e here? Regards, Anthony Liguori > > -hpa > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Wed, Jun 05, 2013 at 03:42:57PM -0500, Anthony Liguori wrote: >> "Michael S. Tsirkin" writes: >> >> Can you explain? I thought the whole trick with separating out the >> virtqueue notification register was to regain the performance? > > Yes but this trick only works well with NPT (it's still a bit > slower than PIO but not so drastically). > Without NPT you still need a page walk so it will be slow. Do you mean NPT/EPT? If your concern is shadow paging, then I think you're concerned about hardware that is so slow to start with that it's not worth considering. >> >> It also maps to what regular hardware does. I highly doubt that there >> >> are any real PCI cards that made the shift from PCI to PCI-e without >> >> bumping at least a revision ID. >> > >> > Only because the chance it's 100% compatible on the software level is 0. >> > It always has some hardware specific quirks. >> > No such excuse here. >> > >> >> It also means we don't need to play games about sometimes enabling IO >> >> bars and sometimes not. >> > >> > This last paragraph is wrong, it ignores the issues 3) to 5) >> > I added above. >> > >> > If you do take them into account: >> >- there are reasons to add MMIO BAR to PCI, >> > even without PCI express >> >> So far, the only reason you've provided is "it doesn't work on some >> architectures." Which architectures? > > PowerPC wants this. Existing PowerPC remaps PIO to MMAP so it works fine today. Future platforms may not do this but future platforms can use a different device. They certainly won't be able to use the existing drivers anyway. Ben, am I wrong here? >> >- we won't be able to drop IO BAR from virtio >> >> An IO BAR is useless if it means we can't have more than 12 devices. > > > It's not useless. A smart BIOS can enable devices one by one as > it tries to boot from them. A smart BIOS can also use MMIO to program virtio. 
Regards, Anthony Liguori
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"H. Peter Anvin" writes: > On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote: >> >> Spec says IO and memory can be enabled/disabled, separately. >> PCI Express spec says devices should work without IO. >> > > For "native endpoints". Currently virtio would be a "legacy endpoint" > which is quite correct -- it is compatible with a legacy interface. Do legacy endpoints also use 4k for BARs? If not, can't we use a new device id for native endpoints and call it a day? Legacy endpoints would continue using the existing BAR layout. Regards, Anthony Liguori > > -hpa > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Wed, Jun 05, 2013 at 10:43:17PM +0300, Michael S. Tsirkin wrote: >> On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote: >> > "Michael S. Tsirkin" writes: >> > >> > > On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote: >> > >> Look, it's very simple. >> > > We only need to do it if we do a change that breaks guests. >> > > >> > > Please find a guest that is broken by the patches. You won't find any. >> > >> > I think the problem in this whole discussion is that we're talking past >> > each other. >> > >> > Here is my understanding: >> > >> > 1) PCI-e says that you must be able to disable IO bars and still have a >> > functioning device. >> > >> > 2) It says (1) because you must size IO bars to 4096 which means that >> > practically speaking, once you enable a dozen or so PIO bars, you run >> > out of PIO space (16 * 4k == 64k and not all that space can be used). >> >> >> Let me add 3 other issues which I mentioned and you seem to miss: >> >> 3) architectures which don't have fast access to IO ports, exist >>virtio does not work there ATM >> >> 4) setups with many PCI bridges exist and have the same issue >>as PCI express. virtio does not work there ATM >> >> 5) On x86, even with nested page tables, firmware only decodes >>the page address on an invalid PTE, not the data. You need to >>emulate the guest to get at the data. Without >>nested page tables, we have to do page table walk and emulate >>to get both address and data. Since this is how MMIO >>is implemented in kvm on x86, MMIO is much slower than PIO >>(with nested page tables by a factor of >2, did not test without). > > Oh I forgot: > > 6) access to MMIO BARs is painful in the BIOS environment >so BIOS would typically need to enable IO for the boot device. But if you want to boot from the 16th device, the BIOS needs to solve this problem anyway. 
Regards, Anthony Liguori
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote: >> "Michael S. Tsirkin" writes: >> >> > On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote: >> >> Look, it's very simple. >> > We only need to do it if we do a change that breaks guests. >> > >> > Please find a guest that is broken by the patches. You won't find any. >> >> I think the problem in this whole discussion is that we're talking past >> each other. >> >> Here is my understanding: >> >> 1) PCI-e says that you must be able to disable IO bars and still have a >> functioning device. >> >> 2) It says (1) because you must size IO bars to 4096 which means that >> practically speaking, once you enable a dozen or so PIO bars, you run >> out of PIO space (16 * 4k == 64k and not all that space can be used). > > > Let me add 3 other issues which I mentioned and you seem to miss: > > 3) architectures which don't have fast access to IO ports, exist >virtio does not work there ATM Which architectures have PCI but no IO ports? > 4) setups with many PCI bridges exist and have the same issue >as PCI express. virtio does not work there ATM This is not virtio specific. This is true for all devices that use IO. > 5) On x86, even with nested page tables, firmware only decodes >the page address on an invalid PTE, not the data. You need to >emulate the guest to get at the data. Without >nested page tables, we have to do page table walk and emulate >to get both address and data. Since this is how MMIO >is implemented in kvm on x86, MMIO is much slower than PIO >(with nested page tables by a factor of >2, did not test without). Am well aware of this, this is why we use PIO. I fully agree with you that when we do MMIO, we should switch the notification mechanism to avoid encoding anything meaningful as data. >> virtio-pci uses a IO bars exclusively today. Existing guest drivers >> assume that there is an IO bar that contains the virtio-pci registers. 
>> So let's consider the following scenarios: >> >> QEMU of today: >> >> 1) qemu -drive file=ubuntu-13.04.img,if=virtio >> >> This works today. Does adding an MMIO bar at BAR1 break this? >> Certainly not if the device is behind a PCI bus... >> >> But are we going to put devices behind a PCI-e bus by default? Are we >> going to ask the user to choose whether devices are put behind a legacy >> bus or the express bus? >> >> What happens if we put the device behind a PCI-e bus by default? Well, >> it can still work. That is, until we do something like this: >> >> 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng >> -device virtio-balloon.. >> >> Such that we have more than a dozen or so devices. This works >> perfectly fine today. It works fine because we've designed virtio to >> make sure it works fine. Quoting the spec: >> >> "Configuration space is generally used for rarely-changing or >> initialization-time parameters. But it is a limited resource, so it >> might be better to use a virtqueue to update configuration information >> (the network device does this for filtering, otherwise the table in the >> config space could potentially be very large)." >> >> In fact, we can have 100s of PCI devices today without running out of IO >> space because we're so careful about this. >> >> So if we switch to using PCI-e by default *and* we keep virtio-pci >> without modifying the device IDs, then very frequently we are going to >> break existing guests because the drivers they already have no longer >> work. >> >> A few virtio-serial channels, a few block devices, a couple of network >> adapters, the balloon and RNG driver, and we hit the IO space limit >> pretty damn quickly so this is not a contrived scenario at all. I would >> expect that we frequently run into this if we don't address this problem. >> >> So we have a few options: >> 1) Punt all of this complexity to libvirt et al and watch people make >>the wrong decisions about when to use PCI-e. 
This will become yet >>another example of KVM being too hard to configure. >> >> 2) Enable PCI-e by default and just force people to upgrade their >>drivers. >> >> 3) Don't use PCI-e by default but still add BAR1 to virtio-pci >> >> 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), > > We can't do this - it wil
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote: >> Look, it's very simple. > We only need to do it if we do a change that breaks guests. > > Please find a guest that is broken by the patches. You won't find any. I think the problem in this whole discussion is that we're talking past each other. Here is my understanding: 1) PCI-e says that you must be able to disable IO bars and still have a functioning device. 2) It says (1) because you must size IO bars to 4096 which means that practically speaking, once you enable a dozen or so PIO bars, you run out of PIO space (16 * 4k == 64k and not all that space can be used). virtio-pci uses a IO bars exclusively today. Existing guest drivers assume that there is an IO bar that contains the virtio-pci registers. So let's consider the following scenarios: QEMU of today: 1) qemu -drive file=ubuntu-13.04.img,if=virtio This works today. Does adding an MMIO bar at BAR1 break this? Certainly not if the device is behind a PCI bus... But are we going to put devices behind a PCI-e bus by default? Are we going to ask the user to choose whether devices are put behind a legacy bus or the express bus? What happens if we put the device behind a PCI-e bus by default? Well, it can still work. That is, until we do something like this: 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng -device virtio-balloon.. Such that we have more than a dozen or so devices. This works perfectly fine today. It works fine because we've designed virtio to make sure it works fine. Quoting the spec: "Configuration space is generally used for rarely-changing or initialization-time parameters. But it is a limited resource, so it might be better to use a virtqueue to update configuration information (the network device does this for filtering, otherwise the table in the config space could potentially be very large)." 
In fact, we can have 100s of PCI devices today without running out of IO space because we're so careful about this. So if we switch to using PCI-e by default *and* we keep virtio-pci without modifying the device IDs, then very frequently we are going to break existing guests because the drivers they already have no longer work. A few virtio-serial channels, a few block devices, a couple of network adapters, the balloon and RNG driver, and we hit the IO space limit pretty damn quickly so this is not a contrived scenario at all. I would expect that we frequently run into this if we don't address this problem. So we have a few options: 1) Punt all of this complexity to libvirt et al and watch people make the wrong decisions about when to use PCI-e. This will become yet another example of KVM being too hard to configure. 2) Enable PCI-e by default and just force people to upgrade their drivers. 3) Don't use PCI-e by default but still add BAR1 to virtio-pci 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give it a new device/vendor ID. Continue to use virtio-pci for existing devices potentially adding virtio-{net,blk,...}-pcie variants for people that care to use them. I think 1 == 2 == 3 and I view 2 as an ABI breaker. libvirt does like policy so they're going to make a simple decision and always use the same bus by default. I suspect if we made PCI the default, they might just always set the PCI-e flag just because. There are hundreds of thousands if not millions of guests with existing virtio-pci drivers. Forcing them to upgrade better have an extremely good justification. I think 4 is the best path forward. It's better for users (guests continue to work as they always have). There's less confusion about enabling PCI-e support--you must ask for the virtio-pcie variant and you must have a virtio-pcie driver. It's easy to explain. It also maps to what regular hardware does. 
I highly doubt that are any real PCI cards that made the shift from PCI to PCI-e without bumping at least a revision ID. It also means we don't need to play games about sometimes enabling IO bars and sometimes not. Regards, Anthony Liguori > > > -- > MST
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Wed, Jun 05, 2013 at 10:08:37AM -0500, Anthony Liguori wrote: >> "Michael S. Tsirkin" writes: >> >> > On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote: >> >> "Michael S. Tsirkin" writes: >> >> >> >> > On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote: >> >> > You mean make BAR0 an MMIO BAR? >> >> > Yes, it would break current windows guests. >> >> > Further, as long as we use same address to notify all queues, >> >> > we would also need to decode the instruction on x86 and that's >> >> > measureably slower than PIO. >> >> > We could go back to discussing hypercall use for notifications, >> >> > but that has its own set of issues... >> >> >> >> So... does "violating the PCI-e" spec really matter? Is it preventing >> >> any guest from working properly? >> > >> > Yes, absolutely, this wording in spec is not there without reason. >> > >> > Existing guests allocate io space for PCI express ports in >> > multiples on 4K. >> > >> > Since each express device is behind such a port, this means >> > at most 15 such devices can use IO ports in a system. >> > >> > That's why to make a pci express virtio device, >> > we must allow MMIO and/or some other communication >> > mechanism as the spec requires. >> >> This is precisely why this is an ABI breaker. >> >> If you disable IO bars in the BIOS, than the interface that the OS sees >> will *not have an IO bar*. >> >> This *breaks existing guests*. >> Any time the programming interfaces changes on a PCI device, the >> revision ID and/or device ID must change. The spec is very clear about >> this. >> >> We cannot disable the IO BAR without changing revision ID/device ID. >> > > But it's a bios/PC issue. It's not a device issue. > > Anyway, let's put express aside. > > It's easy to create non-working setups with pci, today: > > - create 16 pci bridges > - put one virtio device behind each > > boom > > Try it. > > I want to fix that. > > >> > That's on x86. 
>> > >> > Besides x86, there are architectures where IO is unavailable or very slow. >> >> I don't think we should rush an ABI breakage if the only benefit is >> >> claiming spec compliance. >> >> >> >> Regards, >> >> >> >> Anthony Liguori >> > >> > Why do you bring this up? No one advocates any ABI breakage, >> > I only suggest extensions. >> >> It's an ABI breakage. You're claiming that the guests you tested >> handle the breakage reasonably but it is unquestionably an ABI breakage. >> >> Regards, >> >> Anthony Liguori > > Adding BAR is not an ABI breakage, do we agree on that? > > Disabling IO would be but I am not proposing disabling IO. > > Guests might disable IO. Look, it's very simple. If the failure in the guest is that BAR0 mapping fails because the device is enabled but the BAR is disabled, then you've broken the ABI. And what's worse is that this isn't for an obscure scenario (like having 15 PCI bridges) but for something that would become the standard scenario (using a PCI-e bus). We need to either bump the revision ID or the device ID if we do this. Regards, Anthony Liguori
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote: >> "Michael S. Tsirkin" writes: >> >> > On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote: >> > You mean make BAR0 an MMIO BAR? >> > Yes, it would break current windows guests. >> > Further, as long as we use same address to notify all queues, >> > we would also need to decode the instruction on x86 and that's >> > measureably slower than PIO. >> > We could go back to discussing hypercall use for notifications, >> > but that has its own set of issues... >> >> So... does "violating the PCI-e" spec really matter? Is it preventing >> any guest from working properly? > > Yes, absolutely, this wording in spec is not there without reason. > > Existing guests allocate io space for PCI express ports in > multiples on 4K. > > Since each express device is behind such a port, this means > at most 15 such devices can use IO ports in a system. > > That's why to make a pci express virtio device, > we must allow MMIO and/or some other communication > mechanism as the spec requires. This is precisely why this is an ABI breaker. If you disable IO bars in the BIOS, than the interface that the OS sees will *not have an IO bar*. This *breaks existing guests*. Any time the programming interfaces changes on a PCI device, the revision ID and/or device ID must change. The spec is very clear about this. We cannot disable the IO BAR without changing revision ID/device ID. > That's on x86. > > Besides x86, there are achitectures where IO is unavailable or very slow. > >> I don't think we should rush an ABI breakage if the only benefit is >> claiming spec compliance. >> >> Regards, >> >> Anthony Liguori > > Why do you bring this up? No one advocates any ABI breakage, > I only suggest extensions. It's an ABI breakage. You're claiming that the guests you tested handle the breakage reasonably but it is unquestionably an ABI breakage. 
Regards, Anthony Liguori > > >> > >> > -- >> > MST >> > -- >> > To unsubscribe from this list: send the line "unsubscribe kvm" in >> > the body of a message to majord...@vger.kernel.org >> > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote: > You mean make BAR0 an MMIO BAR? > Yes, it would break current windows guests. > Further, as long as we use same address to notify all queues, > we would also need to decode the instruction on x86 and that's > measureably slower than PIO. > We could go back to discussing hypercall use for notifications, > but that has its own set of issues... So... does "violating the PCI-e" spec really matter? Is it preventing any guest from working properly? I don't think we should rush an ABI breakage if the only benefit is claiming spec compliance. Regards, Anthony Liguori > > -- > MST > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call agenda for 2013-05-28
Jordan Justen writes: > On Fri, May 31, 2013 at 11:35 AM, Anthony Liguori > wrote: >> As I think more about it, I think forking edk2 is inevitable. We need a >> clean repo that doesn't include the proprietary binaries. I doubt >> upstream edk2 is willing to remove the binaries. > > No, probably not unless a BSD licensed alternative was available. :) > > But, in thinking about what might make sense for EDK II with git, one > option that should be considered is breaking the top-level 'packages' > into separate sub-modules. I had gone so far as to start pushing repos > as sub-modules. > > But, as the effort to convert EDK II to git has stalled (actually > never even thought about leaving the ground), I abandoned that > approach and went back to just mirroring one EDK II. > > I could fairly easily re-enable mirroring the sub-set of packages needed > for OVMF. So, in that case, the FatBinPkg sub-module could easily be > dropped from a tree. > >> But this can be quite simple using a combination of git-svn and a >> rewriting script. We did exactly this to pull out the VGABios from >> Bochs and remove the binaries associated with it. It's 100% automated >> and can be kept in sync via a script on qemu.org. > > I would love to mirror the BaseTools as a sub-package without all the > silly windows binaries... What script did you guys use? We did this in git pre-history, now git has a fancy git-filter-branch command that makes it a breeze: http://git-scm.com/book/ch6-4.html Regards, Anthony Liguori > > -Jordan
Re: KVM call agenda for 2013-05-28
Jordan Justen writes: > On Fri, May 31, 2013 at 7:38 AM, Anthony Liguori > wrote: >> In terms of creating a FAT module, the most likely source would seem to >> be the kernel code and since that's GPL, I don't think it's terribly >> avoidable to end up with a GPL'd uefi implementation. > > Why would OpenBSD not be a potential source? > > http://www.openbsd.org/cgi-bin/cvsweb/src/sys/msdosfs/ If someone is going to do it, that's fine. But if it's me, it's going to be a GPL base. Actually, enabling GPL contributions to OVMF is a major motivating factor for me in this whole discussion. Regards, Anthony Liguori > > We have a half-done ext2 fs from GSoC2011 that started with OpenBSD. > > https://github.com/the-ridikulus-rat/Tianocore_Ext2Pkg > >> If that's inevitable, then we're wasting effort by rewriting stuff under >> a BSD license. >> >> Regards, >> >> Anthony Liguori
Re: KVM call agenda for 2013-05-28
Paolo Bonzini writes: > Il 31/05/2013 19:06, Anthony Liguori ha scritto: >> David Woodhouse writes: >> >>> On Fri, 2013-05-31 at 10:43 -0500, Anthony Liguori wrote: >>>> It's even more fundamental. OVMF as a whole (at least in its usable >>>> form) is not Open Source. >>> >>> The FAT module is required to make EDK2 usable, and yes, that's not Open >>> Source. So in a sense you're right. >>> >>> But we're talking here about *replacing* the FAT module with something >>> that *is* open source. And the FAT module isn't a fundamental part of >>> EDK2; it's just an optional module that happens to be bundled with the >>> repository. >> >> So *if* we replace the FAT module *and* that replacement was GPL, would >> there be any objections to having more GPL modules for things like virtio, >> ACPI, etc? >> >> And would that be doable in the context of OVMF or would another project >> need to exist for this purpose? > > I don't think it would be doable in TianoCore. I think it would end up > either in distros, or in QEMU. As I think more about it, I think forking edk2 is inevitable. We need a clean repo that doesn't include the proprietary binaries. I doubt upstream edk2 is willing to remove the binaries. But this can be quite simple using a combination of git-svn and a rewriting script. We did exactly this to pull out the VGABios from Bochs and remove the binaries associated with it. It's 100% automated and can be kept in sync via a script on qemu.org. > A separate question is whether OVMF makes more sense as part of > TianoCore or rather as part of QEMU. I'm not sure if qemu.git is the right location, but we can certainly host an ovmf.git on qemu.org that embeds the scrubbed version of edk2.git. Of course, this would enable us to add GPL code (including a FAT module) to ovmf.git without any impact on upstream edk2. > With 75% of the free hypervisors > now reunited under the same source repository, the balance is > tilting... 
:-) Regards, Anthony Liguori > > Paolo
Re: KVM call agenda for 2013-05-28
Laszlo Ersek writes: > On 05/31/13 16:38, Anthony Liguori wrote: > >> It's either Open Source or it's not. It's currently not. > > I disagree with this binary representation of Open Source or Not. If it > weren't (mostly) Open Source, how could we fork (most of) it as you're > suggesting (from the soapbox :))? > >> I have a hard >> time sympathizing with trying to work with a proprietary upstream. > > My experience has been positive. > > First of all, whether UEFI is a good thing or not is controversial. I > won't try to address that. > > However UEFI is here to stay, machines are being shipped with it, Linux > and other OSen try to support it. Developing (or running) an OS in > combination with a specific firmware is sometimes easier / more economic > in a virtual environment, hence there should be support for qemu + UEFI. > It is this mindset that I operate in. (Oh, I also forgot to mention that > this task has been assigned to me by my superiors as well :)) > > Jordan, the OvmfPkg maintainer is responsive and progressive in the true > FLOSS manner (*), which was a nice surprise for a project whose coding > standards for example are made 100% after Windows source code, and whose > mailing list is mostly subscribed to by proprietary vendors. Really when > it comes to OvmfPkg patches the process follows the "normal" FLOSS > development model. > > (*) Jordan, I hope this will prompt you to merge VirtioNetDxe v4 real > soon now :) (Removing seabios from the CC as we've moved far away from seabios as a topic) Just so no one gets the wrong idea, the OVMF team is now a victim of their own success. 
I had hoped that no one would do the work necessary to get us to the point where we had to seriously think about UEFI support but that's where we are now :-) > Thus far we've been talking copyright rather than patents, but there's > also this: > > http://en.wikipedia.org/wiki/FAT_filesystem#Challenge > http://en.wikipedia.org/wiki/FAT_filesystem#Patent_infringement_lawsuits > > It almost doesn't matter who prevails in such a lawsuit; the > *possibility* of such a lawsuit gives people cold feet. Blame the > USPTO. Just to say it once so I don't have to ever say it again. I'm not going to discuss anything relating to patents and FAT publicly. Everyone should consult with their respective lawyers on such issues. Copyright is straightforward. Patents are not. Regards, Anthony Liguori > > Laszlo
Re: KVM call agenda for 2013-05-28
David Woodhouse writes: > On Fri, 2013-05-31 at 10:43 -0500, Anthony Liguori wrote: >> It's even more fundamental. OVMF as a whole (at least in its usable >> form) is not Open Source. > > The FAT module is required to make EDK2 usable, and yes, that's not Open > Source. So in a sense you're right. > > But we're talking here about *replacing* the FAT module with something > that *is* open source. And the FAT module isn't a fundamental part of > EDK2; it's just an optional module that happens to be bundled with the > repository. So *if* we replace the FAT module *and* that replacement was GPL, would there be any objections to having more GPL modules for things like virtio, ACPI, etc? And would that be doable in the context of OVMF or would another project need to exist for this purpose? > So I think you're massively overstating the issue. OVMF/EDK2 *is* Open > Source, and replacing the FAT module really isn't that hard. > > We can only bury our heads in the sand and ship qemu with > non-EFI-capable firmware for so long... Which is why I think we need to solve the real problem here. > I *know* there's more work to be done. We have SeaBIOS-as-CSM, Jordan > has mostly sorted out the NV variable storage, and now the FAT issue is > coming up to the top of the pile. But we aren't far from the point where > we can realistically say that we want the Open Source OVMF to be the > default firmware shipped with qemu. Yes, that's why I'm raising this now. We all knew that we'd have to talk about this eventually. Regards, Anthony Liguori > > -- > dwmw2
Re: KVM call agenda for 2013-05-28
David Woodhouse writes: > On Fri, 2013-05-31 at 08:04 -0500, Anthony Liguori wrote: >> >> >> >> Fork OVMF, drop the fat module, and just add GPL code. It's an easily >> solvable problem. > > Heh. Actually it doesn't need to be a fork. It's modular, and the FAT > driver is just a single module. Which is actually included in *binary* > form in the EDK2 repository, I believe, and its source code is > elsewhere. > > We could happily make a GPL¹ or LGPL implementation of a FAT module and > build our OVMF with that instead, and we wouldn't need to fork OVMF at > all. So can't we have GPL virtio modules too? I don't think there's any problem there except for the FAT module. I would propose more of a virtual fork. It could consist of a git repo with the GPL modules + a submodule for edk2. Ideally, there would be no need to actually fork edk2. My assumption is that edk2 won't take GPL code. But does ovmf really need to live in the edk2 tree? If we're going to get serious about supporting OVMF, we need something that isn't proprietary. > -- > dwmw2 > > ¹ If it's GPL, of course, then we mustn't include any *other* binary > blobs in our OVMF build. But the whole point in this conversation is > that we don't *want* to do that. So that's fine. It's even more fundamental. OVMF as a whole (at least in its usable form) is not Open Source. Without even tackling the issue of GPL code sharing, that is a fundamental problem that needs to be solved if we're going to be serious about making changes to QEMU to support it. I think solving the general problem will also enable GPL code sharing though. Regards, Anthony Liguori
Re: KVM call agenda for 2013-05-28
Laszlo Ersek writes: > On 05/31/13 15:04, Anthony Liguori wrote: >> Laszlo Ersek writes: >> >>> On 05/31/13 09:09, Jordan Justen wrote: >>> >>> Due to licensing differences I can't just port code from SeaBIOS to >>> OVMF >> >> > > :) > >> Fork OVMF, drop the fat module, and just add GPL code. It's an easily >> solvable problem. > > It's not optimal for the "upstream first" principle; OVMF is not Open Source so "upstream first" doesn't apply. At least, the FAT module is not Open Source. Bullet 8 from the Open Source Definition[1] "8. License Must Not Be Specific to a Product The rights attached to the program must not depend on the program's being part of a particular software distribution. If the program is extracted from that distribution and used or distributed within the terms of the program's license, all parties to whom the program is redistributed should have the same rights as those that are granted in conjunction with the original software distribution." License from OVMF FAT module[2]: "Additional terms: In addition to the forgoing, redistribution and use of the code is conditioned upon the FAT 32 File System Driver and all derivative works thereof being used for and designed only to read and/or write to a file system that is directly managed by: Intel’s Extensible Firmware Initiative (EFI) Specification v. 1.0 and later and/or the Unified Extensible Firmware Interface (UEFI) Forum’s UEFI Specifications v.2.0 and later (together the “UEFI Specifications”); only as necessary to emulate an implementation of the UEFI Specifications; and to create firmware, applications, utilities and/or drivers." [1] http://opensource.org/osd-annotated [2] http://sourceforge.net/apps/mediawiki/tianocore/index.php?title=Edk2-fat-driver AFAIK, for the systems that we'd actually want to use OVMF for, a FAT module is a hard requirement. 
> we'd have to > backport upstream edk2 patches forever (there's a whole lot of edk2 > modules outside of direct OvmfPkg that get built into OVMF.fd -- OvmfPkg > "only" customizes / cherry-picks the full edk2 tree for virtual > machines), or to periodically rebase an ever-increasing set of patches. > > Independently, we need *some* FAT driver (otherwise you can't even boot > most installer media), which is where the already discussed worries lie. > Whatever solves this aspect is independent of forking all of edk2. It's either Open Source or it's not. It's currently not. I have a hard time sympathizing with trying to work with a proprietary upstream. >> Rewriting BSD implementations of everything is silly. Every other >> vendor that uses TianoCore has a proprietary fork. > > Correct, but they (presumably) keep rebasing their ever accumulating > stuff at least on the periodically refreshed "stable edk2 subset" > (UDK2010, which BTW doesn't include OvmfPkg). This must be horrible for > them, but in exchange they get to remain proprietary (which may benefit > them commercially). > >> Maintaining a GPL >> fork seems just as reasonable. > > Perhaps; diverging from "upstream first" would hurt for certain. Well I'm suggesting creating a real upstream (that is actually Open Source). Then I'm all for upstream first. In terms of creating a FAT module, the most likely source would seem to be the kernel code and since that's GPL, I don't think it's terribly avoidable to end up with a GPL'd uefi implementation. If that's inevitable, then we're wasting effort by rewriting stuff under a BSD license. Regards, Anthony Liguori > >> > > Thanks for the suggestion :) > Laszlo
Re: KVM call agenda for 2013-05-28
Laszlo Ersek writes: > On 05/31/13 09:09, Jordan Justen wrote: > > Due to licensing differences I can't just port code from SeaBIOS to > OVMF Fork OVMF, drop the fat module, and just add GPL code. It's an easily solvable problem. Rewriting BSD implementations of everything is silly. Every other vendor that uses TianoCore has a proprietary fork. Maintaining a GPL fork seems just as reasonable. Regards, Anthony Liguori > (and I never have without explicit permission), so it's been a lot of > back and forth with acpidump / iasl -d in guests (massage OVMF, boot > guest, check guest dmesg / lspci, dump tables, compare, repeat), brain > picking colleagues, the ACPI and PIIX specs and so on. I have a page on > the RH intranet dedicated to this. When something around these parts is > being changed (or looks like it could be changed) in SeaBIOS, or between > qemu and SeaBIOS, I always must be alert and consider reimplementing it > in, or porting it with permission to, OVMF. (Most recent example: > pvpanic device -- currently only in SeaBIOS.) > > It worries me that if I slack off, or am busy with something else, or > simply don't notice, then the gap will widen again. I appreciate > learning a bunch about ACPI, and don't mind the days of work that went > into some of my simple-looking ACPI patches for OVMF, but had the tables > come from a common (programmatic) source, none of this would have been > an issue, and I wouldn't have felt even occasionally that ACPI patches > for OVMF were both duplicate work *and* futile (considering how much > ahead SeaBIOS was). > > I don't mind reimplementing stuff, or porting it with permission, going > forward, but the sophisticated parts in SeaBIOS are a hard nut. 
For > example I'll never be able to auto-extract offsets from generated AML > and patch the AML using those offsets; the edk2 build tools (a project > separate from edk2) don't support this, and it takes several months to > get a thing as simple as gcc-47 build flags into edk2-buildtools. > > Instead I have to write template ASL, compile it to AML, hexdump the > result, verify it against the AML grammar in the ACPI spec (offsets > aren't obvious, BytePrefix and friends are a joy), define & initialize a > packed struct or array in OVMF, and patch the template AML using fixed > field names or array subscripts. Workable, but dog slow. If the ACPI > payload came from up above, we might be as well provided with a list of > (canonical name, offset, size) triplets, and could perhaps blindly patch > the contents. (Not unlike Michael's linker code for connecting tables > into a hierarchy.) > > AFAIK most recently iasl got built-in support for offset extraction (and > in the process the current SeaBIOS build method was broken...), so that > part might get easier in the future. > > Oh well it's Friday, sorry about this rant! :) I'll happily do what I > can in the current status quo, but frequently, it won't amount to much. > > Thanks, > Laszlo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call agenda for 2013-05-28
Kevin O'Connor writes: > On Tue, May 28, 2013 at 07:53:09PM -0400, Kevin O'Connor wrote: >> There were discussions on potentially introducing a middle component >> to generate the tables. Coreboot was raised as a possibility, and >> David thought it would be okay to use coreboot for both OVMF and >> SeaBIOS. The possibility was also raised of a "rom" that lives in the >> qemu repo, is run in the guest, and generates the tables (which is >> similar to the hvmloader approach that Xen uses). > > Given the objections to implementing ACPI directly in QEMU, one > possible way forward would be to split the current SeaBIOS rom into > two roms: "qvmloader" and "seabios". The "qvmloader" would do the > qemu specific platform init (pci init, smm init, mtrr init, bios > tables) and then load and run the regular seabios rom. With this > split, qvmloader could be committed into the QEMU repo and maintained > there. This would be analogous to Xen's hvmloader with the seabios > code used as a starting point to implement it. What about a small change to the SeaBIOS build system to allow ACPI table generation to be done via a "plugin"? This could be as simple as moving acpi.c and *.dsl into the QEMU build tree and then having a way to point the SeaBIOS makefiles to our copy of it. Then the logic stays in firmware, but the churn happens in the QEMU tree instead of the SeaBIOS tree. Regards, Anthony Liguori > > With both the hardware implementation and acpi descriptions for that > hardware in the same source code repository, it would be possible to > implement changes to both in a single patch series. The fwcfg entries > used to pass data between qemu and qvmloader could also be changed in > a single patch and thus those fwcfg entries would not need to be > considered a stable interface. The qvmloader code also wouldn't need > the 16bit handlers that seabios requires and thus wouldn't need the > full complexity of the seabios build. 
Finally, it's possible that > both ovmf and seabios could use a single qvmloader implementation. > > On the down side, reboots can be a bit goofy today in kvm, and that > would need to be settled before something like qvmloader could be > implemented. Also, it may be problematic to support passing of bios > tables from qvmloader to seabios for guests with only 1 meg of ram. > > Thoughts? > -Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: updated: kvm networking todo wiki
"Michael S. Tsirkin" writes: > On Thu, May 30, 2013 at 08:40:47AM -0500, Anthony Liguori wrote: >> Stefan Hajnoczi writes: >> >> > On Thu, May 30, 2013 at 7:23 AM, Rusty Russell >> > wrote: >> >> Anthony Liguori writes: >> >>> Rusty Russell writes: >> >>>> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote: >> >>>>> FWIW, I think what's more interesting is using vhost-net as a >> >>>>> networking >> >>>>> backend with virtio-net in QEMU being what's guest facing. >> >>>>> >> >>>>> In theory, this gives you the best of both worlds: QEMU acts as a first >> >>>>> line of defense against a malicious guest while still getting the >> >>>>> performance advantages of vhost-net (zero-copy). >> >>>>> >> >>>> It would be an interesting idea if we didn't already have the vhost >> >>>> model where we don't need the userspace bounce. >> >>> >> >>> The model is very interesting for QEMU because then we can use vhost as >> >>> a backend for other types of network adapters (like vmxnet3 or even >> >>> e1000). >> >>> >> >>> It also helps for things like fault tolerance where we need to be able >> >>> to control packet flow within QEMU. >> >> >> >> (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts). >> >> >> >> Then I'm really confused as to what this would look like. A zero copy >> >> sendmsg? We should be able to implement that today. >> >> >> >> On the receive side, what can we do better than readv? If we need to >> >> return to userspace to tell the guest that we've got a new packet, we >> >> don't win on latency. We might reduce syscall overhead with a >> >> multi-dimensional readv to read multiple packets at once? >> > >> > Sounds like recvmmsg(2). >> >> Could we map this to mergable rx buffers though? >> >> Regards, >> >> Anthony Liguori > > Yes because we don't have to complete buffers in order. What I meant though was for GRO, we don't know how large the received packet is going to be. 
Mergeable rx buffers let us allocate a pool of data for all incoming packets instead of allocating max packet size * max packets. recvmmsg expects an array of msghdrs and I presume each needs to be given a fixed size. So this seems incompatible with mergeable rx buffers. Regards, Anthony Liguori > >> > >> > Stefan
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Rusty Russell writes: > Anthony Liguori writes: >> Forcing a guest driver change is a really big >> deal and I see no reason to do that unless there's a compelling reason >> to. >> >> So we're stuck with the 1.0 config layout for a very long time. > > We definitely must not force a guest change. The explicit aim of the > standard is that "legacy" and 1.0 be backward compatible. One > deliverable is a document detailing how this is done (effectively a > summary of changes between what we have and 1.0). If 2.0 is fully backwards compatible, great. It seems like such a large difference that it would be impossible, but I need to investigate further. Regards, Anthony Liguori > > It's a delicate balancing act. My plan is to accompany any changes in > the standard with a qemu implementation, so we can see how painful those > changes are. And if there are performance implications, measure them. > > Cheers, > Rusty. > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: updated: kvm networking todo wiki
Stefan Hajnoczi writes: > On Thu, May 30, 2013 at 7:23 AM, Rusty Russell wrote: >> Anthony Liguori writes: >>> Rusty Russell writes: >>>> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote: >>>>> FWIW, I think what's more interesting is using vhost-net as a networking >>>>> backend with virtio-net in QEMU being what's guest facing. >>>>> >>>>> In theory, this gives you the best of both worlds: QEMU acts as a first >>>>> line of defense against a malicious guest while still getting the >>>>> performance advantages of vhost-net (zero-copy). >>>>> >>>> It would be an interesting idea if we didn't already have the vhost >>>> model where we don't need the userspace bounce. >>> >>> The model is very interesting for QEMU because then we can use vhost as >>> a backend for other types of network adapters (like vmxnet3 or even >>> e1000). >>> >>> It also helps for things like fault tolerance where we need to be able >>> to control packet flow within QEMU. >> >> (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts). >> >> Then I'm really confused as to what this would look like. A zero copy >> sendmsg? We should be able to implement that today. >> >> On the receive side, what can we do better than readv? If we need to >> return to userspace to tell the guest that we've got a new packet, we >> don't win on latency. We might reduce syscall overhead with a >> multi-dimensional readv to read multiple packets at once? > > Sounds like recvmmsg(2). Could we map this to mergeable rx buffers though? Regards, Anthony Liguori > > Stefan
Re: updated: kvm networking todo wiki
Rusty Russell writes: > Anthony Liguori writes: >> Rusty Russell writes: >>> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote: >>>> FWIW, I think what's more interesting is using vhost-net as a networking >>>> backend with virtio-net in QEMU being what's guest facing. >>>> >>>> In theory, this gives you the best of both worlds: QEMU acts as a first >>>> line of defense against a malicious guest while still getting the >>>> performance advantages of vhost-net (zero-copy). >>>> >>> It would be an interesting idea if we didn't already have the vhost >>> model where we don't need the userspace bounce. >> >> The model is very interesting for QEMU because then we can use vhost as >> a backend for other types of network adapters (like vmxnet3 or even >> e1000). >> >> It also helps for things like fault tolerance where we need to be able >> to control packet flow within QEMU. > > (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts). > > Then I'm really confused as to what this would look like. A zero copy > sendmsg? We should be able to implement that today. The only trouble with sendmsg would be doing batch submission and asynchronous completion. A thread pool could certainly be used for this I guess. Regards, Anthony Liguori > On the receive side, what can we do better than readv? If we need to > return to userspace to tell the guest that we've got a new packet, we > don't win on latency. We might reduce syscall overhead with a > multi-dimensional readv to read multiple packets at once? > > Confused, > Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SeaBIOS] KVM call agenda for 2013-05-28
Gerd Hoffmann writes: > On 05/29/13 01:53, Kevin O'Connor wrote: >> On Thu, May 23, 2013 at 03:41:32PM +0300, Michael S. Tsirkin wrote: >>> Juan is not available now, and Anthony asked for >>> agenda to be sent early. >>> So here comes: >>> >>> Agenda for the meeting Tue, May 28: >>> >>> - Generating acpi tables >> >> I didn't see any meeting notes, but I thought it would be worthwhile >> to summarize the call. This is from memory so correct me if I got >> anything wrong. >> >> Anthony believes that the generation of ACPI tables is the task of the >> firmware. Reasons cited include security implications of running more >> code in qemu vs the guest context, > > I fail to see the security issues here. It's not like the acpi table > generation code operates on untrusted input from the guest ... But possibly untrusted input from a malicious user. You can imagine something like an IaaS provider that lets a user input arbitrary values for memory, number of nics, etc. It's a stretch of an example, I agree, but the general principle I think is sound: we should push as much work as possible to the least privileged part of the stack. In this case, firmware has much less privilege than QEMU. >> complexities in running iasl on >> big-endian machines, > > We already have a bunch of prebuilt blobs in the qemu repo for similar > reasons, we can do that with iasl output too. > >> possible complexity of having to regenerate >> tables on a vm reboot, > > Why tables should be regenerated at reboot? I remember hotplug being > mentioned in the call. Hmm? Which hotplugged component needs acpi > table updates to work properly? And what is the point of hotplugging if > you must reboot the guest anyway to get the acpi updates needed? > Details please. See my response to Michael. > Also mentioned in the call: "architectural reasons", which I understand > as "real hardware works that way". Correct. 
But qemu's virtual > hardware is configurable in more ways than real hardware, so we have > different needs. For example: pci slots can or can't be hotpluggable. > On real hardware this is fixed. IIRC this is one of the reasons why we > have to patch acpi tables. It's not really fixed. Hardware supports PCI expansion chassis. Multi-node NUMA systems also affect the ACPI tables. >> overall sloppiness of doing it in QEMU. > > /me gets the feeling that this is the *main* reason, given that the > other ones don't look very convincing to me. > >> Raised >> that QOM interface should be sufficient. > > Agree on this one. Ideally the acpi table generation code should be > able to gather all information it needs from the qom tree, so it can be > a standalone C file instead of being scattered over all qemu. Ack. So my basic argument is why not expose the QOM interfaces to firmware and move the generation code there? Seems like it would be more or less a copy/paste once we had a proper implementation in QEMU. >> There were discussions on potentially introducing a middle component >> to generate the tables. Coreboot was raised as a possibility, and >> David thought it would be okay to use coreboot for both OVMF and >> SeaBIOS. > > Certainly an option, but that is a long-term project. Out of curiosity, are there other benefits to using coreboot as a core firmware in QEMU? Is there a payload we would ever plausibly use besides OVMF and SeaBIOS? Regards, Anthony Liguori
Re: KVM call agenda for 2013-05-28
"Michael S. Tsirkin" writes: > On Tue, May 28, 2013 at 07:53:09PM -0400, Kevin O'Connor wrote: >> On Thu, May 23, 2013 at 03:41:32PM +0300, Michael S. Tsirkin wrote: >> > Juan is not available now, and Anthony asked for >> > agenda to be sent early. >> > So here comes: >> > >> > Agenda for the meeting Tue, May 28: >> > >> > - Generating acpi tables >> >> I didn't see any meeting notes, but I thought it would be worthwhile >> to summarize the call. This is from memory so correct me if I got >> anything wrong. >> >> Anthony believes that the generation of ACPI tables is the task of the >> firmware. Reasons cited include security implications of running more >> code in qemu vs the guest context, complexities in running iasl on >> big-endian machines, possible complexity of having to regenerate >> tables on a vm reboot, overall sloppiness of doing it in QEMU. Raised >> that QOM interface should be sufficient. >> >> Kevin believes that the bios table code should be moved up into QEMU. >> Reasons cited include the churn rate in SeaBIOS for this QEMU feature >> (15-20% of all SeaBIOS commits since integrating with QEMU have been >> for bios tables; 20% of SeaBIOS commits in last year), complexity of >> trying to pass all the content needed to generate the tables (eg, >> device details, power tree, irq routing), complexity of scheduling >> changes across different repos and synchronizing their rollout, >> complexity of implementing the code in both OVMF and SeaBIOS. Kevin >> wasn't aware of a requirement to regenerate acpi tables on a vm >> reboot. > > I think this last one is based on a misunderstanding: it's based > on the assumption that when we change hardware by hotplug > we should regenerate the tables to match. > But there's no management that can take advantage of > this.
> Two possible reasonable things we can tell management: > - hotplug for device XXX is not supported: restart qemu > to make guest use the device > - hotplug for device XXX is supported This introduces an assumption: that the device model never radically changes across resets. Why should this be true? Shouldn't we be allowed to increase the amount of memory the guest has across reboots? That's equivalent to adding another DIMM after power off. Not generating tables on reset does limit what we can do in a pretty fundamental way. Even if you can argue it in the short term, I don't think it's viable in the long term. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Wed, May 29, 2013 at 09:16:39AM -0500, Anthony Liguori wrote: >> "Michael S. Tsirkin" writes: >> > I'm guessing any compiler that decides to waste memory in this way >> > will quickly get dropped by users and then we won't worry >> > about building QEMU with it. >> >> There are other responses in the thread here and I don't really care to >> bikeshed on this issue. > > Great. Let's make the bikeshed blue then? It's fun to argue about stuff like this and I certainly have an opinion, but I honestly don't care all that much about the offsetof thing. However... > >> >> Well, given that virtio is widely deployed today, I would think the 1.0 >> >> standard should strictly reflect what's deployed today, no? >> >> Any new config layout would be 2.0 material, right? >> > >> > Not as it's currently planned. Devices can choose >> > to support a legacy layout in addition to the new one, >> > and if you look at the patch you will see that that >> > is exactly what it does. >> >> Adding a new BAR most certainly requires bumping the revision ID or >> changing the device ID, no? > > No, why would it? If we change the programming interface for a device in a way that is incompatible, we are required to change the revision ID and/or device ID. > If a device dropped BAR0, that would be a good reason > to bump revision ID. > We don't do this yet. But we have to drop BAR0 to put it behind a PCI express bus, right? If that's the case, then device that's exposed on the PCI express bus must use a different device ID and/or revision ID. That means a new driver is needed in the guest. >> Didn't we run into this problem with the virtio-win drivers with just >> the BAR size changing? > > Because they had a bug: they validated BAR0 size. AFAIK they don't care > what happens with other bars. I think there's a grey area with respect to the assumptions a device can make about the programming interface. 
But very concretely, we cannot expose virtio-pci-net via PCI express with BAR0 disabled because that will result in existing virtio-pci Linux drivers breaking. > Not we. The BIOS can disable IO BAR: it can do this already > but the device won't be functional. But the only way to expose the device over PCI express is to disable the IO BAR, right? Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
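The IO BAR point above comes down to two bits in the PCI command register (offset 0x04 in config space): bit 0 gates I/O-space BARs, bit 1 gates memory-space BARs. A minimal sketch of the decode-enable logic being discussed; the bit values follow the PCI specification, but the helper function is illustrative, not QEMU code:

```c
#include <assert.h>
#include <stdint.h>

/* PCI command register bits (per the PCI spec / Linux pci_regs.h). */
#define PCI_COMMAND_IO     0x1  /* respond to I/O space accesses */
#define PCI_COMMAND_MEMORY 0x2  /* respond to memory space accesses */

/* Illustrative helper: a conventional virtio-pci driver that only
 * knows about BAR0 (an I/O BAR) is dead in the water when the I/O
 * enable bit is clear, which is the express-bus situation described
 * in the thread. */
static int io_bar_usable(uint16_t command)
{
    return (command & PCI_COMMAND_IO) != 0;
}
```

With only memory decoding enabled, `io_bar_usable` reports the device's I/O BAR as unusable, which is why a BAR0-only guest driver breaks.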
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Paolo Bonzini writes: > On 29/05/2013 15:24, Michael S. Tsirkin wrote: >> You expect a compiler to pad this structure: >> >> struct foo { >> uint8_t a; >> uint8_t b; >> uint16_t c; >> uint32_t d; >> }; >> >> I'm guessing any compiler that decides to waste memory in this way >> will quickly get dropped by users and then we won't worry >> about building QEMU with it. > > You know the virtio-pci config structures are padded, but not all of > them are. For example, virtio_balloon_stat is not padded and indeed has > an __attribute__((__packed__)) in the spec. Note that these structures are actually used for something. We store the config in these structures so they are actually used for something. The proposed structures only serve as a way to express offsets. You would never actually have a variable of this type. Regards, Anthony Liguori > > For this reason I prefer to have the attribute everywhere. So people > don't have to wonder why it's here and not there. > > Paolo
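Paolo's padding point is easy to demonstrate. A minimal sketch with hypothetical struct names and field widths chosen for illustration (the real virtio_balloon_stat pairs a 16-bit tag with a 64-bit value, per the spec):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Without packing, the compiler pads 'val' out to its natural 4-byte
 * alignment, so the in-memory layout no longer matches a byte-exact
 * wire format. */
struct stat_padded {
    uint16_t tag;
    uint32_t val;   /* lands at offset 4, after 2 bytes of padding */
};

/* With the packed attribute, the layout is exactly the declared
 * bytes, which is what a spec-defined wire structure needs. */
struct stat_packed {
    uint16_t tag;
    uint32_t val;   /* lands at offset 2, no padding */
} __attribute__((packed));
```

The padded variant is 8 bytes; the packed variant is 6, matching the sum of its fields.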
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote: >> 1) C makes no guarantees about structure layout beyond the first >>member. Yes, if it's naturally aligned or has a packed attribute, >>GCC does the right thing. But this isn't kernel land anymore, >>portability matters and there are more compilers than GCC. > > You expect a compiler to pad this structure: > > struct foo { > uint8_t a; > uint8_t b; > uint16_t c; > uint32_t d; > }; > > I'm guessing any compiler that decides to waste memory in this way > will quickly get dropped by users and then we won't worry > about building QEMU with it. There are other responses in the thread here and I don't really care to bikeshed on this issue. >> Well, given that virtio is widely deployed today, I would think the 1.0 >> standard should strictly reflect what's deployed today, no? >> Any new config layout would be 2.0 material, right? > > Not as it's currently planned. Devices can choose > to support a legacy layout in addition to the new one, > and if you look at the patch you will see that that > is exactly what it does. Adding a new BAR most certainly requires bumping the revision ID or changing the device ID, no? Didn't we run into this problem with the virtio-win drivers with just the BAR size changing? >> Re: the new config layout, I don't think we would want to use it for >> anything but new devices. Forcing a guest driver change > > There's no forcing. > If you look at the patches closely, you will see that > we still support the old layout on BAR0. > > >> is a really big >> deal and I see no reason to do that unless there's a compelling reason >> to. > > There are many compelling reasons, and they are well-known > limitations of virtio PCI: > > - PCI spec compliance (mandates device operation with IO memory > disabled). That's the PCI express spec. We are fully compliant with the PCI spec.
And what's the user visible advantage of putting an emulated virtio device behind a PCI-e bus versus a legacy PCI bus? This is a very good example because if we have to disable BAR0, then it's an ABI breaker, plain and simple. > - support 64 bit addressing We currently support 44-bit addressing for the ring. While I agree we need to bump it, there's no immediate problem with 44-bit addressing. > - add more than 32 feature bits. > - individually disable queues. > - sanely support cross-endian systems. > - support very small (<1 PAGE) virtio rings. > - support a separate page for each vq kick. > - make each device place config at flexible offset. None of these things are holding us back today. I'm not saying we shouldn't introduce a new device. But adoption of that device will be slow and realistically will be limited to new devices only. We'll be supporting both devices for a very, very long time. Compatibility is the fundamental value that we provide. We need to go out of our way to make sure that existing guests work and work as well as possible. Sticking virtio devices behind a PCI-e bus just for the hell of it isn't a compelling reason to break existing guests. Regards, Anthony Liguori > Addressing any one of these would cause us to add a substantially new > way to operate virtio devices. > > And since it's a guest change anyway, it seemed like a > good time to do the new layout and fix everything in one go. > > And they are needed like yesterday. > > >> So we're stuck with the 1.0 config layout for a very long time. >> >> Regards, >> >> Anthony Liguori > > Absolutely. This patch lets us support both which will allow for > a gradual transition over the next 10 years or so. > >> > reason. I suggest that's 2.0 material... >> > >> > Cheers, >> > Rusty.
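The "more than 32 feature bits" item on Michael's list is typically solved with a select/window pair: the driver writes a 32-bit select register, then reads back the corresponding 32-bit half of a wider feature set, so the register file never has to grow. A hedged sketch of that idea; the function name is illustrative, not the actual register layout:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative feature-window read: select 0 exposes the low 32
 * feature bits, select 1 the high 32 bits, out of a 64-bit set.
 * The same windowing trick extends to any width. */
static uint32_t read_feature_half(uint64_t features, uint32_t select)
{
    return (uint32_t)(features >> (select ? 32 : 0));
}
```

Two 32-bit reads reassemble the full 64-bit feature set, which is how a 32-bit register interface can outgrow the original 32 feature bits.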
Re: updated: kvm networking todo wiki
Rusty Russell writes: > "Michael S. Tsirkin" writes: >> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote: >>> "Michael S. Tsirkin" writes: >>> >>> > On Fri, May 24, 2013 at 05:41:11PM +0800, Jason Wang wrote: >>> >> On 05/23/2013 04:50 PM, Michael S. Tsirkin wrote: >>> >> > Hey guys, >>> >> > I've updated the kvm networking todo wiki with current projects. >>> >> > Will try to keep it up to date more often. >>> >> > Original announcement below. >>> >> >>> >> Thanks a lot. I've added the tasks I'm currently working on to the wiki. >>> >> >>> >> btw. I notice the virtio-net data plane were missed in the wiki. Is the >>> >> project still being considered? >>> > >>> > It might have been interesting several years ago, but now that linux has >>> > vhost-net in kernel, the only point seems to be to >>> > speed up networking on non-linux hosts. >>> >>> Data plane just means having a dedicated thread for virtqueue processing >>> that doesn't hold qemu_mutex. >>> >>> Of course we're going to do this in QEMU. It's a no brainer. But not >>> as a separate device, just as an improvement to the existing userspace >>> virtio-net. >>> >>> > Since non-linux does not have kvm, I doubt virtio is a bottleneck. >>> >>> FWIW, I think what's more interesting is using vhost-net as a networking >>> backend with virtio-net in QEMU being what's guest facing. >>> >>> In theory, this gives you the best of both worlds: QEMU acts as a first >>> line of defense against a malicious guest while still getting the >>> performance advantages of vhost-net (zero-copy). >> >> Great idea, that sounds very interesting. >> >> I'll add it to the wiki. >> >> In fact a bit of complexity in vhost was put there in the vague hope to >> support something like this: virtio rings are not translated through >> regular memory tables, instead, vhost gets a pointer to ring address.
>> >> This allows qemu acting as a man in the middle, >> verifying the descriptors but not touching the >> >> Anyone interested in working on such a project? > > It would be an interesting idea if we didn't already have the vhost > model where we don't need the userspace bounce. The model is very interesting for QEMU because then we can use vhost as a backend for other types of network adapters (like vmxnet3 or even e1000). It also helps for things like fault tolerance where we need to be able to control packet flow within QEMU. Regards, Anthony Liguori > We already have two > sets of host side ring code in the kernel (vhost and vringh, though > they're being unified). > > All an accelerator can offer on the tx side is zero copy and direct > update of the used ring. On rx userspace could register the buffers and > the accelerator could fill them and update the used ring. It still > needs to deal with merged buffers, for example. > > You avoid the address translation in the kernel, but I'm not convinced > that's a key problem. > > Cheers, > Rusty. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Rusty Russell writes: > Anthony Liguori writes: >> "Michael S. Tsirkin" writes: >>> +case offsetof(struct virtio_pci_common_cfg, device_feature_select): >>> +return proxy->device_feature_select; >> >> Oh dear no... Please use defines like the rest of QEMU. > > It is pretty ugly. I think beauty is in the eye of the beholder here... Pretty much every device we have has a switch statement like this. Consistency wins when it comes to qualitative arguments like this. > Yet the structure definitions are descriptive, capturing layout, size > and endianness in a natural format readable by any C programmer. From an API design point of view, here are the problems I see: 1) C makes no guarantees about structure layout beyond the first member. Yes, if it's naturally aligned or has a packed attribute, GCC does the right thing. But this isn't kernel land anymore, portability matters and there are more compilers than GCC. 2) If we ever introduce anything like latching, this doesn't work out so well anymore because it's hard to express in a single C structure the register layout at that point. Perhaps a union could be used but padding may make it a bit challenging. 3) I suspect it's harder to review because a subtle change could more easily have broad impact. If someone changed the type of a field from u32 to u16, it changes the offset of every other field. That's not terribly obvious in the patch itself unless you understand how the structure is used elsewhere. This may not be a problem for virtio because we all understand that the structures are part of an ABI, but if we used this pattern more in QEMU, it would be a lot less obvious. > So AFAICT the question is, do we put the required > > #define VIRTIO_PCI_CFG_FEATURE_SEL \ > (offsetof(struct virtio_pci_common_cfg, device_feature_select)) > > etc. in the kernel headers or qemu? I'm pretty sure we would end up just having our own integer defines.
We carry our own virtio headers today because we can't easily import the kernel headers. >> Haven't looked at the proposed new ring layout yet. > > No change, but there's an open question on whether we should nail it to > little endian (or define the endian by the transport). > > Of course, I can't rule out that the 1.0 standard *may* decide to frob > the ring layout somehow, Well, given that virtio is widely deployed today, I would think the 1.0 standard should strictly reflect what's deployed today, no? Any new config layout would be 2.0 material, right? Re: the new config layout, I don't think we would want to use it for anything but new devices. Forcing a guest driver change is a really big deal and I see no reason to do that unless there's a compelling reason to. So we're stuck with the 1.0 config layout for a very long time. Regards, Anthony Liguori > but I'd think it would require a compelling > reason. I suggest that's 2.0 material... > > Cheers, > Rusty. > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
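The two spellings being debated, hand-written integer defines versus offsetof(), can be made to coincide exactly, which is why the disagreement is about style rather than correctness. A sketch using a hypothetical cut-down config struct (the real virtio_pci_common_cfg has more fields; every field here is naturally aligned, so no padding is inserted):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of a common-config layout, field names echoing
 * the quoted patch. Comments give the offsets that natural alignment
 * guarantees. */
struct virtio_pci_common_cfg_sketch {
    uint32_t device_feature_select; /* 0x00 */
    uint32_t device_feature;        /* 0x04 */
    uint32_t guest_feature_select;  /* 0x08 */
    uint32_t guest_feature;         /* 0x0c */
    uint16_t msix_config;           /* 0x10 */
    uint16_t num_queues;            /* 0x12 */
    uint8_t  device_status;         /* 0x14 */
};

/* QEMU-style spelling: a bare integer constant. */
#define VIRTIO_PCI_CFG_DEVICE_STATUS 0x14

/* Kernel-header-style spelling: derived from the structure. */
#define VIRTIO_PCI_CFG_DEVICE_STATUS_OFF \
    offsetof(struct virtio_pci_common_cfg_sketch, device_status)
```

Both macros evaluate to the same value; the offsetof form just keeps the constant mechanically tied to the struct declaration, which is Anthony's point 3 in reverse.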
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Tue, May 28, 2013 at 12:15:16PM -0500, Anthony Liguori wrote: >> > @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, >> > hwaddr addr, >> > } >> > } >> > >> > +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr, >> > + unsigned size) >> > +{ >> > +VirtIOPCIProxy *proxy = opaque; >> > +VirtIODevice *vdev = proxy->vdev; >> > + >> > +uint64_t low = 0xull; >> > + >> > +switch (addr) { >> > +case offsetof(struct virtio_pci_common_cfg, device_feature_select): >> > +return proxy->device_feature_select; >> >> Oh dear no... Please use defines like the rest of QEMU. > > Any good reason not to use offsetof? > I see about 138 examples in qemu. There are exactly zero: $ find . -name "*.c" -exec grep -l "case offset" {} \; $ > Anyway, that's the way Rusty wrote it in the kernel header - > I started with defines. > If you convince Rusty to switch I can switch too, We have 300+ devices in QEMU that use #defines. We're not using this kind of thing just because you want to copy code from the kernel. >> https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659 >> >> And: >> >> https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb >> >> Which lets virtio-pci be subclassable and then remaps the config space to >> BAR2. > > > Interesting. Have the spec anywhere? Not yet, but working on that. > You are saying this is going to conflict because > of BAR2 usage, yes? No, this whole thing is flexible. I had to use BAR2 because BAR0 has to be the vram mapping. It also had to be an MMIO bar. The new layout might make it easier to implement a device like this. I shared it mainly because I wanted to show the subclassing idea vs. just tacking an option onto the existing virtio-pci code in QEMU. Regards, Anthony Liguori > So let's only do this virtio-fb only for new layout, so we don't need > to maintain compatibility. 
In particular, we are working > on making memory BAR access fast for virtio devices > in a generic way. At the moment they are measurably slower > than PIO on x86. > > >> Haven't looked at the proposed new ring layout yet. >> >> Regards, > > No new ring layout. It's new config layout. > > > -- > MST
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > This adds support for new config, and is designed to work with > the new layout code in Rusty's new layout branch. > > At the moment all fields are in the same memory BAR (bar 2). > This will be used to test performance and compare > memory, io and hypercall latency. > > Compiles but does not work yet. > Migration isn't handled yet. > > It's not clear what do queue_enable/queue_disable > fields do, not yet implemented. > > Gateway for config access with config cycles > not yet implemented. > > Sending out for early review/flames. > > Signed-off-by: Michael S. Tsirkin > --- > hw/virtio/virtio-pci.c | 393 > +++-- > hw/virtio/virtio-pci.h | 55 +++ > hw/virtio/virtio.c | 20 +++ > include/hw/virtio/virtio.h | 4 + > 4 files changed, 458 insertions(+), 14 deletions(-) > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > index 752991a..f4db224 100644 > --- a/hw/virtio/virtio-pci.c > +++ b/hw/virtio/virtio-pci.c > @@ -259,6 +259,26 @@ static void virtio_pci_stop_ioeventfd(VirtIOPCIProxy > *proxy) > proxy->ioeventfd_started = false; > } > > +static void virtio_pci_set_status(VirtIOPCIProxy *proxy, uint8_t val) > +{ > +VirtIODevice *vdev = proxy->vdev; > + > +if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) { > +virtio_pci_stop_ioeventfd(proxy); > +} > + > +virtio_set_status(vdev, val & 0xFF); > + > +if (val & VIRTIO_CONFIG_S_DRIVER_OK) { > +virtio_pci_start_ioeventfd(proxy); > +} > + > +if (vdev->status == 0) { > +virtio_reset(proxy->vdev); > +msix_unuse_all_vectors(&proxy->pci_dev); > +} > +} > + > static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val) > { > VirtIOPCIProxy *proxy = opaque; > @@ -293,20 +313,7 @@ static void virtio_ioport_write(void *opaque, uint32_t > addr, uint32_t val) > } > break; > case VIRTIO_PCI_STATUS: > -if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) { > -virtio_pci_stop_ioeventfd(proxy); > -} > - > -virtio_set_status(vdev, val & 0xFF); > - > -if (val & VIRTIO_CONFIG_S_DRIVER_OK) { > 
-virtio_pci_start_ioeventfd(proxy); > -} > - > -if (vdev->status == 0) { > -virtio_reset(proxy->vdev); > -msix_unuse_all_vectors(&proxy->pci_dev); > -} > +virtio_pci_set_status(proxy, val); > > /* Linux before 2.6.34 sets the device as OK without enabling > the PCI device bus master bit. In this case we need to disable > @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, > hwaddr addr, > } > } > > +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr, > + unsigned size) > +{ > +VirtIOPCIProxy *proxy = opaque; > +VirtIODevice *vdev = proxy->vdev; > + > +uint64_t low = 0xull; > + > +switch (addr) { > +case offsetof(struct virtio_pci_common_cfg, device_feature_select): > +return proxy->device_feature_select; Oh dear no... Please use defines like the rest of QEMU. >From a QEMU pov, take a look at: https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659 And: https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb Which lets virtio-pci be subclassable and then remaps the config space to BAR2. Haven't looked at the proposed new ring layout yet. Regards, Anthony Liguori > +case offsetof(struct virtio_pci_common_cfg, device_feature): > +/* TODO: 64-bit features */ > + return proxy->device_feature_select ? 0 : proxy->host_features; > +case offsetof(struct virtio_pci_common_cfg, guest_feature_select): > +return proxy->guest_feature_select; > +case offsetof(struct virtio_pci_common_cfg, guest_feature): > +/* TODO: 64-bit features */ > + return proxy->guest_feature_select ? 0 : vdev->guest_features; > +case offsetof(struct virtio_pci_common_cfg, msix_config): > + return vdev->config_vector; > +case offsetof(struct virtio_pci_common_cfg, num_queues): > +/* TODO: more exact limit? */ > + return VIRTIO_PCI_QUEUE_MAX; > +case offsetof(struct virtio_pci_common_cfg, device_status): > +return vdev->status; > + > + /* About a specific virtqueue. 
*/ > +case offsetof(struct virtio_pci_common_cfg, queue_select): > +return vdev->queue_s
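For reference, the dispatch style used in the quoted patch boils down to a switch on offsetof() into the config structure. This is an illustrative sketch, not the QEMU code; the struct and field names are simplified stand-ins:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified register file: offsets 0x0, 0x4 and 0x8. */
struct common_cfg {
    uint32_t device_feature_select;
    uint32_t device_feature;
    uint8_t  device_status;
};

/* Simplified stand-in for the VirtIOPCIProxy/VirtIODevice state. */
struct proxy_state {
    uint32_t device_feature_select;
    uint32_t host_features;
    uint8_t  status;
};

static uint64_t common_cfg_read(const struct proxy_state *p, uint64_t addr)
{
    switch (addr) {
    case offsetof(struct common_cfg, device_feature_select):
        return p->device_feature_select;
    case offsetof(struct common_cfg, device_feature):
        /* Window 0 holds the low 32 feature bits, as in the patch. */
        return p->device_feature_select ? 0 : p->host_features;
    case offsetof(struct common_cfg, device_status):
        return p->status;
    default:
        return 0;
    }
}
```

Each case label is an offset derived from the struct, which is exactly the pattern the "defines vs offsetof" subthread argues about.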
Re: updated: kvm networking todo wiki
"Michael S. Tsirkin" writes: > On Fri, May 24, 2013 at 05:41:11PM +0800, Jason Wang wrote: >> On 05/23/2013 04:50 PM, Michael S. Tsirkin wrote: >> > Hey guys, >> > I've updated the kvm networking todo wiki with current projects. >> > Will try to keep it up to date more often. >> > Original announcement below. >> >> Thanks a lot. I've added the tasks I'm currently working on to the wiki. >> >> btw. I notice the virtio-net data plane were missed in the wiki. Is the >> project still being considered? > > It might have been interesting several years ago, but now that linux has > vhost-net in kernel, the only point seems to be to > speed up networking on non-linux hosts. Data plane just means having a dedicated thread for virtqueue processing that doesn't hold qemu_mutex. Of course we're going to do this in QEMU. It's a no brainer. But not as a separate device, just as an improvement to the existing userspace virtio-net. > Since non-linux does not have kvm, I doubt virtio is a bottleneck. FWIW, I think what's more interesting is using vhost-net as a networking backend with virtio-net in QEMU being what's guest facing. In theory, this gives you the best of both worlds: QEMU acts as a first line of defense against a malicious guest while still getting the performance advantages of vhost-net (zero-copy). > IMO yet another networking backend is a distraction, > and confusing to users. > In any case, I'd like to see virtio-blk dataplane replace > non dataplane first. We don't want two copies of > virtio-net in qemu. 100% agreed. Regards, Anthony Liguori > >> > >> > >> > I've put up a wiki page with a kvm networking todo list, >> > mainly to avoid effort duplication, but also in the hope >> > to draw attention to what I think we should try addressing >> > in KVM: >> > >> > http://www.linux-kvm.org/page/NetworkingTodo >> > >> > This page could cover all networking related activity in KVM, >> > currently most info is related to virtio-net. 
>> > >> > Note: if there's no developer listed for an item, >> > this just means I don't know of anyone actively working >> > on an issue at the moment, not that no one intends to. >> > >> > I would appreciate it if others working on one of the items on this list >> > would add their names so we can communicate better. If others like this >> > wiki page, please go ahead and add stuff you are working on if any. >> > >> > It would be especially nice to add autotest projects: >> > there is just a short test matrix and a catch-all >> > 'Cover test matrix with autotest', currently. >> > >> > Currently there are some links to Red Hat bugzilla entries, >> > feel free to add links to other bugzillas. >> > >> > Thanks! >> > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
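The "dedicated thread that doesn't hold qemu_mutex" idea can be sketched with plain pthreads: the worker takes only a queue-local lock while processing. Everything below is illustrative; it is not the QEMU dataplane implementation:

```c
#include <assert.h>
#include <pthread.h>

/* Per-queue state: a queue-local lock (not the big per-VM mutex)
 * plus a counter standing in for completed virtqueue requests. */
struct vq_thread {
    pthread_mutex_t lock;
    int processed;
};

static void *vq_worker(void *opaque)
{
    struct vq_thread *vq = opaque;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&vq->lock);
        vq->processed++;    /* stand-in for handling one request */
        pthread_mutex_unlock(&vq->lock);
    }
    return NULL;
}

/* Spawn the dedicated processing thread and wait for it. */
static int run_dataplane(void)
{
    struct vq_thread vq = { PTHREAD_MUTEX_INITIALIZER, 0 };
    pthread_t tid;
    pthread_create(&tid, NULL, vq_worker, &vq);
    pthread_join(tid, NULL);
    return vq.processed;
}
```

The point of the design is simply that virtqueue work never contends on the global lock, so vcpu threads and the main loop keep running while requests are processed.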
Re: KVM call agenda for 2013-05-21
"Michael S. Tsirkin" writes: > On Tue, May 21, 2013 at 09:29:07AM -0500, Anthony Liguori wrote: >> "Michael S. Tsirkin" writes: >> >> > On Tue, May 21, 2013 at 07:18:58AM -0500, Anthony Liguori wrote: >> >> "Michael S. Tsirkin" writes: >> >> >> >> > On Mon, May 20, 2013 at 12:57:47PM +0200, Juan Quintela wrote: >> >> >> >> >> >> Hi >> >> >> >> >> >> Please, send any topic that you are interested in covering. >> >> >> >> >> >> Thanks, Juan. >> >> > >> >> > Generating acpi tables. >> >> > >> >> > Cc'd a bunch of people who might be interested in this topic. >> >> >> >> Unfortunately I have a conflict this morning so I won't be able to >> >> join. I just saw Kevin's response here from last week and I'll respond >> >> to it later this morning. >> > >> > Unfortunate. >> > Let's talk about this on the next slot: next Tuesday, June 4 then. >> > Could you keep your agenda clear on that day please? >> >> Ack. >> >> Perhaps we could move this call to bimonthly and cancel it less >> frequently? That will make it easier to reserve calendar time for it. > > I think you mean bi-weekly? If yes, ack. I meant twice a month (or every other week). Regards, Anthony Liguori > >> > >> >> Can we post the call for agenda for this call on Fridays in the future? >> >> I need more than 24 hours to make sure to keep my calendar clear... >> >> >> >> Regards, >> >> >> >> Anthony Liguori >> > >> > We don't work on Fridays in Israel so that means we'll only be able to >> > respond Sunday, and you'll only see it Monday anyway. >> > Setting agenda Thursday is probably too aggressive? >> >> Maybe we could use a wiki page to setup a rolling agenda? >> >> Regards, >> >> Anthony Liguori >> >> > >> >> > >> >> > Kevin - could you join on Tuesday? There appears a disconnect >> >> > between the seabios and qemu that a conf call >> >> > might help resolve. 
>> >> > -- >> >> > MST
Re: KVM call agenda for 2013-05-21
"Michael S. Tsirkin" writes: > On Tue, May 21, 2013 at 07:18:58AM -0500, Anthony Liguori wrote: >> "Michael S. Tsirkin" writes: >> >> > On Mon, May 20, 2013 at 12:57:47PM +0200, Juan Quintela wrote: >> >> >> >> Hi >> >> >> >> Please, send any topic that you are interested in covering. >> >> >> >> Thanks, Juan. >> > >> > Generating acpi tables. >> > >> > Cc'd a bunch of people who might be interested in this topic. >> >> Unfortunately I have a conflict this morning so I won't be able to >> join. I just saw Kevin's response here from last week and I'll respond >> to it later this morning. > > Unfortunate. > Let's talk about this on the next slot: next Tuesday, June 4 then. > Could you keep your agenda clear on that day please? Ack. Perhaps we could move this call to bimonthly and cancel it less frequently? That will make it easier to reserve calendar time for it. > >> Can we post the call for agenda for this call on Fridays in the future? >> I need more than 24 hours to make sure to keep my calendar clear... >> >> Regards, >> >> Anthony Liguori > > We don't work on Fridays in Israel so that means we'll only be able to > respond Sunday, and you'll only see it Monday anyway. > Setting agenda Thursday is probably too aggressive? Maybe we could use a wiki page to setup a rolling agenda? Regards, Anthony Liguori > >> > >> > Kevin - could you join on Tuesday? There appears a disconnect >> > between the seabios and qemu that a conf call >> > might help resolve. >> > >> > -- >> > MST >> > -- >> > To unsubscribe from this list: send the line "unsubscribe kvm" in >> > the body of a message to majord...@vger.kernel.org >> > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call agenda for 2013-05-21
"Michael S. Tsirkin" writes: > On Mon, May 20, 2013 at 12:57:47PM +0200, Juan Quintela wrote: >> >> Hi >> >> Please, send any topic that you are interested in covering. >> >> Thanks, Juan. > > Generating acpi tables. > > Cc'd a bunch of people who might be interested in this topic. Unfortunately I have a conflict this morning so I won't be able to join. I just saw Kevin's response here from last week and I'll respond to it later this morning. Can we post the call for agenda for this call on Fridays in the future? I need more than 24 hours to make sure to keep my calendar clear... Regards, Anthony Liguori > > Kevin - could you join on Tuesday? There appears a disconnect > between the seabios and qemu that a conf call > might help resolve. > > -- > MST > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call minutes for 2013-04-23
Eric Blake writes: > On 04/23/2013 08:45 AM, Juan Quintela wrote: >> >> * 1.5 pending patches (paolo) >> anthony thinks nothing big is outstanding >> rdma: not probably for this release, too big change on migration >> cpu-hotplug: andreas expect to get it for 1.5 >> >> >> * What can libvirt expect in 1.5 for introspection of command-line support? >> command extensions? libvirt want then >> * What are the rules for adding optional parameters to existing QMP >> commands? Would it help if we had introspection of QMP commands? >> what are the options that each command support. >> >> command line could work for 1.5 >> if we got patches on the next 2 days we can get it. > > Goal is to provide a QMP command that provides JSON representation of > command line options; I will help review whatever is posted to make sure > we like the interface. Anthony agreed the implementation should be > relatively straightforward and okay to add after soft freeze (but must > be before hard freeze). Libvirt has some code that would like to make > use of the new command-line introspection; Osier will probably be the > first libvirt developer taking advantage of it - if we can swing it, > we'd like libvirt 1.0.5 to use the new command (libvirt freezes this > weekend for a May 2 release). > >> rest of introspection need 1.6 >> it is "challenging" >> we are interesting into feature introspection >> and comand extensions? >> one command to return the schema? > > Anthony was okay with the idea of a full JSON introspection of all QMP > commands, but it is probably too big to squeeze into 1.5 timeframe. 
> Furthermore, while the command will be useful, we should always be > thinking about API - having to parse through JSON to see if a feature is > present is not always the nicest interface; when adding a new feature, > consider improving an existing query-* or adding a counterpart new > query-* command that makes it much easier to tell if a feature is > available, without having to resort to a QMP introspection. Ack. One of the problems with using schema introspection for feature detection is that there isn't always a 1-1 mapping. You can imagine that we have an optional parameter that gets added to a structure and is initially tied to a specific feature but later gets used by another feature. If a distro backports the latter and not the former, but a management tool uses this field to probe for the former feature, it will result in a false positive. That's why a more direct feature negotiation mechanism is better IMHO. Regards, Anthony Liguori > >> if we change a command, how we change the interface without >> changing the c-api? > > c-api is not yet a strong consideration (but see [1] below). Also, > there may be ways to design a C api that is robust to extensions (but > that means designing it into the QMP up front when adding a new > command); there has been some list traffic on this thought. > > More importantly, adding an optional parameter to an existing command is > not okay unless something else is also available to tell whether the > feature is usable - QMP introspection would solve this, but is not > necessarily the most elegant way. For now, while adding QMP > introspection is a good idea, we still want case-by-case review of any > command extensions. > >> >> we can change "drive_mirror" to use a new command to see if there >> are the new features.
> > drive-mirror changed in 1.4 to add optional buf-size parameter; right > now, libvirt is forced to limit itself to 1.3 interface (no buf-size or > granularity) because there is no introspection and no query-* command > that witnesses that the feature is present. Idea was that we need to > add a new query-drive-mirror-capabilities (name subject to bikeshedding) > command into 1.5 that would let libvirt know that buf-size/granularity > is usable (done right, it would also prevent the situation of buf-size > being a write-only interface where it is set when starting the mirror > but can not be queried later to see what size is in use). > > Unclear whether anyone was signing up to tackle the addition of a query > command counterpart for drive-mirror in time for 1.5. > >> >> if we have a stable c-api we can do test cases that work. > > Having such a testsuite would make a stable C API more important. > >> >> Eric will complete this with his undrestanding from libvirt point of >> view. > > Also under discussion: the existing QMP 'screendump' command is not > ideal (not extensible, doesn't allow fd passing, hard-coded output
Re: [PATCH-v2 1/2] virtio-scsi: create VirtIOSCSICommon
"Nicholas A. Bellinger" writes: > From: Paolo Bonzini > > This patch refactors existing virtio-scsi code into VirtIOSCSICommon > in order to allow virtio_scsi_init_common() to be used by both internal > virtio_scsi_init() and external vhost-scsi-pci code. > > Changes in Patch-v2: >- Move ->get_features() assignment to virtio_scsi_init() instead of > virtio_scsi_init_common() Any reason we're not doing this as a QOM base class? Similiar to how the in-kernel PIT/PIC work using a common base class... Regards, Anthony Liguori > > Signed-off-by: Paolo Bonzini > Cc: Michael S. Tsirkin > Cc: Asias He > Signed-off-by: Nicholas Bellinger > --- > hw/virtio-scsi.c | 192 > +- > hw/virtio-scsi.h | 130 -- > include/qemu/osdep.h |4 + > 3 files changed, 178 insertions(+), 148 deletions(-) > > diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c > index 8620712..c59e9c6 100644 > --- a/hw/virtio-scsi.c > +++ b/hw/virtio-scsi.c > @@ -18,118 +18,6 @@ > #include > #include > > -#define VIRTIO_SCSI_VQ_SIZE 128 > -#define VIRTIO_SCSI_CDB_SIZE32 > -#define VIRTIO_SCSI_SENSE_SIZE 96 > -#define VIRTIO_SCSI_MAX_CHANNEL 0 > -#define VIRTIO_SCSI_MAX_TARGET 255 > -#define VIRTIO_SCSI_MAX_LUN 16383 > - > -/* Response codes */ > -#define VIRTIO_SCSI_S_OK 0 > -#define VIRTIO_SCSI_S_OVERRUN 1 > -#define VIRTIO_SCSI_S_ABORTED 2 > -#define VIRTIO_SCSI_S_BAD_TARGET 3 > -#define VIRTIO_SCSI_S_RESET4 > -#define VIRTIO_SCSI_S_BUSY 5 > -#define VIRTIO_SCSI_S_TRANSPORT_FAILURE6 > -#define VIRTIO_SCSI_S_TARGET_FAILURE 7 > -#define VIRTIO_SCSI_S_NEXUS_FAILURE8 > -#define VIRTIO_SCSI_S_FAILURE 9 > -#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 10 > -#define VIRTIO_SCSI_S_FUNCTION_REJECTED11 > -#define VIRTIO_SCSI_S_INCORRECT_LUN12 > - > -/* Controlq type codes. */ > -#define VIRTIO_SCSI_T_TMF 0 > -#define VIRTIO_SCSI_T_AN_QUERY 1 > -#define VIRTIO_SCSI_T_AN_SUBSCRIBE 2 > - > -/* Valid TMF subtypes. 
*/ > -#define VIRTIO_SCSI_T_TMF_ABORT_TASK 0 > -#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET 1 > -#define VIRTIO_SCSI_T_TMF_CLEAR_ACA2 > -#define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET 3 > -#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET 4 > -#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5 > -#define VIRTIO_SCSI_T_TMF_QUERY_TASK 6 > -#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET 7 > - > -/* Events. */ > -#define VIRTIO_SCSI_T_EVENTS_MISSED0x8000 > -#define VIRTIO_SCSI_T_NO_EVENT 0 > -#define VIRTIO_SCSI_T_TRANSPORT_RESET 1 > -#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2 > -#define VIRTIO_SCSI_T_PARAM_CHANGE 3 > - > -/* Reasons for transport reset event */ > -#define VIRTIO_SCSI_EVT_RESET_HARD 0 > -#define VIRTIO_SCSI_EVT_RESET_RESCAN 1 > -#define VIRTIO_SCSI_EVT_RESET_REMOVED 2 > - > -/* SCSI command request, followed by data-out */ > -typedef struct { > -uint8_t lun[8]; /* Logical Unit Number */ > -uint64_t tag;/* Command identifier */ > -uint8_t task_attr; /* Task attribute */ > -uint8_t prio; > -uint8_t crn; > -uint8_t cdb[]; > -} QEMU_PACKED VirtIOSCSICmdReq; > - > -/* Response, followed by sense data and data-in */ > -typedef struct { > -uint32_t sense_len; /* Sense data length */ > -uint32_t resid; /* Residual bytes in data buffer */ > -uint16_t status_qualifier; /* Status qualifier */ > -uint8_t status; /* Command completion status */ > -uint8_t response;/* Response values */ > -uint8_t sense[]; > -} QEMU_PACKED VirtIOSCSICmdResp; > - > -/* Task Management Request */ > -typedef struct { > -uint32_t type; > -uint32_t subtype; > -uint8_t lun[8]; > -uint64_t tag; > -} QEMU_PACKED VirtIOSCSICtrlTMFReq; > - > -typedef struct { > -uint8_t response; > -} QEMU_PACKED VirtIOSCSICtrlTMFResp; > - > -/* Asynchronous notification query/subscription */ > -typedef struct { > -uint32_t type; > -uint8_t lun[8]; > -uint32_t event_requested; > -} QEMU_PACKED VirtIOSCSICtrlANReq; > - > -typedef struct { > -uint32_t event_actual; > -uint8_t response; > -} QEMU_PACKED VirtIOSCSICtrlANResp; > - > -typedef 
struct { > -uint32_t
Re: KVM call agenda for 2013-02-05
Juan Quintela writes: > Hi > > Please send in any agenda topics you are interested in. FYI, I have a conflict for today so I won't be able to attend. Regards, Anthony Liguori > > Later, Juan. > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V4 00/22] Multiqueue virtio-net
Applied. Thanks. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V4 RESEND 00/22] Multiqueue virtio-net
Applied. Thanks. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
Benjamin Herrenschmidt writes: > On Wed, 2013-01-30 at 17:54 +0100, Andreas Färber wrote: >> >> That would require polymorphism since we already need to derive from >> PCIDevice or ISADevice respectively for interfacing with the bus... >> Modern object-oriented languages have tried to avoid multi-inheritence >> due to arising complications, I thought. Wouldn't object if someone >> wanted to do the dirty implementation work though. ;) >> >> Another such example is EHCI, with PCIDevice and SysBusDevice >> frontends, >> sharing an EHCIState struct and having helper functions operating on >> that core state only. Quite a few device share such a pattern today >> actually (serial, m48t59, ...). > > This is a design bug of your model :-) You shouldn't derive from your > bus interface IMHO but from your functional interface, and have an > ownership relation to the PCIDevice (a bit like IOKit does if my memory > serves me well). Ack. Hence: SerialPCIDevice is-a PCIDevice has-a SerialChipset The board that exports a bus interface is one object. The chipset that implements the functionality is another object. The former's job in life is to map the bus interface to whatever interface the functional object expects. In most cases, this is just a straight forward proxy of a MemoryRegion. Sometimes this involves address shifting, etc. Regards, Anthony Liguori > > Cheers, > Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
Benjamin Herrenschmidt writes: > On Wed, 2013-01-30 at 07:59 -0600, Anthony Liguori wrote: >> An x86 CPU has a MMIO capability that's essentially 65 bits. Whether >> the top bit is set determines whether it's a "PIO" transaction or an >> "MMIO" transaction. A large chunk of that address space is invalid of >> course. >> >> PCI has a 65 bit address space too. The 65th bit determines whether >> it's an IO transaction or an MMIO transaction. > > This is somewhat an over simplification since IO and MMIO differs in > other ways, such as ordering rules :-) But for the sake of memory > regions decoding I suppose it will do. > >> For architectures that only have a 64-bit address space, what the PCI >> controller typically does is pick a 16-bit window within that address >> space to map to a PCI address with the 65th bit set. > > Sort-of yes. The window doesn't have to be 16-bit (we commonly have > larger IO space windows on powerpc) and there's a window per host > bridge, so there's effectively more than one IO space (as there is more > than one PCI MMIO space, with only a window off the CPU space routed to > each brigde). Ack. > Making a hard wired assumption that the PCI (MMIO and IO) space relates > directly to the CPU bus space is wrong on pretty much all !x86 > architectures. Ack. > > .../... > > You make it sound like substractive decode is a chipset hack. It's not, > it's specified in the PCI spec. It's a hack :-) It's a well specified hack, but it's still a hack. >> 1) A chipset will route any non-positively decoded IO transaction (65th >>bit set) to a single end point (usually the ISA-bridge). Which one it >>chooses is up to the chipset. This is called subtractive decoding >>because the PCI bus will wait multiple cycles for that device to >>claim the transaction before bouncing it. > > This is not a chipset matter. It's the ISA bridge itself that does > substractive decoding. 
The PCI bus can have one end point that can be the target for subtractive decoding (not hard decoding, subtractive decoding). IOW, you can only have a single ISA Bridge within a single PCI domain. You are right--chipset is the wrong word. I'm used to thinking in terms of only a single domain :-) > There also exists P2P bridges doing such substractive > decoding, this used to be fairly common with transparent bridges used for > laptop docking. I'm not sure I understand how this would work. How can two devices on the same PCI domain both do subtractive decoding? Indeed, the PCI spec even says: "Subtractive decoding can be implemented by only one device on the bus since it accepts all accesses not positively decoded by some other agent." >> 2) There are special hacks in most PCI chipsets to route very specific >>addresses ranges to certain devices. Namely, legacy VGA IO transactions >>go to the first VGA device. Legacy IDE IO transactions go to the first >>IDE device. This doesn't need to be programmed in the BARs. It will >>just happen. > This is also mostly not a hack in the chipset. It's a well defined behaviour > for legacy devices, sometimes call hard decoding. Of course often those > devices > are built into the chipset but they don't have to. Plug-in VGA devices will > hard decode legacy VGA regions for both IO and MMIO by default (this can be > disabled on most of them nowadays) for example. This has nothing to do with > the chipset. So I understand what you're saying re: PCI because the devices actually assert DEVSEL to indicate that they handle the transaction. But for PCI-E, doesn't the controller have to expressly identify what the target is? Is this done with the device class? > There's a specific bit in P2P bridge to control the forwarding of legacy > transaction downstream (and VGA palette snoops), this is also fully specified > in the PCI spec. Ack. 
> >> 3) As it turns out, all legacy PIIX3 devices are positively decoded and >>sent to the ISA-bridge (because it's faster this way). > > Chipsets don't "send to a bridge". It's the bridge itself that > decodes. With PCI... >> Notice the lack of the word "ISA" in all of this other than describing >> the PCI class of an end point. > > ISA is only relevant to the extent that the "legacy" regions of IO space > originate from the original ISA addresses of devices (VGA, IDE, etc...) > and to the extent that an ISA bus might still be present which will get > the transactions that nothing else have decoded in that space. Ack. > >> So h
Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
Andreas Färber writes: > Am 30.01.2013 17:33, schrieb Anthony Liguori: >> Gerd Hoffmann writes: >> >>>> hw/qxl.c:portio_list_add(qxl_vga_port_list, >>>> pci_address_space_io(dev), 0x3b0); >>>> hw/vga.c:portio_list_add(vga_port_list, address_space_io, 0x3b0); >>> >>> That reminds me I should solve this in a more elegant way. >>> >>> qxl takes over the vga io ports. The reason it does this is because qxl >>> switches into vga mode in case the vga ports are accessed while not in >>> vga mode. After doing the check (and possibly switching mode) the vga >>> handler is called to actually handle it. >> >> The best way to handle this would be to remodel how we do VGA. >> >> Make VGACommonState a proper QOM object and use it as the base class for >> QXL, CirrusVGA, QEMUVGA (std-vga), and VMwareVGA. > > That would require polymorphism since we already need to derive from > PCIDevice or ISADevice respectively for interfacing with the bus... Nope. You can use composition: QXLDevice is-a VGACommonState QXLPCI is-a PCIDevice has-a QXLDevice > Modern object-oriented languages have tried to avoid multi-inheritence > due to arising complications, I thought. Wouldn't object if someone > wanted to do the dirty implementation work though. ;) There is no need for MI. > Another such example is EHCI, with PCIDevice and SysBusDevice frontends, > sharing an EHCIState struct and having helper functions operating on > that core state only. Quite a few device share such a pattern today > actually (serial, m48t59, ...). Yes, this is all about chipset modelling. Chipsets should derive from device and then be embedded in the appropriate bus device. For instance: SerialState is-a DeviceState ISASerialState is-a ISADevice, has-a SerialState MMIOSerialState is-a SysbusDevice, has-a SerialState This is what we're doing in practice, we just aren't modeling the chipsets and we're open coding the relationships (often in subtly different ways). 
Regards, Anthony Liguori >> The VGA accessors should be exposed as a memory region but the sub class >> ought to be responsible for actually adding it to a subregion. >> >>> >>> That twist makes it a bit hard to convert vga ... >>> >>> Anyone knows how one would do that with the memory api instead? I think >>> taking over the ports is easy as the memory regions have priorities so I >>> can simply register a region with higher priority. I have no clue how to >>> forward the access to the vga code though. >>> >> >> That should be possible with priorities, but I think it's wrong. There >> aren't two VGA devices. QXL is-a VGA device and the best way to >> override behavior of base VGA device is through polymorphism. > > In this particular case QXL is-a PCI VGA device though, so we can > decouple it from core VGA modeling. Placing the MemoryRegionOps inside > the Class (rather than static const) might be a short-term solution for > overriding read/write handlers of a particular VGA MemoryRegion. :) > > Cheers, > Andreas > >> This isn't really a memory API issue, it's a modeling issue. >> >> Regards, >> >> Anthony Liguori >> >>> Anyone has clues / suggestions? >>> >>> thanks, >>> Gerd > > -- > SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany > GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
Gerd Hoffmann writes: > Hi, > >> hw/qxl.c:portio_list_add(qxl_vga_port_list, >> pci_address_space_io(dev), 0x3b0); >> hw/vga.c:portio_list_add(vga_port_list, address_space_io, 0x3b0); > > That reminds me I should solve this in a more elegant way. > > qxl takes over the vga io ports. The reason it does this is because qxl > switches into vga mode in case the vga ports are accessed while not in > vga mode. After doing the check (and possibly switching mode) the vga > handler is called to actually handle it. The best way to handle this would be to remodel how we do VGA. Make VGACommonState a proper QOM object and use it as the base class for QXL, CirrusVGA, QEMUVGA (std-vga), and VMwareVGA. The VGA accessors should be exposed as a memory region but the sub class ought to be responsible for actually adding it to a subregion. > > That twist makes it a bit hard to convert vga ... > > Anyone knows how one would do that with the memory api instead? I think > taking over the ports is easy as the memory regions have priorities so I > can simply register a region with higher priority. I have no clue how to > forward the access to the vga code though. > That should be possible with priorities, but I think it's wrong. There aren't two VGA devices. QXL is-a VGA device and the best way to override behavior of base VGA device is through polymorphism. This isn't really a memory API issue, it's a modeling issue. Regards, Anthony Liguori > Anyone has clues / suggestions? > > thanks, > Gerd -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
Markus Armbruster writes: > Peter Maydell writes: > >> On 30 January 2013 11:39, Andreas Färber wrote: >>> Proposal by hpoussin was to move _list_add() code to ISADevice: >>> http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html >>> >>> Concerns: >>> * PCI devices (VGA, QXL) register I/O ports as well >>> => above patches add dependency on ISABus to machines >>> -> " no mac ever had one" >>> => PCIDevice shouldn't use ISA API with NULL ISADevice >>> * Lack of avi: Who decides about memory API these days? >>> >>> armbru and agraf concluded that moving this into ISA is wrong. >>> >>> => I will drop the remaining ioport patches from above series. >>> >>> Suggestions on how to proceed with tackling the issue are welcome. >> >> How does this stuff work on real hardware? I would have >> expected that a PCI device registering the fact it has >> IO ports would have to do so via the PCI controller it >> is plugged into... >> >> My naive don't-know-much-about-portio suggestion is that this >> should work the same way as memory regions: each device >> provides portio regions, and the controller for the bus >> (ISA or PCI) exposes those to the next layer up, and >> something at board level maps it all into the right places. > > Makes sense me, but I'm naive, too :) > > For me, "I/O ports" are just an alternate address space some devices > have. For instance, x86 CPUs have an extra pin for selecting I/O > vs. memory address space. The ISA bus has separate read/write pins for > memory and I/O. > > This isn't terribly special. Mapping address spaces around is what > devices bridging buses do. > > I'd expect a system bus for an x86 CPU to have both a memory and an I/O > address space. There is no such thing as a "system bus". There is a bus that links the CPUs to each other and to the North Bridge. This is QPI on modern systems. Sometimes there's a bus to link the North Bridge to the South Bridge. On modern systems, this is QPI. 
On the i440fx, the i440fx is both the South Bridge and North Bridge and the link between the two is internal to the chip. The South Bridge may then export one or more downstream interfaces. In the i440fx, it only exports PCI. Behind the PCI bus, there may be bridges. On the i440fx, there is a ISA Bridge which also acts as a Super I/O chip. It exposes a downstream ISA bus. sysbus is a relic of poor modeling. A major milestone in QEMU's evolution will be when sysbus is completely removed. Regards, Anthony Liguori > > I'd expect an ISA PC's sysbus - ISA bridge to map both directly. > > I'd expect an ISA bridge for a sysbus without a separate I/O address > space to map the ISA I/O address space into the sysbus's normal address > space somehow. > > PCI ISA bridges have their own rules, but I've gotten away with ignoring > the details so far :) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] QEMU buildbot maintenance state
Gerd Hoffmann writes: > Hi, > >> Gerd: Are you willing to co-maintain the QEMU buildmaster with Daniel >> and Christian? It would be awesome if you could do this given your >> experience running and customizing buildbot. > > I'll try to set aside some time for that. Christians idea to host the > config at github is good, that certainly makes it easier to balance > things to more people. > > Another thing which would be helpful: Any chance we can setup a > maintainer tree mirror @ git.qemu.org? A single repository where each > maintainer tree shows up as a branch? I will setup a tree based on the 'T:' fields in MAINTAINERS. So if you want your tree to be part of buildbot, please make sure that you have a correct entry in MAINTAINERS. Regards, Anthony Liguori > > This would make the buildbot setup *alot* easier. We can go for a > AnyBranchScheduler then with BuildFactory and BuildConfig shared, > instead of needing one BuildFactory and BuildConfig per branch. Also > makes the buildbot web interface less cluttered as we don't have a > insane amount of BuildConfigs any more. And saves some resources > (bandwidth + diskspace) for the buildslaves. > > I think people who want to look what is coming or who want to test stuff > cooking it would be a nice service too if they have a one-stop shop > where they can get everything. > > cheers, > Gerd -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] What to do about non-qdevified devices?
Markus Armbruster writes: > Peter Maydell writes: > >> On 30 January 2013 07:02, Markus Armbruster wrote: >>> Anthony Liguori writes: >>> >>> [...] >>>> The problems I ran into were (1) this is a lot of work (2) it basically >>>> requires that all bus children have been qdev/QOM-ified. Even with >>>> something like the ISA bus which is where I started, quite a few devices >>>> were not qdevified still. >>> >>> So what's the plan to complete the qdevification job? Lay really low >>> and quietly hope the problem goes away? We've tried that for about >>> three years, doesn't seem to work. >> >> Do we have a list of not-yet-qdevified devices? Maybe we need to >> start saying "fix X Y and Z or platform P is dropped from the next >> release". (This would of course be easier if we had a way to let users >> know that platform P was in danger...) > > I think that's a good idea. Only problem is identifying pre-qdev > devices in the code requires code inspection (grep won't do, I'm > afraid). > > If we agree on a "qdevify or else" plan, I'd be prepared to help with > the digging up of devices. That's a nice thought, but we're not going to rip out dma.c and break every PC target. But I will help put together a list of devices that need converting. I have patches actually for most of the PC devices. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
Andreas Färber writes: > Am 29.01.2013 16:41, schrieb Juan Quintela: >> * Portio port to new memory regions? >> Andreas, could you fill? > > MemoryRegion's .old_portio mechanism requires workarounds for VGA on > ppc, affecting among others the sPAPR PCI host bridge: > http://git.qemu.org/?p=qemu.git;a=commit;h=a3cfa18eb075c7ef78358ca1956fe7b01caa1724 > > Patches were posted and merged removing all .old_portio users but one: > hw/ioport.c:portio_list_add_1(), used by portio_list_add() > > hw/isa-bus.c:portio_list_add(piolist, isabus->address_space_io, start); > hw/qxl.c:portio_list_add(qxl_vga_port_list, > pci_address_space_io(dev), 0x3b0); > hw/vga.c:portio_list_add(vga_port_list, address_space_io, 0x3b0); > hw/vga.c:portio_list_add(vbe_port_list, address_space_io, 0x1ce); > > Proposal by hpoussin was to move _list_add() code to ISADevice: > http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html Okay, a couple things here: There is no such thing as "PIO" as a general concept. What leaves the CPU and what a bus interprets are totally different things. An x86 CPU has a MMIO capability that's essentially 65 bits. Whether the top bit is set determines whether it's a "PIO" transaction or an "MMIO" transaction. A large chunk of that address space is invalid of course. PCI has a 65 bit address space too. The 65th bit determines whether it's an IO transaction or an MMIO transaction. For architectures that only have a 64-bit address space, what the PCI controller typically does is pick a 16-bit window within that address space to map to a PCI address with the 65th bit set. Within the PCI bus, transactions are usually routed to devices via positive decoding. The device lists what address regions it wants to handle (via BARs) and the PCI bus uses those to determine who to send transactions to. There are some exceptions though. 
Specifically: 1) A chipset will route any non-positively decoded IO transaction (65th bit set) to a single end point (usually the ISA-bridge). Which one it chooses is up to the chipset. This is called subtractive decoding because the PCI bus will wait multiple cycles for that device to claim the transaction before bouncing it. 2) There are special hacks in most PCI chipsets to route very specific address ranges to certain devices. Namely, legacy VGA IO transactions go to the first VGA device. Legacy IDE IO transactions go to the first IDE device. This doesn't need to be programmed in the BARs. It will just happen. 3) As it turns out, all legacy PIIX3 devices are positively decoded and sent to the ISA-bridge (because it's faster this way). Notice the lack of the word "ISA" in all of this other than describing the PCI class of an end point. So how should this be modeled? On x86, the CPU has a pio address space. That can propagate down through the PCI bus which is what we do today. On !x86, the PCI controller ought to set up a MemoryRegion for downstream PIO that devices can use to register on. We probably need to do something like change the PCI VGA devices to export a MemoryRegion and allow the PCI controller to decide how to register that as a subregion. Regards, Anthony Liguori > > Concerns: > * PCI devices (VGA, QXL) register I/O ports as well > => above patches add dependency on ISABus to machines > -> " no mac ever had one" > => PCIDevice shouldn't use ISA API with NULL ISADevice > * Lack of avi: Who decides about memory API these days? > > armbru and agraf concluded that moving this into ISA is wrong. > > => I will drop the remaining ioport patches from above series. > > Suggestions on how to proceed with tackling the issue are welcome. > > Regards, > Andreas > > -- > SUSE LINUX Products GmbH, Maxfeldstr.
5, 90409 Nürnberg, Germany > GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O
"Michael S. Tsirkin" writes: > On Wed, Jan 30, 2013 at 11:48:14AM +, Peter Maydell wrote: >> On 30 January 2013 11:39, Andreas Färber wrote: >> > Proposal by hpoussin was to move _list_add() code to ISADevice: >> > http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html >> > >> > Concerns: >> > * PCI devices (VGA, QXL) register I/O ports as well >> > => above patches add dependency on ISABus to machines >> > -> " no mac ever had one" >> > => PCIDevice shouldn't use ISA API with NULL ISADevice >> > * Lack of avi: Who decides about memory API these days? >> > >> > armbru and agraf concluded that moving this into ISA is wrong. >> > >> > => I will drop the remaining ioport patches from above series. >> > >> > Suggestions on how to proceed with tackling the issue are welcome. >> >> How does this stuff work on real hardware? I would have >> expected that a PCI device registering the fact it has >> IO ports would have to do so via the PCI controller it >> is plugged into... > > All programming is done by the OS, devices do not register > with controller. > > Each bridge has two ways to claim an IO transaction: > - transaction is within the window programmed in the bridge > - subtractive decoding enabled and no one else claims the transaction And there can only be one endpoint that accepts subtractive decoding and this is usually the ISA bridge. Also note that there are some really special cases with PCI. The legacy VGA ports are always routed to the first device with a DISPLAY class type. Likewise, with legacy IDE ports are routed to the first device with an IDE class. That's the only reason you can have these legacy devices not behind the ISA bridge. Regards, Anthony Liguori > > At the bus level, transaction happens on a bus and an appropriate device > will claim it. 
> >> My naive don't-know-much-about-portio suggestion is that this >> should work the same way as memory regions: each device >> provides portio regions, and the controller for the bus >> (ISA or PCI) exposes those to the next layer up, and >> something at board level maps it all into the right places. >> >> -- PMM -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH V2 11/20] tap: support enabling or disabling a queue
"Michael S. Tsirkin" writes: > On Tue, Jan 29, 2013 at 08:10:26PM +, Blue Swirl wrote: >> On Tue, Jan 29, 2013 at 1:50 PM, Jason Wang wrote: >> > On 01/26/2013 03:13 AM, Blue Swirl wrote: >> >> On Fri, Jan 25, 2013 at 10:35 AM, Jason Wang wrote: >> >>> This patch introduce a new bit - enabled in TAPState which tracks >> >>> whether a >> >>> specific queue/fd is enabled. The tap/fd is enabled during >> >>> initialization and >> >>> could be enabled/disabled by tap_enalbe() and tap_disable() which calls >> >>> platform >> >>> specific helpers to do the real work. Polling of a tap fd can only done >> >>> when >> >>> the tap was enabled. >> >>> >> >>> Signed-off-by: Jason Wang >> >>> --- >> >>> include/net/tap.h |2 ++ >> >>> net/tap-win32.c | 10 ++ >> >>> net/tap.c | 43 --- >> >>> 3 files changed, 52 insertions(+), 3 deletions(-) >> >>> >> >>> diff --git a/include/net/tap.h b/include/net/tap.h >> >>> index bb7efb5..0caf8c4 100644 >> >>> --- a/include/net/tap.h >> >>> +++ b/include/net/tap.h >> >>> @@ -35,6 +35,8 @@ int tap_has_vnet_hdr_len(NetClientState *nc, int len); >> >>> void tap_using_vnet_hdr(NetClientState *nc, int using_vnet_hdr); >> >>> void tap_set_offload(NetClientState *nc, int csum, int tso4, int tso6, >> >>> int ecn, int ufo); >> >>> void tap_set_vnet_hdr_len(NetClientState *nc, int len); >> >>> +int tap_enable(NetClientState *nc); >> >>> +int tap_disable(NetClientState *nc); >> >>> >> >>> int tap_get_fd(NetClientState *nc); >> >>> >> >>> diff --git a/net/tap-win32.c b/net/tap-win32.c >> >>> index 265369c..a2cd94b 100644 >> >>> --- a/net/tap-win32.c >> >>> +++ b/net/tap-win32.c >> >>> @@ -764,3 +764,13 @@ void tap_set_vnet_hdr_len(NetClientState *nc, int >> >>> len) >> >>> { >> >>> assert(0); >> >>> } >> >>> + >> >>> +int tap_enable(NetClientState *nc) >> >>> +{ >> >>> +assert(0); >> >> abort() >> > >> > This is just to be consistent with the reset of the helpers in this file. 
>> >> >> >>> +} >> >>> + >> >>> +int tap_disable(NetClientState *nc) >> >>> +{ >> >>> +assert(0); >> >>> +} >> >>> diff --git a/net/tap.c b/net/tap.c >> >>> index 67080f1..95e557b 100644 >> >>> --- a/net/tap.c >> >>> +++ b/net/tap.c >> >>> @@ -59,6 +59,7 @@ typedef struct TAPState { >> >>> unsigned int write_poll : 1; >> >>> unsigned int using_vnet_hdr : 1; >> >>> unsigned int has_ufo: 1; >> >>> +unsigned int enabled : 1; >> >> bool without bit field? >> > >> > Also to be consistent with other field. If you wish I can send patches >> > to convert all those bit field to bool on top of this series. >> >> That would be nice, likewise for the assert(0). > > OK so let's go ahead with this patchset as is, > and a cleanup patch will be send after 1.4 then. Why? I'd prefer that we didn't rush things into 1.4 just because. There's still ample time to respin a corrected series. Regards, Anthony Liguori > > >> > >> > Thanks >> >>> VHostNetState *vhost_net; >> >>> unsigned host_vnet_hdr_len; >> >>> } TAPState; >> >>> @@ -72,9 +73,9 @@ static void tap_writable(void *opaque); >> >>> static void tap_update_fd_handler(TAPState *s) >> >>> { >> >>> qemu_set_fd_handler2(s->fd, >> >>> - s->read_poll ? tap_can_send : NULL, >> >>> - s->read_poll ? tap_send : NULL, >> >>> - s->write_poll ? tap_writable : NULL, >> >>> + s->read_poll && s->enabled ? tap_can_send : >> >>> NULL, >> >>> + s->read_poll && s->enabled ? tap_send : >> >>> NULL, >> >>> +
Re: KVM call minutes 2013-01-29
Alexander Graf writes: > On 01/29/2013 04:41 PM, Juan Quintela wrote: >>Alex will fill this > > When using -device, we can not specify an IRQ line to attach to the > device. This works for some special buses like PCI, but not in the > generic case. We need it generically for virtio-mmio and for potential > platform assigned vfio devices though. > > The conclusion we came up with was that in order to model IRQ lines > between arbitrary devices, we should use QOM and the QOM name space. > Details are up for Anthony to fill in :). Oh good :-) This is how far I got since I last touched this problem. https://github.com/aliguori/qemu/commits/qom-pin.4 qemu_irq is basically foreign to QOM/qdev. There are two things I did to solve this. The first is to have a stateful Pin object. Stateful is important because qemu_irq is totally broken wrt reset and live migration as it stands today. It's pretty easy to have a Pin object that can "connect" to a qemu_irq source or sink which means we can incrementally refactor by first converting each device under a bus to using Pins (using the qemu_irq connect interface to maintain compat) until the bus controller can be converted to export Pins allowing a full switch to using Pins only for that bus. The problems I ran into were (1) this is a lot of work (2) it basically requires that all bus children have been qdev/QOM-ified. Even with something like the ISA bus which is where I started, quite a few devices were not qdevified still. I'm not going to be able to work on this in the foreseeable future, but if someone wants to take it over, I'd be happy to provide advice. I'm also open to other approaches that require less refactoring but I honestly don't know that there is a way to avoid the heavy lifting. 
Regards, Anthony Liguori > > > Alex
Re: KVM call minutes 2013-01-29
Paolo Bonzini writes: > Il 29/01/2013 16:41, Juan Quintela ha scritto: >> * Replacing select(2) so that we will not hit the 1024 fd_set limit in the >> future. (stefan) >> >> Add checks for fd's bigger than 1024? multifunction devices uses lot >> of fd's for device. >> >> Portability? >> Use glib? and let it use poll underneath. >> slirp is a problem. >> in the end loop: moving to a glib event loop, how we arrive there is the >> discussion. > > We can use g_poll while keeping the main-loop.c wrappers around the glib > event loop. Both slirp and iohandler.c access the fd_sets randomly, so > we need to remember some state between the fill and poll functions. We > can use two main-loop.c functions: > > int qemu_add_poll_fd(int fd, int events); > > select: writes the events into three fd_sets, returns the file > descriptor itself > > poll: writes a GPollFD into a dynamically-sized array (of GPollFDs) > and returns the index in the array. > > int qemu_get_poll_fd_revents(int index); > > select: takes the file descriptor (returned by qemu_add_poll_fd), > makes up revents based on the three fd_sets > > poll: takes the index into the array and returns the corresponding > revents > > iohandler.c can simply store the index into struct IOHandlerRecord, and > use it later. slirp can do the same for struct socket. > > The select code can be kept for Windows after POSIX switches to poll. Doesn't g_poll already do this under the covers for Windows? Regards, Anthony Liguori > > Paolo
Re: KVM call agenda for 2013-01-29
Juan Quintela writes: > Hi > > Please send in any agenda topics you are interested in. - Outstanding virtio work for 1.4 - Multiqueue virtio-net (Amos/Michael) - Refactorings (Fred/Peter) - virtio-ccw (Cornelia/Alex) We need to work out the ordering here and what's reasonable to merge over the next week. Regards, Anthony Liguori > > Later, Juan.
Re: [RFC 16/19] target-ppc: Refactor debug output macros
Andreas Färber writes: > Make debug output compile-testable even if disabled. > > Inline DEBUG_OP check in excp_helper.c. > Inline LOG_MMU_STATE() in mmu_helper.c. > Inline PPC_DEBUG_SPR check in translate_init.c. > > Signed-off-by: Andreas Färber > --- > target-ppc/excp_helper.c| 22 +++ > target-ppc/kvm.c|9 ++- > target-ppc/mem_helper.c |2 -- > target-ppc/mmu_helper.c | 63 > +-- > target-ppc/translate.c | 12 - > target-ppc/translate_init.c | 10 +++ > 6 Dateien geändert, 55 Zeilen hinzugefügt(+), 63 Zeilen entfernt(-) > > diff --git a/target-ppc/excp_helper.c b/target-ppc/excp_helper.c > index 0a1ac86..54722c4 100644 > --- a/target-ppc/excp_helper.c > +++ b/target-ppc/excp_helper.c > @@ -21,14 +21,14 @@ > > #include "helper_regs.h" > > -//#define DEBUG_OP > -//#define DEBUG_EXCEPTIONS > +#define DEBUG_OP 0 > +#define DEBUG_EXCEPTIONS 0 > > -#ifdef DEBUG_EXCEPTIONS > -# define LOG_EXCP(...) qemu_log(__VA_ARGS__) > -#else > -# define LOG_EXCP(...) do { } while (0) > -#endif > +#define LOG_EXCP(...) G_STMT_START \ > +if (DEBUG_EXCEPTIONS) { \ > +qemu_log(__VA_ARGS__); \ > +} \ > +G_STMT_END Just thinking out loud a bit.. This form becomes pretty common and it's ashame to use a macro here if we don't have to. I think: static inline void LOG_EXCP(const char *fmt, ...) { if (debug_exceptions) { va_list ap; va_start(ap, fmt); qemu_logv(fmt, ap); va_end(ap); } } Probably would have equivalent performance. debug_exception would be read-mostly and ought to be very predictable as a result. I strongly expect that the compiler would actually inline LOG_EXCP too. I see LOG_EXCP and LOG_DIS in this series. Perhaps we could just introduce these functions and then make these flags run-time controllable? BTW, one advantage of this over your original proposal back to your point is that you still won't catch linker errors with your proposal. Dead code eliminate will kill off those branches before the linker ever sees them. 
Regards, Anthony Liguori > > > /*/ > /* PowerPC Hypercall emulation */ > @@ -777,7 +777,7 @@ void ppc_hw_interrupt(CPUPPCState *env) > } > #endif /* !CONFIG_USER_ONLY */ > > -#if defined(DEBUG_OP) > +#ifndef CONFIG_USER_ONLY > static void cpu_dump_rfi(target_ulong RA, target_ulong msr) > { > qemu_log("Return from exception at " TARGET_FMT_lx " with flags " > @@ -835,9 +835,9 @@ static inline void do_rfi(CPUPPCState *env, target_ulong > nip, target_ulong msr, > /* XXX: beware: this is false if VLE is supported */ > env->nip = nip & ~((target_ulong)0x0003); > hreg_store_msr(env, msr, 1); > -#if defined(DEBUG_OP) > -cpu_dump_rfi(env->nip, env->msr); > -#endif > +if (DEBUG_OP) { > +cpu_dump_rfi(env->nip, env->msr); > +} > /* No need to raise an exception here, > * as rfi is always the last insn of a TB > */ > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c > index 2f4f068..0dc6657 100644 > --- a/target-ppc/kvm.c > +++ b/target-ppc/kvm.c > @@ -37,15 +37,10 @@ > #include "hw/spapr.h" > #include "hw/spapr_vio.h" > > -//#define DEBUG_KVM > +#define DEBUG_KVM 0 > > -#ifdef DEBUG_KVM > #define dprintf(fmt, ...) \ > -do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0) > -#else > -#define dprintf(fmt, ...) 
\ > -do { } while (0) > -#endif > +do { if (DEBUG_KVM) { fprintf(stderr, fmt, ## __VA_ARGS__); } } while (0) > > #define PROC_DEVTREE_CPU "/proc/device-tree/cpus/" > > diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c > index 902b1cd..5c7a5ce 100644 > --- a/target-ppc/mem_helper.c > +++ b/target-ppc/mem_helper.c > @@ -26,8 +26,6 @@ > #include "exec/softmmu_exec.h" > #endif /* !defined(CONFIG_USER_ONLY) */ > > -//#define DEBUG_OP > - > > /*/ > /* Memory load and stores */ > > diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c > index ee168f1..9340fbb 100644 > --- a/target-ppc/mmu_helper.c > +++ b/target-ppc/mmu_helper.c > @@ -21,39 +21,36 @@ > #include "sysemu/kvm.h" > #include "kvm_ppc.h" > > -//#define DEBUG_MMU > -//#define DEBUG_BATS > -//#define DEBUG_SLB > -//#define DEBUG_SOFTWARE_TLB > +#define DEBUG_MMU 0 > +#define DEBUG_BATS 0 > +#define DEBUG_SLB
Re: [PATCH v6 00/11] s390: channel I/O support in qemu.
Hi, Thank you for submitting your patch series. checkpatch.pl has detected that one or more of the patches in this series violate the QEMU coding style. If you believe this message was sent in error, please ignore it or respond here with an explanation. Otherwise, please correct the coding style issues and resubmit a new version of the patch. For more information about QEMU coding style, see: http://git.qemu.org/?p=qemu.git;a=blob_plain;f=CODING_STYLE;hb=HEAD Here is the output from checkpatch.pl: Subject: s390: Add s390-ccw-virtio machine. Subject: s390: Add default support for SCLP console ERROR: do not initialise statics to 0 or NULL #72: FILE: vl.c:2468: +static int index = 0; WARNING: braces {} are necessary for all arms of this statement #126: FILE: vl.c:3923: +if (default_sclp) [...] WARNING: braces {} are necessary for all arms of this statement #135: FILE: vl.c:3937: +if (default_sclp) [...] WARNING: braces {} are necessary for all arms of this statement #144: FILE: vl.c:4109: +if (foreach_device_config(DEV_SCLP, sclp_parse) < 0) [...] total: 1 errors, 3 warnings, 114 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Subject: s390-virtio: Factor out some initialization code. Subject: s390: Add new channel I/O based virtio transport. Subject: s390: Wire up channel I/O in kvm. Subject: s390: Virtual channel subsystem support. ERROR: need consistent spacing around '*' (ctx:WxV) #56: FILE: hw/s390x/css.c:31: +SubchDev *sch[MAX_SCHID + 1]; ^ ERROR: need consistent spacing around '*' (ctx:WxV) #62: FILE: hw/s390x/css.c:37: +SubchSet *sch_set[MAX_SSID + 1]; ^ ERROR: need consistent spacing around '*' (ctx:WxV) #74: FILE: hw/s390x/css.c:49: +CssImage *css[MAX_CSSID + 1]; ^ total: 3 errors, 0 warnings, 1469 lines checked Your patch has style problems, please review. 
If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. Subject: s390: Add channel I/O instructions. Subject: s390: I/O interrupt and machine check injection. Subject: s390: Channel I/O basic definitions. Subject: s390: Add mapping helper functions. Subject: s390: Lowcore mapping helper. Regards, Anthony Liguori
Re: [PATCH 0/5] vhost-scsi: Add support for host virtualized target
"Nicholas A. Bellinger" writes: > Hi MST & Co, > > On Thu, 2013-01-17 at 18:43 +0200, Michael S. Tsirkin wrote: >> On Fri, Sep 07, 2012 at 06:48:14AM +, Nicholas A. Bellinger wrote: >> > From: Nicholas Bellinger >> > >> > Hello Anthony & Co, >> > >> > This is the fourth installment to add host virtualized target support for >> > the mainline tcm_vhost fabric driver using Linux v3.6-rc into QEMU >> > 1.3.0-rc. >> > >> > The series is available directly from the following git branch: >> > >> >git://git.kernel.org/pub/scm/virt/kvm/nab/qemu-kvm.git >> > vhost-scsi-for-1.3 >> > >> > Note the code is cut against yesterday's QEMU head, and dispite the name >> > of the tree is based upon mainline qemu.org git code + has thus far been >> > running overnight with > 100K IOPs small block 4k workloads using v3.6-rc2+ >> > based target code with RAMDISK_DR backstores. >> > >> > Other than some minor fuzz between jumping from QEMU 1.2.0 -> 1.2.50, this >> > series is functionally identical to what's been posted for vhost-scsi >> > RFC-v3 >> > to qemu-devel. >> > >> > Please consider applying these patches for an initial vhost-scsi merge into >> > QEMU 1.3.0-rc code, or let us know what else you'd like to see addressed >> > for >> > this series to in order to merge. >> > >> > Thank you! >> > >> > --nab >> >> OK what's the status here? >> We missed 1.3 but let's try not to miss 1.4? >> > > Unfortunately, I've not been able to get back to the conversion > requested by Paolo for a standalone vhost-scsi PCI device. Is your git repo above up to date? Perhaps I can find someone to help out.. > At this point my hands are still full with iSER-target for-3.9 kernel > code over the next weeks. > > What's the v1.4 feature cut-off looking like at this point..? Hard freeze is on february 1st but 1.5 opens up again on the 15th. So the release windows shouldn't have a major impact on merging... 
Regards, Anthony Liguori > > --nab
Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net
"Michael S. Tsirkin" writes: > On Wed, Jan 16, 2013 at 09:09:49AM -0600, Anthony Liguori wrote: >> Jason Wang writes: >> >> > On 01/15/2013 03:44 AM, Anthony Liguori wrote: >> >> Jason Wang writes: >> >> >> >>> Hello all: >> >>> >> >>> This seires is an update of last version of multiqueue virtio-net >> >>> support. >> >>> >> >>> Recently, linux tap gets multiqueue support. This series implements basic >> >>> support for multiqueue tap, nic and vhost. Then use it as an >> >>> infrastructure to >> >>> enable the multiqueue support for virtio-net. >> >>> >> >>> Both vhost and userspace multiqueue were implemented for virtio-net, but >> >>> userspace could be get much benefits since dataplane like parallized >> >>> mechanism >> >>> were not implemented. >> >>> >> >>> User could start a multiqueue virtio-net card through adding a "queues" >> >>> parameter to tap. >> >>> >> >>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device >> >>> virtio-net-pci,netdev=hn0 >> >>> >> >>> Management tools such as libvirt can pass multiple pre-created fds >> >>> through >> >>> >> >>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device >> >>> virtio-net-pci,netdev=hn0 >> >> I'm confused/frightened that this syntax works. You shouldn't be >> >> allowed to have two values for the same property. Better to have a >> >> syntax like fd[0]=X,fd[1]=Y or something along those lines. >> > >> > Yes, but this what current a StringList type works for command line. >> > Some other parameters such as dnssearch, hostfwd and guestfwd have >> > already worked in this way. Looks like your suggestions need some >> > extension on QemuOps visitor, maybe we can do this on top. >> >> It's a silly syntax and breaks compatibility. This is valid syntax: >> >> -net tap,fd=3,fd=4 >> >> In this case, it means 'fd=4' because the last fd overwrites the first >> one. >> >> Now you've changed it to mean something else. 
Having one thing mean >> something in one context, but something else in another context is >> terrible interface design. >> >> Regards, >> >> Anthony Liguori > > Aha so just renaming the field 'fds' would address this issue? No, you still have the problem of different meanings. -netdev tap,fd=X,fd=Y -netdev tap,fds=X,fds=Y Would have wildly different behavior. Just do: -netdev tap,fds=X:Y And then we're staying consistent wrt the interpretation of multiple properties of the same name. Regards, Anthony Liguori
Re: [PATCH 00/12] Multiqueue virtio-net
Jason Wang writes: > On 01/15/2013 03:44 AM, Anthony Liguori wrote: >> Jason Wang writes: >> >>> Hello all: >>> >>> This seires is an update of last version of multiqueue virtio-net support. >>> >>> Recently, linux tap gets multiqueue support. This series implements basic >>> support for multiqueue tap, nic and vhost. Then use it as an infrastructure >>> to >>> enable the multiqueue support for virtio-net. >>> >>> Both vhost and userspace multiqueue were implemented for virtio-net, but >>> userspace could be get much benefits since dataplane like parallized >>> mechanism >>> were not implemented. >>> >>> User could start a multiqueue virtio-net card through adding a "queues" >>> parameter to tap. >>> >>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device >>> virtio-net-pci,netdev=hn0 >>> >>> Management tools such as libvirt can pass multiple pre-created fds through >>> >>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device >>> virtio-net-pci,netdev=hn0 >> I'm confused/frightened that this syntax works. You shouldn't be >> allowed to have two values for the same property. Better to have a >> syntax like fd[0]=X,fd[1]=Y or something along those lines. > > Yes, but this what current a StringList type works for command line. > Some other parameters such as dnssearch, hostfwd and guestfwd have > already worked in this way. Looks like your suggestions need some > extension on QemuOps visitor, maybe we can do this on top. It's a silly syntax and breaks compatibility. This is valid syntax: -net tap,fd=3,fd=4 In this case, it means 'fd=4' because the last fd overwrites the first one. Now you've changed it to mean something else. Having one thing mean something in one context, but something else in another context is terrible interface design. 
Regards, Anthony Liguori > > Thanks >> >> Regards, >> >> Anthony Liguori >> >>> You can fetch and try the code from: >>> git://github.com/jasowang/qemu.git >>> >>> Patch 1 adds a generic method of creating multiqueue taps and implement the >>> linux part. >>> Patch 2 - 4 introduce some helpers which could be used to refactor the nic >>> emulation codes to support multiqueue. >>> Patch 5 introduces multiqueue support for qemu networking code: each peers >>> of >>> NetClientState were abstracted as a queue. Though this, most of the codes >>> could >>> be reusued without change. >>> Patch 6 adds basic multiqueue support for vhost which could let vhost just >>> handle a subset of all virtqueues. >>> Patch 7-8 introduce new helpers of virtio which is needed by multiqueue >>> virtio-net. >>> Patch 9-12 implement the multiqueue support of virtio-net >>> >>> Changes from RFC v2: >>> - rebase the codes to latest qemu >>> - align the multiqueue virtio-net implementation to virtio spec >>> - split the patches into more smaller patches >>> - set_link and hotplug support >>> >>> Changes from RFC V1: >>> - rebase to the latest >>> - fix memory leak in parse_netdev >>> - fix guest notifiers assignment/de-assignment >>> - changes the command lines to: >>>qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2 >>> >>> Reference: >>> v2: http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04108.html >>> v1: http://comments.gmane.org/gmane.comp.emulators.qemu/100481 >>> >>> Perf Numbers: >>> >>> Two Intel Xeon 5620 with direct connected intel 82599EB >>> Host/Guest kernel: David net tree >>> vhost enabled >>> >>> - lots of improvents of both latency and cpu utilization in request-reponse >>> test >>> - get regression of guest sending small packets which because TCP tends to >>> batch >>> less when the latency were improved >>> >>> 1q/2q/4q >>> TCP_RR >>> size #sessions trans.rate norm trans.rate norm trans.rate norm >>> 1 1 9393.26 595.64 9408.18 597.34 9375.19 584.12 >>> 1 
2072162.1 2214.24 129880.22 2456.13 196949.81 2298.13 >>> 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57 >>> 1 100 126734.63 2676.54 145553.5 2406.63 265252.68 2943 >>> 64 19453.42 632.33 9371.37 616.13 9338.19 615.97 >>> 64 20 70620.03 2093.68 125155.75 2409.15 191239.91 2253.32 >>> 64 50 106966
Re: [PATCH 00/12] Multiqueue virtio-net
Jason Wang writes: > Hello all: > > This seires is an update of last version of multiqueue virtio-net support. > > Recently, linux tap gets multiqueue support. This series implements basic > support for multiqueue tap, nic and vhost. Then use it as an infrastructure to > enable the multiqueue support for virtio-net. > > Both vhost and userspace multiqueue were implemented for virtio-net, but > userspace could be get much benefits since dataplane like parallized mechanism > were not implemented. > > User could start a multiqueue virtio-net card through adding a "queues" > parameter to tap. > > ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device virtio-net-pci,netdev=hn0 > > Management tools such as libvirt can pass multiple pre-created fds through > > ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device > virtio-net-pci,netdev=hn0 I'm confused/frightened that this syntax works. You shouldn't be allowed to have two values for the same property. Better to have a syntax like fd[0]=X,fd[1]=Y or something along those lines. Regards, Anthony Liguori > > You can fetch and try the code from: > git://github.com/jasowang/qemu.git > > Patch 1 adds a generic method of creating multiqueue taps and implement the > linux part. > Patch 2 - 4 introduce some helpers which could be used to refactor the nic > emulation codes to support multiqueue. > Patch 5 introduces multiqueue support for qemu networking code: each peers of > NetClientState were abstracted as a queue. Though this, most of the codes > could > be reusued without change. > Patch 6 adds basic multiqueue support for vhost which could let vhost just > handle a subset of all virtqueues. > Patch 7-8 introduce new helpers of virtio which is needed by multiqueue > virtio-net. 
> Patch 9-12 implement the multiqueue support of virtio-net > > Changes from RFC v2: > - rebase the codes to latest qemu > - align the multiqueue virtio-net implementation to virtio spec > - split the patches into more smaller patches > - set_link and hotplug support > > Changes from RFC V1: > - rebase to the latest > - fix memory leak in parse_netdev > - fix guest notifiers assignment/de-assignment > - changes the command lines to: >qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2 > > Reference: > v2: http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04108.html > v1: http://comments.gmane.org/gmane.comp.emulators.qemu/100481 > > Perf Numbers: > > Two Intel Xeon 5620 with direct connected intel 82599EB > Host/Guest kernel: David net tree > vhost enabled > > - lots of improvents of both latency and cpu utilization in request-reponse > test > - get regression of guest sending small packets which because TCP tends to > batch > less when the latency were improved > > 1q/2q/4q > TCP_RR > size #sessions trans.rate norm trans.rate norm trans.rate norm > 1 1 9393.26 595.64 9408.18 597.34 9375.19 584.12 > 1 2072162.1 2214.24 129880.22 2456.13 196949.81 2298.13 > 1 50107513.38 2653.99 139721.93 2490.58 259713.82 2873.57 > 1 100 126734.63 2676.54 145553.5 2406.63 265252.68 2943 > 64 19453.42 632.33 9371.37 616.13 9338.19 615.97 > 64 20 70620.03 2093.68 125155.75 2409.15 191239.91 2253.32 > 64 50 1069662448.29 146518.67 2514.47 242134.07 2720.91 > 64 100 117046.35 2394.56 190153.09 2696.82 238881.29 2704.41 > 256 1 8733.29 736.36 8701.07 680.83 8608.92 530.1 > 256 20 69279.89 2274.45 115103.07 2299.76 144555.16 1963.53 > 256 50 97676.02 2296.09 150719.57 2522.92 254510.5 3028.44 > 256 100 150221.55 2949.56 197569.3 2790.92 300695.78 3494.83 > TCP_CRR > size #sessions trans.rate norm trans.rate norm trans.rate norm > 1 1 2848.37 163.41 2230.39 130.89 2013.09 120.47 > 1 2023434.5 562.11 31057.43 531.07 49488.28 564.41 > 1 5028514.88 582.17 40494.23 605.92 60113.35 
654.97 > 1 100 28827.22 584.73 48813.25 661.6 61783.62 676.56 > 64 12780.08 159.4 2201.07 127.96 2006.8 117.63 > 64 20 23318.51 564.47 30982.44 530.24 49734.95 566.13 > 64 50 28585.72 582.54 40576.7 610.08 60167.89 656.56 > 64 100 28747.37 584.17 49081.87 667.87 60612.94 662 > 256 1 2772.08 160.51 2231.84 131.05 2003.62 113.45 > 256 20 23086.35 559.8 30929.09 528.16 48454.9 555.22 > 256 50 28354.7 579.85 40578.31 60760261.71 657.87 > 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72 > TCP_STREAM guest receiving > size #sessions throughput norm throughput norm throughput norm > 1 1 16.27 1.33 16.11.12 16.13 0.99 > 1 2 33.04 2.08 32.96 2.19 32.75 1.98 > 1 4 66.62 6.83 68.35.56 66.14 2.65 > 64 1896.55 56.67 914.02 58.
Re: [PULL 0/2] vfio-pci: Fixes for 1.4 & stable
Pulled, thanks. Regards, Anthony Liguori
Re: [PATCH 0/9] target-i386: make "enforce" flag work as it should
Hi, This is an automated message generated from the QEMU Patches. Thank you for submitting this patch. This patch no longer applies to qemu.git. This may have occurred due to: 1) Changes in mainline requiring your patch to be rebased and re-tested. 2) Sending the mail using a tool other than git-send-email. Please use git-send-email to send patches to QEMU. 3) Basing this patch off of a branch that isn't tracking the QEMU master branch. If that was done purposefully, please include the name of the tree in the subject line in the future to prevent this message. For instance: "[PATCH block-next 1/10] qcow3: add fancy new feature" 4) You no longer wish for this patch to be applied to QEMU. No additional action is required on your part. Nacked-by: QEMU Patches Below is the output from git-am: Applying: target-i386: kvm: -cpu host: Use GET_SUPPORTED_CPUID for SVM features Applying: target-i386: kvm: Enable all supported KVM features for -cpu host Applying: target-i386: check/enforce: Fix CPUID leaf numbers on error messages fatal: sha1 information is lacking or useless (target-i386/cpu.c). Repository lacks necessary blobs to fall back on 3-way merge. Cannot fall back to three-way merge. Patch failed at 0003 target-i386: check/enforce: Fix CPUID leaf numbers on error messages The copy of the patch that failed is found in: /home/aliguori/.patches/git-working/.git/rebase-apply/patch When you have resolved this problem run "git am --resolved". If you would prefer to skip this patch, instead run "git am --skip". To restore the original branch and stop patching run "git am --abort".
Re: [PATCH 0/2] [PULL] qemu-kvm.git uq/master queue
Gleb Natapov writes: > The following changes since commit e376a788ae130454ad5e797f60cb70d0308babb6: > > Merge remote-tracking branch 'kwolf/for-anthony' into staging (2012-12-13 > 14:32:28 -0600) > > are available in the git repository at: > > > git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master > > for you to fetch changes up to 0a2a59d35cbabf63c91340a1c62038e3e60538c1: > > qemu-kvm/pci-assign: 64 bits bar emulation (2012-12-25 14:37:52 +0200) > Pulled. Thanks. Regards, Anthony Liguori > > Will Auld (1): > target-i386: Enabling IA32_TSC_ADJUST for QEMU KVM guest VMs > > Xudong Hao (1): > qemu-kvm/pci-assign: 64 bits bar emulation > > hw/kvm/pci-assign.c | 14 ++ > target-i386/cpu.h |2 ++ > target-i386/kvm.c | 14 ++ > target-i386/machine.c | 21 + > 4 files changed, 47 insertions(+), 4 deletions(-)
Re: KVM call agenda for 2012-12-18
Juan Quintela writes: > Hi > > Please send in any agenda topics that you have. I have a conflicting call today so I can't attend. Regards, Anthony Liguori > > Thanks, Juan.
Re: [Qemu-devel] KVM call agenda for 2012-12-11
Kevin Wolf writes: > Am 10.12.2012 14:59, schrieb Juan Quintela: >> >> Hi >> >> Please send in any agenda topics you are interested in. > > Can probably be answered on the list, but what is the status of > libqos? Still on my TODO list. Regards, Anthony Liguori > > Kevin
Re: [PATCH 0/5] Alter steal time reporting in KVM
Glauber Costa writes: > Hi, > > On 11/27/2012 12:36 AM, Michael Wolf wrote: >> In the case of where you have a system that is running in a >> capped or overcommitted environment the user may see steal time >> being reported in accounting tools such as top or vmstat. This can >> cause confusion for the end user. To ease the confusion this patch set >> adds the idea of consigned (expected steal) time. The host will separate >> the consigned time from the steal time. The consignment limit passed to the >> host will be the amount of steal time expected within a fixed period of >> time. Any other steal time accruing during that period will show as the >> traditional steal time. > > If you submit this again, please include a version number in your series. > > It would also be helpful to include a small changelog about what changed > between last version and this version, so we could focus on that. > > As for the rest, I answered your previous two submissions saying I don't > agree with the concept. If you hadn't changed anything, resending it > won't change my mind. > > I could of course, be mistaken or misguided. But I had also not seen any > wave of support in favor of this previously, so basically I have no new > data to make me believe I should see it any differently. > > Let's try this again: > > * Rik asked you in your last submission how does ppc handle this. You > said, and I quote: "In the case of lpar on POWER systems they simply > report steal time and do not alter it in any way. > They do however report how much processor is assigned to the partition > and that information is in /proc/ppc64/lparcfg." This is only helpful for static entitlements. But if we allow dynamic entitlements--which is a very useful feature, think buying an online "upgrade" in a cloud environment--then you need to account for entitlement loss at the same place where you do the rest of the accounting: in /proc/stat. > Now, that is a *way* more sensible thing to do. Much more. 
"Confusing > users" is something extremely subjective. This is specially true about > concepts that are know for quite some time, like steal time. If you out > of a sudden change the meaning of this, it is sure to confuse a lot more > users than it would clarify. I'll bring you a nice bottle of scotch at the next KVM Forum if you can find me one user that can accurately describe what steal time is. The semantics are so incredibly subtle that I have a hard time believing anyone actually understands what it means today. Regards, Anthony Liguori > > > > > >> >> --- >> >> Michael Wolf (5): >> Alter the amount of steal time reported by the guest. >> Expand the steal time msr to also contain the consigned time. >> Add the code to send the consigned time from the host to the guest >> Add a timer to allow the separation of consigned from steal time. >> Add an ioctl to communicate the consign limit to the host. >> >> >> arch/x86/include/asm/kvm_host.h | 11 +++ >> arch/x86/include/asm/kvm_para.h |3 +- >> arch/x86/include/asm/paravirt.h |4 +-- >> arch/x86/include/asm/paravirt_types.h |2 + >> arch/x86/kernel/kvm.c |8 ++--- >> arch/x86/kernel/paravirt.c|4 +-- >> arch/x86/kvm/x86.c| 50 >> - >> fs/proc/stat.c|9 +- >> include/linux/kernel_stat.h |2 + >> include/linux/kvm_host.h |2 + >> include/uapi/linux/kvm.h |2 + >> kernel/sched/core.c | 10 ++- >> kernel/sched/cputime.c| 21 +- >> kernel/sched/sched.h |2 + >> virt/kvm/kvm_main.c |7 + >> 15 files changed, 120 insertions(+), 17 deletions(-) >> > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/1] [PULL] qemu-kvm.git uq/master queue
Marcelo Tosatti writes: > The following changes since commit 1ccbc2851282564308f790753d7158487b6af8e2: > > qemu-sockets: Fix parsing of the inet option 'to'. (2012-11-21 12:07:59 > +0400) > > are available in the git repository at: > git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master > > Bruce Rogers (1): > Legacy qemu-kvm options have no argument Pulled. Thanks. Regards, Anthony Liguori > > qemu-options.hx |8 > 1 files changed, 4 insertions(+), 4 deletions(-)
Re: [PULL 0/3] vfio-pci for 1.3-rc0
Alex Williamson writes: > Hi Anthony, > > Please pull the tag below. I posted the linux-headers update > separately on Oct-15; since it hasn't been applied and should be > non-controversial, I include it again here. Thanks, > > Alex > Pulled. Thanks. Regards, Anthony Liguori > The following changes since commit f5022a135e4309a54d433c69b2a056756b2d0d6b: > > aio: fix aio_ctx_prepare with idle bottom halves (2012-11-12 20:02:09 +0400) > > are available in the git repository at: > > git://github.com/awilliam/qemu-vfio.git tags/vfio-pci-for-qemu-1.3.0-rc0 > > for you to fetch changes up to a771c51703cf9f91023c6570426258bdf5ec775b: > > vfio-pci: Use common msi_get_message (2012-11-13 12:27:40 -0700) > > > vfio-pci: KVM INTx accel & common msi_get_message > > > Alex Williamson (3): > linux-headers: Update to 3.7-rc5 > vfio-pci: Add KVM INTx acceleration > vfio-pci: Use common msi_get_message > > hw/vfio_pci.c| 210 > +++ > linux-headers/asm-powerpc/kvm_para.h | 6 +- > linux-headers/asm-s390/kvm_para.h| 8 +- > linux-headers/asm-x86/kvm.h | 17 +++ > linux-headers/linux/kvm.h| 25 - > linux-headers/linux/kvm_para.h | 6 +- > linux-headers/linux/vfio.h | 6 +- > linux-headers/linux/virtio_config.h | 6 +- > linux-headers/linux/virtio_ring.h| 6 +- > 9 files changed, 241 insertions(+), 49 deletions(-) > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call agenda for 2012-11-12
Marcelo Tosatti writes: > On Mon, Nov 12, 2012 at 01:58:38PM +0100, Juan Quintela wrote: >> >> Hi >> >> Please send in any agenda topics you are interested in. >> >> Later, Juan. > > It would be good to have a status report on qemu-kvm compatibility > (the remaining TODO items are with Anthony). They are: > > - qemu-kvm 1.2 machine type. > - default accelerator being KVM. > > Note migration will remain broken due to > > https://patchwork.kernel.org/patch/1674521/ > > BTW, this can be via email, if preferred (i cannot attend the call). Let's cancel the call and I'll spend the hour writing up the patches and sending them out. Regards, Anthony Liguori
Re: 1.1.1 -> 1.1.2 migrate /managedsave issue
Avi Kivity writes: > On 10/22/2012 09:04 AM, Philipp Hahn wrote: >> Hello Doug, >> >> On Saturday 20 October 2012 00:46:43 Doug Goldstein wrote: >>> I'm using libvirt 0.10.2 and I had qemu-kvm 1.1.1 running all my VMs. >> ... >>> I had upgraded to qemu-kvm 1.1.2 >> ... >>> qemu: warning: error while loading state for instance 0x0 of device 'ram' >>> load of migration failed >> >> That error can be from many things. For me it was that the PXE-ROM images >> for >> the network cards were updated as well. Their size changed over the next >> power-of-two size, so kvm needed to allocate less/more memory and changed >> some PCI configuration registers, where the size of the ROM region is stored. >> On loading the saved state those sizes were compared and failed to validate. >> KVM then aborts loading the saved state with that little helpful message. >> >> So you might want to check, if your case is similar to mine. >> >> I diagnosed that using gdb to single step kvm until I found >> hw/pci.c#get_pci_config_device() returning -EINVAL. >> > > Seems reasonable. Doug, please verify to see if it's the same issue or > another one. > > Juan, how can we fix this? It's clear that the option ROM size has to > be fixed and not change whenever the blob is updated. This will fix it > for future releases. But what to do about the ones in the field? This is not a problem upstream because we don't alter the ROMs. If we did, we would keep the old ROMs around and set the romfile property in the compatible machine. This is what distros that are shipping ROMs outside of QEMU ought to do. It's a bug to unconditionally change the ROMs (in a guest visible way) without adding compatibility support. 
Regards, Anthony Liguori
Re: [PATCH 00/28] [PULL] qemu-kvm.git uq/master queue
Marcelo Tosatti writes: > The following changes since commit aee0bf7d8d7564f8f2c40e4501695c492b7dd8d1: > > tap-win32: stubs to fix win32 build (2012-10-30 19:18:53 +) > > are available in the git repository at: > git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master > > Don Slutz (1): > target-i386: Add missing kvm cpuid feature name > > Eduardo Habkost (19): > i386: kvm: kvm_arch_get_supported_cpuid: move R_EDX hack outside of for > loop > i386: kvm: kvm_arch_get_supported_cpuid: clean up has_kvm_features check > i386: kvm: kvm_arch_get_supported_cpuid: use 'entry' variable > i386: kvm: extract register switch to cpuid_entry_get_reg() function > i386: kvm: extract CPUID entry lookup to cpuid_find_entry() function > i386: kvm: extract try_get_cpuid() loop to get_supported_cpuid() > function > i386: kvm: kvm_arch_get_supported_cpuid: replace if+switch with single > 'if' > i386: kvm: set CPUID_EXT_HYPERVISOR on kvm_arch_get_supported_cpuid() > i386: kvm: set CPUID_EXT_TSC_DEADLINE_TIMER on > kvm_arch_get_supported_cpuid() > i386: kvm: x2apic is not supported without in-kernel irqchip > i386: kvm: mask cpuid_kvm_features earlier > i386: kvm: mask cpuid_ext4_features bits earlier > i386: kvm: filter CPUID feature words earlier, on cpu.c > i386: kvm: reformat filter_features_for_kvm() code > i386: kvm: filter CPUID leaf 7 based on GET_SUPPORTED_CPUID, too > i386: cpu: add missing CPUID[EAX=7,ECX=0] flag names > target-i386: make cpu_x86_fill_host() void > target-i386: cpu: make -cpu host/check/enforce code KVM-specific > target-i386: kvm_cpu_fill_host: use GET_SUPPORTED_CPUID > > Jan Kiszka (6): > Use machine options to emulate -no-kvm-irqchip > Issue warning when deprecated -no-kvm-pit is used > Use global properties to emulate -no-kvm-pit-reinjection > Issue warning when deprecated drive parameter boot=on|off is used > Issue warning when deprecated -tdf option is used > Emulate qemu-kvms -no-kvm option > > Marcelo Tosatti (1): > cirrus_vga: allow configurable 
vram size > > Peter Maydell (1): > update-linux-headers.sh: Handle new kernel uapi/ directories > Pulled. Thanks. Regards, Anthony Liguori > blockdev.c |6 ++ > hw/cirrus_vga.c | 21 -- > kvm.h |1 + > qemu-config.c |4 + > qemu-options.hx | 16 > scripts/update-linux-headers.sh |3 +- > target-i386/cpu.c | 98 +++--- > target-i386/kvm.c | 153 > +++ > vl.c| 33 + > 9 files changed, 242 insertions(+), 93 deletions(-) > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT
could then detect that it is running >> on an old kernel and fall back to the old format. >> >> The HPT entry format is very unlikely to change in size or basic >> layout (though the architects do redefine some of the bits >> occasionally). > > I meant the internal data structure that holds HPT entries. > > I guess I don't understand the index. Do we expect changes to be in > contiguous ranges? And invalid entries to be contiguous as well? That > doesn't fit with how hash tables work. Does the index represent the > position of the entry within the table, or something else? > > >> >>> > + >>> > +Writes to the fd create HPT entries starting at the index given in the >>> > +header; first `n_valid' valid entries with contents from the data >>> > +written, then `n_invalid' invalid entries, invalidating any previously >>> > +valid entries found. >>> >>> This scheme is a clever, original, and very interesting approach to live >>> migration. That doesn't necessarily mean a NAK, we should see if it >>> makes sense for other migration APIs as well (we currently have >>> difficulties migrating very large/wide guests). >>> >>> What is the typical number of entries in the HPT? Do you have estimates >>> of the change rate? >> >> Typically the HPT would have about a million entries, i.e. it would be >> 16MiB in size. The usual guideline is to make it about 1/64 of the >> maximum amount of RAM the guest could ever have, rounded up to a power >> of two, although we often run with less, say 1/128 or even 1/256. > > 16MiB is transferred in ~0.15 sec on GbE, much faster with 10GbE. Does > it warrant a live migration protocol? 0.15 sec == 150ms. The typical downtime window is 30ms. So yeah, I think it does. >> Because it is a hash table, updates tend to be scattered throughout >> the whole table, which is another reason why per-page dirty tracking >> and updates would be pretty inefficient. > > This suggests a stream format that includes the index in every entry. 
> >> >> As for the change rate, it depends on the application of course, but >> basically every time the guest changes a PTE in its Linux page tables >> we do the corresponding change to the corresponding HPT entry, so the >> rate can be quite high. Workloads that do a lot of fork, exit, mmap, >> exec, etc. have a high rate of HPT updates. > > If the rate is high enough, then there's no point in a live update. Do we have practical data here? Regards, Anthony Liguori > >> >>> Suppose new hardware arrives that supports nesting HPTs, so that kvm is >>> no longer synchronously aware of the guest HPT (similar to how NPT/EPT >>> made kvm unaware of guest virtual->physical translations on x86). How >>> will we deal with that? But I guess this will be a >>> non-guest-transparent and non-userspace-transparent change, unlike >>> NPT/EPT, so a userspace ABI addition will be needed anyway). >> >> Nested HPTs or other changes to the MMU architecture would certainly >> need new guest kernels and new support in KVM. With a nested >> approach, the guest-side MMU data structures (HPT or whatever) would >> presumably be in guest memory and thus be handled along with all the >> other guest memory, while the host-side MMU data structures would not >> need to be saved, so from the migration point of view that would make >> it all a lot simpler. > > Yeah. > > > -- > error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] qemu: Update Linux headers
Alex Williamson writes: > Based on v3.7-rc1-3-g29bb4cc Normally this would go through qemu-kvm/uq/master but since this is from Linus' tree, it's less of a concern. Nonetheless, I'd prefer we did it from v3.7-rc1 instead of a random git snapshot. Regards, Anthony Liguori > > Signed-off-by: Alex Williamson > --- > > Trying to get KVM_IRQFD_FLAG_RESAMPLE and friends for vfio-pci > > linux-headers/asm-x86/kvm.h | 17 + > linux-headers/linux/kvm.h | 25 + > linux-headers/linux/kvm_para.h |6 +++--- > linux-headers/linux/vfio.h |6 +++--- > linux-headers/linux/virtio_config.h |6 +++--- > linux-headers/linux/virtio_ring.h |6 +++--- > 6 files changed, 50 insertions(+), 16 deletions(-) > > diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h > index 246617e..a65ec29 100644 > --- a/linux-headers/asm-x86/kvm.h > +++ b/linux-headers/asm-x86/kvm.h > @@ -9,6 +9,22 @@ > #include > #include > > +#define DE_VECTOR 0 > +#define DB_VECTOR 1 > +#define BP_VECTOR 3 > +#define OF_VECTOR 4 > +#define BR_VECTOR 5 > +#define UD_VECTOR 6 > +#define NM_VECTOR 7 > +#define DF_VECTOR 8 > +#define TS_VECTOR 10 > +#define NP_VECTOR 11 > +#define SS_VECTOR 12 > +#define GP_VECTOR 13 > +#define PF_VECTOR 14 > +#define MF_VECTOR 16 > +#define MC_VECTOR 18 > + > /* Select x86 specific features in */ > #define __KVM_HAVE_PIT > #define __KVM_HAVE_IOAPIC > @@ -25,6 +41,7 @@ > #define __KVM_HAVE_DEBUGREGS > #define __KVM_HAVE_XSAVE > #define __KVM_HAVE_XCRS > +#define __KVM_HAVE_READONLY_MEM > > /* Architectural interrupt line count. 
*/ > #define KVM_NR_INTERRUPTS 256 > diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h > index 4b9e575..81d2feb 100644 > --- a/linux-headers/linux/kvm.h > +++ b/linux-headers/linux/kvm.h > @@ -101,9 +101,13 @@ struct kvm_userspace_memory_region { > __u64 userspace_addr; /* start of the userspace allocated memory */ > }; > > -/* for kvm_memory_region::flags */ > -#define KVM_MEM_LOG_DIRTY_PAGES 1UL > -#define KVM_MEMSLOT_INVALID (1UL << 1) > +/* > + * The bit 0 ~ bit 15 of kvm_memory_region::flags are visible for userspace, > + * other bits are reserved for kvm internal use which are defined in > + * include/linux/kvm_host.h. > + */ > +#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) > +#define KVM_MEM_READONLY (1UL << 1) > > /* for KVM_IRQ_LINE */ > struct kvm_irq_level { > @@ -618,6 +622,10 @@ struct kvm_ppc_smmu_info { > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > #define KVM_CAP_S390_COW 79 > #define KVM_CAP_PPC_ALLOC_HTAB 80 > +#ifdef __KVM_HAVE_READONLY_MEM > +#define KVM_CAP_READONLY_MEM 81 > +#endif > +#define KVM_CAP_IRQFD_RESAMPLE 82 > > #ifdef KVM_CAP_IRQ_ROUTING > > @@ -683,12 +691,21 @@ struct kvm_xen_hvm_config { > #endif > > #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) > +/* > + * Available with KVM_CAP_IRQFD_RESAMPLE > + * > + * KVM_IRQFD_FLAG_RESAMPLE indicates resamplefd is valid and specifies > + * the irqfd to operate in resampling mode for level triggered interrupt > + * emlation. See Documentation/virtual/kvm/api.txt. 
> + */ > +#define KVM_IRQFD_FLAG_RESAMPLE (1 << 1) > > struct kvm_irqfd { > __u32 fd; > __u32 gsi; > __u32 flags; > - __u8 pad[20]; > + __u32 resamplefd; > + __u8 pad[16]; > }; > > struct kvm_clock_data { > diff --git a/linux-headers/linux/kvm_para.h b/linux-headers/linux/kvm_para.h > index 7bdcf93..cea2c5c 100644 > --- a/linux-headers/linux/kvm_para.h > +++ b/linux-headers/linux/kvm_para.h > @@ -1,5 +1,5 @@ > -#ifndef __LINUX_KVM_PARA_H > -#define __LINUX_KVM_PARA_H > +#ifndef _UAPI__LINUX_KVM_PARA_H > +#define _UAPI__LINUX_KVM_PARA_H > > /* > * This header file provides a method for making a hypercall to the host > @@ -25,4 +25,4 @@ > */ > #include > > -#endif /* __LINUX_KVM_PARA_H */ > +#endif /* _UAPI__LINUX_KVM_PARA_H */ > diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h > index f787b72..4758d1b 100644 > --- a/linux-headers/linux/vfio.h > +++ b/linux-headers/linux/vfio.h > @@ -8,8 +8,8 @@ > * it under the terms of the GNU General Public License version 2 as > * published by the Free Software Foundation. > */ > -#ifndef VFIO_H > -#define VFIO_H > +#ifndef _UAPIVFIO_H > +#define _UAPIVFIO_H > > #include > #include > @@ -365,4 +365,4 @@ struct vfio_iommu_type1_dma_unmap { > > #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14) > > -#endi
Re: [Qemu-devel] Using PCI config space to indicate config location
Rusty Russell writes: > Gerd Hoffmann writes: >> So how about this: >> >> (1) Add a vendor specific pci capability for new-style virtio. >> Specifies the pci bar used for new-style virtio registers. >> Guests can use it to figure whenever new-style virtio is >> supported and to map the correct bar (which will probably >> be bar 1 in most cases). > > This was closer to the original proposal[1], which I really liked (you > can layout bars however you want). Anthony thought that vendor > capabilities were a PCI-e feature, but it seems they're blessed in PCI > 2.3. 2.3 was standardized in 2002. Are we confident that vendor extensions play nice with pre-2.3 OSes like Win2k, WinXP, etc? I still think it's a bad idea to rely on something so "new" in something as fundamental as virtio-pci unless we have to. Regards, Anthony Liguori > > So let's return to that proposal, giving something like this: > > /* IDs for different capabilities. Must all exist. */ > /* FIXME: Do we win from separating ISR, NOTIFY and COMMON? */ > /* Common configuration */ > #define VIRTIO_PCI_CAP_COMMON_CFG 1 > /* Notifications */ > #define VIRTIO_PCI_CAP_NOTIFY_CFG 2 > /* ISR access */ > #define VIRTIO_PCI_CAP_ISR_CFG3 > /* Device specific confiuration */ > #define VIRTIO_PCI_CAP_DEVICE_CFG 4 > > /* This is the PCI capability header: */ > struct virtio_pci_cap { > u8 cap_vndr;/* Generic PCI field: PCI_CAP_ID_VNDR */ > u8 cap_next;/* Generic PCI field: next ptr. */ > u8 cap_len; /* Generic PCI field: sizeof(struct virtio_pci_cap). */ > u8 cfg_type;/* One of the VIRTIO_PCI_CAP_*_CFG. */ > u8 bar; /* Where to find it. */ > u8 unused; > __le16 offset; /* Offset within bar. */ > __le32 length; /* Length. */ > }; > > This means qemu can point the isr_cfg into the legacy area if it wants. > In fact, it can put everything in BAR0 if it wants. > > Thoughts? > Rusty. 
Re: [Qemu-devel] Using PCI config space to indicate config location
Gerd Hoffmann writes: > Hi, > >>> Well, we also want to clean up the registers, so how about: >>> >>> BAR0: legacy, as is. If you access this, don't use the others. > > Ok. > >>> BAR1: new format virtio-pci layout. If you use this, don't use BAR0. >>> BAR2: virtio-cfg. If you use this, don't use BAR0. > > Why use two bars for this? You can put them into one mmio bar, together > with the msi-x vector table and PBA. Of course a pci capability > describing the location is helpful for that ;) You don't need a capability. You can also just add a "config offset" field to the register set and then make the semantics that it occurs in the same region. > >>> BAR3: ISR. If you use this, don't use BAR0. > > Again, I wouldn't hardcode that but use a capability. > >>> I prefer the cases exclusive (ie. use one or the other) as a clear path >>> to remove the legacy layout; and leaving the ISR in BAR0 leaves us with >>> an ugly corner case in future (ISR is BAR0 + 19? WTF?). > > Ok, so we have four register sets: > > (1) legacy layout > (2) new virtio-pci > (3) new virtio-config > (4) new virtio-isr > > We can have a vendor pci capability, with a dword for each register set: > > bit 31-- present bit > bits 26-24 -- bar > bits 23-0 -- offset > > So current drivers which must support legacy can use this: > > legacy layout -- present, bar 0, offset 0 > new virtio-pci-- present, bar 1, offset 0 > new virtio-config -- present, bar 1, offset 256 > new virtio-isr-- present, bar 0, offset 19 > > [ For completeness: msi-x capability could add this: ] > > msi-x vector tablebar 1, offset 512 > msi-x pba bar 1, offset 768 > >> We'll never remove legacy so we shouldn't plan on it. There are >> literally hundreds of thousands of VMs out there with the current virtio >> drivers installed in them. We'll be supporting them for a very, very >> long time :-) > > But new devices (virtio-qxl being a candidate) don't have old guests and > don't need to worry. 
> > They could use this if they care about fast isr: > > legacy layout -- not present > new virtio-pci-- present, bar 1, offset 0 > new virtio-config -- present, bar 1, offset 256 > new virtio-isr-- present, bar 0, offset 0 > > Or this if they don't worry about isr performance: > > legacy layout -- not present > new virtio-pci-- present, bar 0, offset 0 > new virtio-config -- present, bar 0, offset 256 > new virtio-isr-- not present > >> I don't think we gain a lot by moving the ISR into a separate BAR. >> Splitting up registers like that seems weird to me too. > > Main advantage of defining a register set with just isr is that it > reduces pio address space consumtion for new virtio devices which don't > have to worry about the legacy layout (8 bytes which is minimum size for > io bars instead of 64 bytes). Doing some rough math, we should have at least 16k of PIO space. That let's us have well over 500 virtio-pci devices with the current register layout. I don't think we're at risk of running out of space... >> If we added an additional constraints that BAR1 was mirrored except for > > Why add constraints? We want something future-proof, don't we? > >>> The detection is simple: if BAR1 has non-zero length, it's new-style, >>> otherwise legacy. > > Doesn't fly. BAR1 is in use today for MSI-X support. But the location is specified via capabilities so we can change the location to be within BAR1 at a non-conflicting offset. >> I agree that this is the best way to extend, but I think we should still >> use a transport feature bit. We want to be able to detect within QEMU >> whether a guest is using these new features because we need to adjust >> migration state accordingly. > > Why does migration need adjustments? Because there is additional state in the "new" layout. We need to understand whether a guest is relying on that state or not. For instance, extended virtio features. 
If a guest is in the process of reading extended virtio features, it may not have changed any state, but we must ensure that we don't migrate to an older version of QEMU without the extended virtio features. This cannot be handled by subsections today because there is no guest-written state that's affected. Regards, Anthony Liguori > > [ Not that I want to veto a feature bit, but I don't see the need yet ] > > cheers, > Gerd
Re: [Qemu-devel] Using PCI config space to indicate config location
Avi Kivity writes: > On 10/09/2012 05:16 AM, Rusty Russell wrote: >> Anthony Liguori writes: >>> We'll never remove legacy so we shouldn't plan on it. There are >>> literally hundreds of thousands of VMs out there with the current virtio >>> drivers installed in them. We'll be supporting them for a very, very >>> long time :-) >> >> You will be supporting this for qemu on x86, sure. As I think we're >> still in the growth phase for virtio, I prioritize future spec >> cleanliness pretty high. > > If a pure ppc hypervisor was on the table, this might have been > worthwhile. As it is the codebase is shared, and the Linux drivers are > shared, so cleaning up the spec doesn't help the code. Note that distros have been (perhaps unknowingly) shipping virtio-pci for PPC for some time now. So even though there wasn't a hypervisor that supported virtio-pci, the guests already support it and are out there in the wild. There's a lot of value in maintaining "legacy" support even for PPC. >> But I think you'll be surprised how fast this is deprecated: >> 1) Bigger queues for block devices (guest-specified ringsize) >> 2) Smaller rings for openbios (guest-specified alignment) >> 3) All-mmio mode (powerpc) >> 4) Whatever network features get numbers > 31. >> >>> I don't think we gain a lot by moving the ISR into a separate BAR. >>> Splitting up registers like that seems weird to me too. >> >> Confused. I proposed the same split as you have, just ISR by itself. > > I believe Anthony objects to having the ISR by itself. What is the > motivation for that? Right, BARs are a precious resource not to be spent lightly. Having an entire BAR dedicated to a 1-byte register seems like a waste to me. 
Regards, Anthony Liguori
Re: [Qemu-devel] Using PCI config space to indicate config location
Rusty Russell writes: > Anthony Liguori writes: >> We'll never remove legacy so we shouldn't plan on it. There are >> literally hundreds of thousands of VMs out there with the current virtio >> drivers installed in them. We'll be supporting them for a very, very >> long time :-) > > You will be supporting this for qemu on x86, sure. And PPC. > As I think we're > still in the growth phase for virtio, I prioritize future spec > cleanliness pretty high. > > But I think you'll be surprised how fast this is deprecated: > 1) Bigger queues for block devices (guest-specified ringsize) > 2) Smaller rings for openbios (guest-specified alignment) > 3) All-mmio mode (powerpc) > 4) Whatever network features get numbers > 31. We can do all of these things with incremental change to the existing layout. That's the only way what I'm suggesting is different. You want to reorder all of the fields and have a driver flag day. But I strongly suspect we'll decide we need to do the same exercise again in 4 years when we now need to figure out how to take advantage of transactional memory or some other whiz-bang hardware feature. There are a finite number of BARs but each BAR has an almost infinite size. So extending BARs instead of introducing new one seems like the conservative approach moving forward. >> I don't think we gain a lot by moving the ISR into a separate BAR. >> Splitting up registers like that seems weird to me too. > > Confused. I proposed the same split as you have, just ISR by itself. I disagree with moving the ISR into a separate BAR. That's what seems weird to me. >> It's very normal to have a mirrored set of registers that are PIO in one >> bar and MMIO in a different BAR. >> >> If we added an additional constraints that BAR1 was mirrored except for >> the config space and the MSI section was always there, I think the end >> result would be nice. IOW: > > But it won't be the same, because we want all that extra stuff, like > more feature bits and queue size alignment. 
> (Admittedly queues past 16TB aren't a killer feature.)
>
> To make it concrete:
>
> Current:
> struct {
>     __le32 host_features;       /* read-only */
>     __le32 guest_features;      /* read/write */
>     __le32 queue_pfn;           /* read/write */
>     __le16 queue_size;          /* read-only */
>     __le16 queue_sel;           /* read/write */
>     __le16 queue_notify;        /* read/write */
>     u8 status;                  /* read/write */
>     u8 isr;                     /* read-only, clear on read */
>     /* Optional */
>     __le16 msi_config_vector;   /* read/write */
>     __le16 msi_queue_vector;    /* read/write */
>     /* ... device features */
> };
>
> Proposed:
> struct virtio_pci_cfg {
>     /* About the whole device. */
>     __le32 device_feature_select;   /* read-write */
>     __le32 device_feature;          /* read-only */
>     __le32 guest_feature_select;    /* read-write */
>     __le32 guest_feature;           /* read-only */
>     __le16 msix_config;             /* read-write */
>     __u8 device_status;             /* read-write */
>     __u8 unused;
>
>     /* About a specific virtqueue. */
>     __le16 queue_select;            /* read-write */
>     __le16 queue_align;             /* read-write, power of 2. */
>     __le16 queue_size;              /* read-write, power of 2. */
>     __le16 queue_msix_vector;       /* read-write */
>     __le64 queue_address;           /* read-write: 0x == DNE. */
> };
>
> struct virtio_pci_isr {
>     __u8 isr;                       /* read-only, clear on read */
> };

What I'm suggesting is:

struct {
    __le32 host_features;       /* read-only */
    __le32 guest_features;      /* read/write */
    __le32 queue_pfn;           /* read/write */
    __le16 queue_size;          /* read-only */
    __le16 queue_sel;           /* read/write */
    __le16 queue_notify;        /* read/write */
    u8 status;                  /* read/write */
    u8 isr;                     /* read-only, clear on read */
    __le16 msi_config_vector;   /* read/write */
    __le16 msi_queue_vector;    /* read/write */
    __le32 host_feature_select; /* read/write */
    __le32 guest_feature_select;/* read/write */
    __le32 queue_pfn_hi;        /* read/write */
};

With the additional semantic that the virtio-config space is overlayed on top of the register set in BAR0 unless the VIRTIO_PCI_F_SEPARATE_CONFIG feature is acknowledged.
This feature acts as a latch: when set, it removes the config space overlay.
Re: [Qemu-devel] Using PCI config space to indicate config location
Rusty Russell writes:
> Anthony Liguori writes:
>> Gerd Hoffmann writes:
>>> Hi,
>>>
>>>>> So we could have for virtio something like this:
>>>>>
>>>>> Capabilities: [??] virtio-regs:
>>>>>     legacy: BAR=0 offset=0
>>>>>     virtio-pci: BAR=1 offset=1000
>>>>>     virtio-cfg: BAR=1 offset=1800
>>>>
>>>> This would be a vendor specific PCI capability so lspci wouldn't
>>>> automatically know how to parse it.
>>>
>>> Sure, would need a patch to actually parse+print the cap,
>>> /me was just trying to make my point clear in a simple way.
>>>
>>>>>>> 2) ISTR an argument about mapping the ISR register separately, for
>>>>>>> performance, but I can't find a reference to it.
>>>>>>
>>>>>> I think the rationale is that ISR really needs to be PIO but everything
>>>>>> else doesn't. PIO is much faster on x86 because it doesn't require
>>>>>> walking page tables or instruction emulation to handle the exit.
>>>>>
>>>>> Is this still a pressing issue? With MSI-X enabled ISR isn't needed,
>>>>> correct? Which would imply that pretty much only old guests without
>>>>> MSI-X support need this, and we don't need to worry that much when
>>>>> designing something new ...
>>>>
>>>> It wasn't that long ago that MSI-X wasn't supported. I think we should
>>>> continue to keep ISR as PIO as it is a fast path.
>>>
>>> No problem if we allow to have both legacy layout and new layout at the
>>> same time. Guests can continue to use ISR @ BAR0 in PIO space for
>>> existing virtio devices, even in case they want to use mmio for other
>>> registers -> all fine.
>>>
>>> New virtio devices can support MSI-X from day one and decide to not
>>> expose a legacy layout PIO bar.
>>
>> I think having BAR1 be an MMIO mirror of the registers + a BAR2 for
>> virtio configuration space is probably not that bad of a solution.
>
> Well, we also want to clean up the registers, so how about:
>
> BAR0: legacy, as is. If you access this, don't use the others.
> BAR1: new format virtio-pci layout. If you use this, don't use BAR0.
> BAR2: virtio-cfg. If you use this, don't use BAR0.
> BAR3: ISR. If you use this, don't use BAR0.
>
> I prefer the cases exclusive (ie. use one or the other) as a clear path
> to remove the legacy layout; and leaving the ISR in BAR0 leaves us with
> an ugly corner case in future (ISR is BAR0 + 19? WTF?).

We'll never remove legacy so we shouldn't plan on it. There are literally hundreds of thousands of VMs out there with the current virtio drivers installed in them. We'll be supporting them for a very, very long time :-)

I don't think we gain a lot by moving the ISR into a separate BAR. Splitting up registers like that seems weird to me too. It's very normal to have a mirrored set of registers that are PIO in one bar and MMIO in a different BAR.

If we added an additional constraint that BAR1 was mirrored except for the config space and the MSI section was always there, I think the end result would be nice. IOW:

BAR0[pio]:  virtio-pci registers + optional MSI section + virtio-config
BAR1[mmio]: virtio-pci registers + MSI section + future extensions
BAR2[mmio]: virtio-config

We can continue to do ISR access via BAR0 for performance reasons.

> As to MMIO vs PIO, the BARs are self-describing, so we should explicitly
> endorse that and leave it to the devices.
>
> The detection is simple: if BAR1 has non-zero length, it's new-style,
> otherwise legacy.

I agree that this is the best way to extend, but I think we should still use a transport feature bit. We want to be able to detect within QEMU whether a guest is using these new features because we need to adjust migration state accordingly. Otherwise we would have to detect reads/writes to the new BARs to track whether the extended register state needs to be saved. This gets nasty dealing with things like reset. A feature bit simplifies this all pretty well.

Regards,

Anthony Liguori

> Thoughts?
> Rusty.
Re: [Qemu-devel] Using PCI config space to indicate config location
Gerd Hoffmann writes:
> Hi,
>
>>> So we could have for virtio something like this:
>>>
>>> Capabilities: [??] virtio-regs:
>>>     legacy: BAR=0 offset=0
>>>     virtio-pci: BAR=1 offset=1000
>>>     virtio-cfg: BAR=1 offset=1800
>>
>> This would be a vendor specific PCI capability so lspci wouldn't
>> automatically know how to parse it.
>
> Sure, would need a patch to actually parse+print the cap,
> /me was just trying to make my point clear in a simple way.
>
>>>>> 2) ISTR an argument about mapping the ISR register separately, for
>>>>> performance, but I can't find a reference to it.
>>>>
>>>> I think the rationale is that ISR really needs to be PIO but everything
>>>> else doesn't. PIO is much faster on x86 because it doesn't require
>>>> walking page tables or instruction emulation to handle the exit.
>>>
>>> Is this still a pressing issue? With MSI-X enabled ISR isn't needed,
>>> correct? Which would imply that pretty much only old guests without
>>> MSI-X support need this, and we don't need to worry that much when
>>> designing something new ...
>>
>> It wasn't that long ago that MSI-X wasn't supported. I think we should
>> continue to keep ISR as PIO as it is a fast path.
>
> No problem if we allow to have both legacy layout and new layout at the
> same time. Guests can continue to use ISR @ BAR0 in PIO space for
> existing virtio devices, even in case they want to use mmio for other
> registers -> all fine.
>
> New virtio devices can support MSI-X from day one and decide to not
> expose a legacy layout PIO bar.

I think having BAR1 be an MMIO mirror of the registers + a BAR2 for virtio configuration space is probably not that bad of a solution.
Regards,

Anthony Liguori

> cheers,
> Gerd
Re: [Qemu-devel] Using PCI config space to indicate config location
Gerd Hoffmann writes:
> Hi,
>
>> But I think we could solve this in a different way. I think we could
>> just move the virtio configuration space to BAR1 by using a transport
>> feature bit.
>
> Why hard-code stuff?
>
> I think it makes a lot of sense to have a capability similar to msi-x
> which simply specifies bar and offset of the register sets:
>
> [root@fedora ~]# lspci -vvs4
> 00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
> [ ... ]
>     Region 0: I/O ports at c000 [size=64]
>     Region 1: Memory at fc029000 (32-bit) [size=4K]
>     Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
>         Vector table: BAR=1 offset=
>         PBA: BAR=1 offset=0800

MSI-X capability is a standard PCI capability which is why lspci can parse it.

> So we could have for virtio something like this:
>
> Capabilities: [??] virtio-regs:
>     legacy: BAR=0 offset=0
>     virtio-pci: BAR=1 offset=1000
>     virtio-cfg: BAR=1 offset=1800

This would be a vendor specific PCI capability so lspci wouldn't automatically know how to parse it. You could just as well teach lspci to parse BAR0 to figure out what features are supported.

>> That then frees up the entire BAR0 for use as virtio-pci registers. We
>> can then always include the virtio-pci MSI-X register space and
>> introduce all new virtio-pci registers as simply being appended.
>
> BAR0 needs to stay as-is for compatibility reasons. New devices which
> don't have to care about old guests don't need to provide a 'legacy'
> register region.

A latch feature bit would allow the format to change without impacting compatibility at all.

>>> 2) ISTR an argument about mapping the ISR register separately, for
>>> performance, but I can't find a reference to it.
>>
>> I think the rationale is that ISR really needs to be PIO but everything
>> else doesn't. PIO is much faster on x86 because it doesn't require
>> walking page tables or instruction emulation to handle the exit.
>
> Is this still a pressing issue? With MSI-X enabled ISR isn't needed,
> correct? Which would imply that pretty much only old guests without
> MSI-X support need this, and we don't need to worry that much when
> designing something new ...

It wasn't that long ago that MSI-X wasn't supported. I think we should continue to keep ISR as PIO as it is a fast path.

Regards,

Anthony Liguori

> cheers,
> Gerd
Re: [Qemu-devel] Using PCI config space to indicate config location
Rusty Russell writes:
> (Topic updated, cc's trimmed).
>
> Anthony Liguori writes:
>> Rusty Russell writes:
>>> 4) The only significant change to the spec is that we use PCI
>>>    capabilities, so we can have infinite feature bits.
>>>    (see http://lists.linuxfoundation.org/pipermail/virtualization/2011-December/019198.html)
>>
>> We discussed this on IRC last night. I don't think PCI capabilities are
>> a good mechanism to use...
>>
>> PCI capabilities are there to organize how the PCI config space is
>> allocated to allow vendor extensions to co-exist with future PCI
>> extensions.
>>
>> But we've never used the PCI config space within virtio-pci. We do
>> everything in BAR0. I don't think there's any real advantage of using
>> the config space vs. a BAR for virtio-pci.
>
> Note before anyone gets confused; we were talking about using the PCI
> config space to indicate what BAR(s) the virtio stuff is in. An
> alternative would be to simply specify a new layout format in BAR1.
>
> The arguments for a more flexible format that I know of:
>
> 1) virtio-pci has already extended the pci-specific part of the
>    configuration once (for MSI-X), so I don't want to assume it won't
>    happen again.

"configuration" is the wrong word here. The virtio-pci BAR0 layout is:

0..19  virtio-pci registers
20+    virtio configuration space

MSI-X needed to add additional virtio-pci registers, so now we have:

0..19  virtio-pci registers
if MSI-X:
    20..23  virtio-pci MSI-X registers
    24+     virtio configuration space
else:
    20+     virtio configuration space

I agree, this stinks. But I think we could solve this in a different way. I think we could just move the virtio configuration space to BAR1 by using a transport feature bit.

That then frees up the entire BAR0 for use as virtio-pci registers. We can then always include the virtio-pci MSI-X register space and introduce all new virtio-pci registers as simply being appended.

This new feature bit then becomes essentially a virtio configuration latch.
When unacked, virtio configuration hides new registers; when acked, those new registers are exposed. Another option is to simply put new registers after the virtio configuration blob.

> 2) ISTR an argument about mapping the ISR register separately, for
>    performance, but I can't find a reference to it.

I think the rationale is that ISR really needs to be PIO but everything else doesn't. PIO is much faster on x86 because it doesn't require walking page tables or instruction emulation to handle the exit. The argument to move the remaining registers to MMIO is to allow 64-bit accesses to registers which isn't possible with PIO.

>> This maps really nicely to non-PCI transports too.
>
> This isn't right. Noone else can use the PCI layout. While parts are
> common, other parts are pci-specific (MSI-X and ISR for example), and
> yet other parts are specified by PCI elsewhere (eg interrupt numbers).
>
>> But extending the
>> PCI config space (especially dealing with capability allocation) is
>> pretty gnarly and there isn't an obvious equivalent outside of PCI.
>
> That's OK, because general changes should be done with feature bits, and
> the others all have an infinite number. Being the first, virtio-pci has
> some unique limitations we'd like to fix.
>
>> There are very devices that we emulate today that make use of extended
>> PCI device registers outside the platform devices (that have no BARs).
>
> This sentence confused me?

There is a missing "few". "There are very few devices..." Extending the PCI configuration space is unusual for PCI devices. That was the point.

Regards,

Anthony Liguori

> Thanks,
> Rusty.