Re: Who signed qemu-1.7.1.tar.bz2?

2014-04-23 Thread Anthony Liguori
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 04/22/14 07:35, Michael Roth wrote:
> Quoting Stefan Hajnoczi (2014-04-22 08:31:08)
>> On Wed, Apr 02, 2014 at 05:40:23PM -0700, Alex Davis wrote:
>>> and where is their gpg key?
>> 
>> Michael Roth  is doing releases:
>> 
>> http://pgp.mit.edu/pks/lookup?op=vindex&search=0x3353C9CEF108B584
>>
>> $ gpg --verify qemu-2.0.0.tar.bz2.sig
>> gpg: Signature made Thu 17 Apr 2014 03:49:55 PM CEST using RSA key ID F108B584
>> gpg: Good signature from "Michael Roth "
>> gpg:                 aka "Michael Roth "
>> gpg:                 aka "Michael Roth "
> 
> Missed the context, but if this is specifically about 1.7.1:
> 
> 1.7.1 was prior to me handling the release tarballs, Anthony
> actually did the signing and uploading for that one. I'm a bit
> confused though, as the key ID on that tarball is:
> 
> mdroth@loki:~/Downloads$ gpg --verify qemu-1.7.1.tar.bz2.sig
> gpg: Signature made Tue 25 Mar 2014 09:03:24 AM CDT using RSA key ID ADF0D2D9
> gpg: Can't check signature: public key not found
> 
> I can't seem to locate ADF0D2D9 though:
> 
> http://pgp.mit.edu/pks/lookup?search=0xADF0D2D9&op=vindex
> 
> Anthony's normal key (for 1.6.0 and 1.7.0 at least) was 7C18C076:
> 
> http://pgp.mit.edu/pks/lookup?search=0x7C18C076&op=vindex
> 
> I think Anthony might've signed it with a separate local key?

Yeah, I accidentally signed it with the wrong key.  Replacing the
signature doesn't seem like the right thing to do since release
artifacts should never change.
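For anyone scripting release checks, the key-ID comparison mdroth did by eye can be automated. A minimal Python sketch (the function name is invented for illustration; the gpg output strings are as quoted in this thread, and the output format is assumed to match gpg 1.x):

```python
import re

def signing_key_id(verify_output):
    """Extract the short RSA key ID from `gpg --verify` output text."""
    m = re.search(r"using RSA key ID ([0-9A-F]{8})", verify_output)
    if m is None:
        raise ValueError("no RSA key ID found in gpg output")
    return m.group(1)

# The two outputs as quoted in this thread.
OUT_200 = ("gpg: Signature made Thu 17 Apr 2014 03:49:55 PM CEST "
           "using RSA key ID F108B584")
OUT_171 = ("gpg: Signature made Tue 25 Mar 2014 09:03:24 AM CDT "
           "using RSA key ID ADF0D2D9")

RELEASE_KEY = "F108B584"  # mdroth's release key, per the keyserver link above

assert signing_key_id(OUT_200) == RELEASE_KEY
assert signing_key_id(OUT_171) != RELEASE_KEY  # the 1.7.1 anomaly
```

This would have flagged the 1.7.1 tarball automatically instead of relying on a manual look at the key ID.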

Regards,

Anthony Liguori

>> 
>> Stefan
> 

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTV8NqAAoJEBqtxxBWguX/j9oH/3eVb+PgcXhEHICRXNoPyNy8
wiMeNABsTh7xn/wYpUHBxIa0lWWeO/W/6ZFLhfL50C8Nm8fsldEASOB6jngcK1dZ
5jAexApGeN5Q10Bi+reum7/bqCgxaHRmXEO/wyJtlOiC/fxsbdupg04Zk6dO2b5h
gRHxkt8uC2DWRJjb8fReR1K96aTPm9SI9GRrNZ9pAHrT6MeF3FOQGkY0hhpPDE6k
YPXb8keAlldT0U9h/Du+8m7mMCKMvwa3rRMNSw+lw7Oc5eMRwQzxUB+B4jEJ9f1k
+bL7opOcYNgqBxhKzAFgmMqlnwvM55CsWiPRq5L0/68w8qxWRQl+ECPfpJ1O0ac=
=/bg9
-END PGP SIGNATURE-
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for 2014-04-01

2014-03-31 Thread Anthony Liguori
On Mon, Mar 31, 2014 at 7:46 AM, Andreas Färber  wrote:
> Am 31.03.2014 16:32, schrieb Peter Maydell:
>> On 31 March 2014 15:28, Paolo Bonzini  wrote:
>>> I think it would be a good idea to separate the committer and release
>>> manager roles.  Peter is providing the community with a wonderful service,
>>> just like you were; putting too much work on his shoulders risks getting us
>>> in the same situation if anything were to affect his ability to provide it.
>>
>> Yes, I strongly agree with this. I think we'll do much better
>> if we can manage to share out responsibilities among a wider
>> group of people.
>
> May I propose Michael Roth, who is already experienced from the N-1
> stable releases?
>
> If we can enable him to upload the tarballs created from his tags that
> would also streamline the stable workflow while at it.

If mdroth is willing to take this on, I am very supportive.

Regards,

Anthony Liguori

>
> Regards,
> Andreas
>
> --
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Re: [Qemu-devel] KVM call agenda for 2014-04-01

2014-03-31 Thread Anthony Liguori
On Mon, Mar 31, 2014 at 6:25 AM, Peter Maydell  wrote:
> On 31 March 2014 14:21, Christian Borntraeger  wrote:
>> Another thing might be the release process in general. Currently it seems
>> that everybody tries to push everything just before the hard freeze.  I had
>> to debug some problems introduced _after_ soft freeze. Is there some
>> interest in having a Linux-like process (merge window + stabilization)? This
>> would require shorter release cycles of course.
>
> "merge window" has been suggested before. I think it would be
> a terrible idea for QEMU, personally. We're not the kernel in
> many ways, notably dev community size and a greater tendency
> to changes that have effects across the whole tree.
>
> Soft + hard freeze is our stabilization period currently.

Peter, are you willing to do the tagging and announcement for the 2.0
rcs?  I sent instructions privately, and between stefanha and me we can
get your permissions sorted out.

Regards,

Anthony Liguori

> thanks
> -- PMM


Re: [Qemu-devel] KVM call agenda for 2013-12-10

2013-12-10 Thread Anthony Liguori
On Tue, Dec 10, 2013 at 4:54 AM, Markus Armbruster  wrote:
> Paolo Bonzini  writes:
>
>> Il 10/12/2013 12:42, Juan Quintela ha scritto:
>>>
>>> Hi
>>>
>>> Please, send any topic that you are interested in covering.
>>
>> May not need a phone call, but I'll drop it here: what happened to
>> acknowledgement emails from the patches script?
>>
>> Also, Anthony, it looks like you're still adjusting to the new job.  If
>> you need help with anything, I guess today's call could be a good place
>> to discuss it.
>>
>> And someone needs to send out the email saying that 1.7.0 is out and
>> that the next version will be 2.0!
>
> Speaking of sending out e-mail: did I miss the promised followup to the
> key signing party?

I need to find the papers from KVM Forum which are somewhere among the
stacks of boxes here :-/

Regards,

Anthony Liguori



Re: KVM call agenda for 2013-12-10

2013-12-10 Thread Anthony Liguori
On Tue, Dec 10, 2013 at 4:37 AM, Paolo Bonzini  wrote:
> Il 10/12/2013 12:42, Juan Quintela ha scritto:
>>
>> Hi
>>
>> Please, send any topic that you are interested in covering.
>
> May not need a phone call, but I'll drop it here:

Could we move the time on this phone call?  7am conflicts with my
daily commute.  I could do 6am or 9am.  I think it would be very
useful to be able to attend this call.

> what happened to
> acknowledgement emails from the patches script?

It's buggy and I haven't had a chance to rewrite it yet.

> Also, Anthony, it looks like you're still adjusting to the new job.  If
> you need help with anything, I guess today's call could be a good place
> to discuss it.
>
> And someone needs to send out the email saying that 1.7.0 is out and
> that the next version will be 2.0!

Mail is out now, sorry for the delay.

Pull requests should be getting processed in a reasonable time.  I am
not yet spending enough time doing patch review but that should
improve in the very near future.

It's not so much the new job as it is relocating and moving all at the
same time.  I'm hoping the holiday break is a good way to catch up on
things.  Of course, we should revisit again soon.

Regards,

Anthony Liguori

>
> Paolo


Re: Elvis upstreaming plan

2013-11-27 Thread Anthony Liguori
Abel Gordon  writes:

> "Michael S. Tsirkin"  wrote on 27/11/2013 12:27:19 PM:
>
>>
>> On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
>> > Hi,
>> >
>> > Razya is out for a few days, so I will try to answer the questions as
>> > well as I can:
>> >
>> > "Michael S. Tsirkin"  wrote on 26/11/2013 11:11:57 PM:
>> >
>> > > From: "Michael S. Tsirkin" 
>> > > To: Abel Gordon/Haifa/IBM@IBMIL,
>> > > Cc: Anthony Liguori , abel.gor...@gmail.com,
>> > > as...@redhat.com, digitale...@google.com, Eran Raichstein/Haifa/
>> > > IBM@IBMIL, g...@redhat.com, jasow...@redhat.com, Joel Nider/Haifa/
>> > > IBM@IBMIL, kvm@vger.kernel.org, pbonz...@redhat.com, Razya Ladelsky/
>> > > Haifa/IBM@IBMIL
>> > > Date: 27/11/2013 01:08 AM
>> > > Subject: Re: Elvis upstreaming plan
>> > >
>> > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
>> > > >
>> > > >
>> > > > Anthony Liguori  wrote on 26/11/2013 08:05:00 PM:
>> > > >
>> > > > >
>> > > > > Razya Ladelsky  writes:
>> > > > >
>> > 
>> > > >
>> > > > That's why we are proposing to implement a mechanism that will enable
>> > > > the management stack to configure 1 thread per I/O device (as it is
>> > > > today) or 1 thread for many I/O devices (belonging to the same VM).
>> > > >
>> > > > > Once you are scheduling multiple guests in a single vhost device, you
>> > > > > now create a whole new class of DoS attacks in the best case scenario.
>> > > >
>> > > > Again, we are NOT proposing to schedule multiple guests in a single
>> > > > vhost thread. We are proposing to schedule multiple devices belonging
>> > > > to the same guest in a single (or multiple) vhost thread/s.
>> > > >
>> > >
>> > > I guess a question then becomes why have multiple devices?
>> >
>> > If you mean "why serve multiple devices from a single thread", the answer
>> > is that we cannot rely on the Linux scheduler, which has no knowledge of
>> > I/O queues, to do a decent job of scheduling I/O.  The idea is to take
>> > over the I/O scheduling responsibilities from the kernel's thread
>> > scheduler with a more efficient I/O scheduler inside each vhost thread.
>> > By combining all of the I/O devices from the same guest (disks, network
>> > cards, etc.) in a single I/O thread, we can provide better scheduling
>> > through more knowledge of the nature of the work.  So now, instead of
>> > relying on the Linux scheduler to perform context switches between
>> > multiple vhost threads, we have a single thread context in which we can
>> > do the I/O scheduling more efficiently.  We can closely monitor the
>> > performance needs of each queue of each device inside the vhost thread,
>> > which gives us much more information than relying on the kernel's thread
>> > scheduler.
>> > This does not expose any additional opportunities for attacks (DoS or
>> > other) than are already available, since all of the I/O traffic belongs
>> > to a single guest.
>> > You can make the argument that with low I/O loads this mechanism may not
>> > make much difference.  However, when you try to maximize the utilization
>> > of your hardware (such as in a commercial scenario), this technique can
>> > gain you a large benefit.
>> >
>> > Regards,
>> >
>> > Joel Nider
>> > Virtualization Research
>> > IBM Research and Development
>> > Haifa Research Lab
>>
>> So all this would sound more convincing if we had sharing between VMs.
>> When it's only a single VM it's somehow less convincing, isn't it?
>> Of course if we would bypass a scheduler like this it becomes harder to
>> enforce cgroup limits.
>
> True, but here the issue becomes isolation/cgroups. We can start to show
> the value for VMs that have multiple devices / queues, and then we could
> re-consider extending the mechanism to multiple VMs (at least as an
> experimental feature).
>
>> But it might be easier to give the scheduler the info it needs to do what
>> we need.  Would an API that basically says "r

Re: Elvis upstreaming plan

2013-11-26 Thread Anthony Liguori
Razya Ladelsky  writes:

> Hi all,
>
> I am Razya Ladelsky; I work in the IBM Haifa virtualization team, which
> developed Elvis, presented by Abel Gordon at the last KVM Forum:
> ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
>
>
> According to the discussions that took place at the forum, upstreaming 
> some of the Elvis approaches seems to be a good idea, which we would like 
> to pursue.
>
> Our plan for the first patches is the following: 
>
> 1. Shared vhost thread between multiple devices
> This patch creates a worker thread and worker queue shared across multiple 
> virtio devices 
> We would like to modify the patch posted at
> https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> to limit a vhost thread to serving multiple devices only if they belong to
> the same VM, as Paolo suggested, to avoid isolation or cgroups concerns.
>
> Another modification is related to the creation and removal of vhost 
> threads, which will be discussed next.

I think this is an exceptionally bad idea.

We shouldn't throw away isolation without exhausting every other
possibility.

We've seen very positive results from adding threads.  We should also
look at scheduling.

Once you are scheduling multiple guests in a single vhost device, you
now create a whole new class of DoS attacks in the best case scenario.

> 2. Sysfs mechanism to add and remove vhost threads 
> This patch allows us to add and remove vhost threads dynamically.
>
> A simpler way to control the creation of vhost threads is statically 
> determining the maximum number of virtio devices per worker via a kernel 
> module parameter (which is the way the previously mentioned patch is 
> currently implemented)
>
> I'd like to ask for advice here about the more preferable way to go:
> Although having the sysfs mechanism provides more flexibility, it may be a 
> good idea to start with a simple static parameter, and have the first 
> patches as simple as possible. What do you think?
>
> 3. Add virtqueue polling mode to vhost
> Have the vhost thread poll the virtqueues with high I/O rates for new
> buffers, and avoid asking the guest to kick us.
> https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0

Ack on this.

Regards,

Anthony Liguori

> 4. vhost statistics
> This patch introduces a set of statistics to monitor different performance 
> metrics of vhost and our polling and I/O scheduling mechanisms. The 
> statistics are exposed using debugfs and can be easily displayed with a 
> Python script (vhost_stat, based on the old kvm_stats)
> https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
>
>
> 5. Add heuristics to improve I/O scheduling 
> This patch enhances the round-robin mechanism with a set of heuristics to 
> decide when to leave a virtqueue and proceed to the next.
> https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
>
> This patch improves the handling of the requests by the vhost thread, but
> could perhaps be delayed to a later time, and not submitted as one of the
> first Elvis patches.
> I'd love to hear some comments about whether this patch needs to be part 
> of the first submission.
>
> Any other feedback on this plan will be appreciated,
> Thank you,
> Razya
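Not part of Razya's patch series, but the shared-worker idea in items 1 and 5 can be illustrated with a toy round-robin scheduler. A hedged Python sketch follows; the class names and the per-queue budget heuristic are invented for illustration, not Elvis's actual code:

```python
from collections import deque

class Virtqueue:
    def __init__(self, name):
        self.name = name
        self.pending = deque()   # requests kicked by the guest

    def kick(self, req):
        self.pending.append(req)

class SharedWorker:
    """One vhost-like worker serving all queues of a single VM.

    Round-robin with a per-queue budget stands in for the heuristics
    that decide when to leave a virtqueue and proceed to the next.
    """
    def __init__(self, queues, budget=2):
        self.queues = list(queues)
        self.budget = budget
        self.completed = []

    def poll_once(self):
        # Visit every queue, draining at most `budget` requests from
        # each, so one busy queue cannot starve the others.
        for q in self.queues:
            for _ in range(self.budget):
                if not q.pending:
                    break
                self.completed.append((q.name, q.pending.popleft()))

rx, tx = Virtqueue("net-rx"), Virtqueue("net-tx")
for i in range(4):
    rx.kick(i)
tx.kick(0)

worker = SharedWorker([rx, tx], budget=2)
worker.poll_once()
# One pass serves two rx requests, then the lone tx request:
assert worker.completed == [("net-rx", 0), ("net-rx", 1), ("net-tx", 0)]
```

The point of the single-thread design is exactly this: the worker sees all of the guest's queues at once and can enforce fairness itself instead of leaving it to the kernel's thread scheduler.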


Re: [PATCH for-1.7] target-i386: Fix build by providing stub kvm_arch_get_supported_cpuid()

2013-11-12 Thread Anthony Liguori
On Tue, Nov 12, 2013 at 8:08 AM, Peter Maydell  wrote:
> On 12 November 2013 15:58, Paolo Bonzini  wrote:
>> I don't really see a reason why QEMU should give clang more weight than
>> Windows or Mac OS X.
>
> I'm not asking for more weight (and actually my main
> reason for caring about clang is exactly MacOSX). I'm
> just asking that when a bug is reported whose underlying
> cause is "we don't work on clang because we're relying on
> undocumented behaviour of gcc" with an attached patch that
> fixes this by not relying on the undocumented behaviour,
> that we apply the patch rather than saying "why do we
> care about clang"...

QEMU has always been intimately tied to GCC.  Heck, it all started as
a giant GCC hack relying on entirely undocumented behavior (dyngen's
disassembly of functions).

There's nothing intrinsically bad about being tied to GCC.  If you
were making the argument that we could do it a different way and the
result would be as nice or nicer, then it wouldn't be a discussion.

But if supporting clang means we have to remove useful things, then
it's always going to be an uphill battle.

In this case, the whole discussion is a bit silly.  Have you actually
tried -O1 under a debugger with clang?  Is it noticeably worse than
-O0?

I find QEMU extremely difficult to use an interactive debugger on
anyway.  I doubt the difference between -O0 and -O1 is even close to
the breaking point between usability under a debugger...

Regards,

Anthony Liguori

> This seems to me to be a win-win situation:
>  * we improve our code by not relying on undocumented
>    implementation specifics
>  * we work on a platform that, while not a primary
>platform, is at least supported in the codebase and
>has people who fix it when it breaks
>
> -- PMM


Re: [PATCH for-1.7] target-i386: Fix build by providing stub kvm_arch_get_supported_cpuid()

2013-11-11 Thread Anthony Liguori
On Mon, Nov 11, 2013 at 3:11 PM, Paolo Bonzini  wrote:
> Il 11/11/2013 23:38, Peter Maydell ha scritto:
>> If we have other places where we're relying on dead code elimination
>> to not provide a function definition, please point them out, because
>> they're bugs we need to fix, ideally before they cause compilation
>> failures.
>
> I'm not sure, there are probably a few others.  Linux also relies on the
> idiom (at least KVM does on x86).

And they are there because it's a useful tool.

>> Huh? The point of stub functions is to provide versions of functions
>> which either need to return an "always fails" code, or which will never
>> be called, but in either case this is so we can avoid peppering the
>> code with #ifdefs. The latter category is why we have stubs which
>> do nothing but call abort().
>
> There are very few stubs that call abort():
>
> int kvm_cpu_exec(CPUState *cpu)
> {
>     abort();
> }
>
> int kvm_set_signal_mask(CPUState *cpu, const sigset_t *sigset)
> {
>     abort();
> }
>
> Calling abort() would be marginally better than returning 0, but why
> defer checks to runtime when you can let the linker do them?

Exactly.

>>>> I wouldn't be surprised if this also affected debug gcc
>>>> builds with KVM disabled, but I haven't checked.
>>>
>>> No, it doesn't affect GCC.  See Andreas's bug report.  Is it a bug or a
>>> feature?  Having some kind of -O0 dead-code elimination is definitely a
>>> feature (http://gcc.gnu.org/ml/gcc-patches/2003-03/msg02443.html).
>>
>> That patch says it is to "speed up these RTL optimizers and by allocating
>> less memory, reduce the compiler footprint and possible memory
>> fragmentation". So they might investigate it as a performance
>> regression, but it's only a "make compilation faster" feature, not
>> correctness. Code which relies on dead-code-elimination is broken.
>
> There's plenty of tests in the GCC testsuite that rely on DCE to test
> that an optimization happened; some of them at -O0 too.  So it's become
> a GCC feature in the end.
>
> Code which relies on dead-code-elimination is not broken, it's relying
> on the full power of the toolchain to ensure bugs are detected as soon
> as possible, i.e. at build time.
>
>>> I am okay with Andreas's patch of course, but it would also be fine with
>>> me to split the "if" in two, each with its own separate break statement.
>>
>> I think Andreas's patch is a bad idea and am against it being
>> applied. It's very obviously a random tweak aimed at a specific
>> compiler's implementation of dead-code elimination, and it's the
>> wrong way to fix the problem.
>
> It's very obviously a random tweak aimed at a specific compiler's bug in
> dead-code elimination, I'm not denying that.  But the same compiler
> feature is being exploited elsewhere.

We're not talking about something obscure here.  It's eliminating an
if(0) block.  There's no reason to leave an if (0) block around.  The
code is never reachable.

>>> Since it only affects debug builds, there is no hurry to fix this in 1.7
>>> if the approach cannot be agreed with.
>>
>> ??  Debug builds should absolutely work out of the box -- if
>> debug build fails that is IMHO a release critical bug.
>
> Debug builds for qemu-system-{i386,x86_64} with clang on systems other
> than x86/Linux.

Honestly, it's hard to treat clang as a first-class target.  We don't
have much infrastructure around it, so it's not getting that much testing.

We really need to figure out how we're going to do CI.

FWIW, I'd rather just add -O1 for debug builds than add more stub functions.

Regards,

Anthony Liguori

>
> Paolo


Re: [PULL 0/6] VFIO updates for QEMU

2013-10-09 Thread Anthony Liguori
Alex Williamson  writes:

> The following changes since commit a684f3cf9b9b9c3cb82be87aafc463de8974610c:
>
>   Merge remote-tracking branch 'kraxel/seabios-1.7.3.2' into staging 
> (2013-09-30 17:15:27 -0500)
>
> are available in the git repository at:
>
>
>   git://github.com/awilliam/qemu-vfio.git tags/vfio-pci-for-qemu-20131003.0
>
> for you to fetch changes up to 1d5bf692e55ae22b59083741d521e27db704846d:
>
>   vfio: Fix debug output for int128 values (2013-10-03 09:10:09 -0600)
>
> 

Judging from the review comments, I think this needs a v2.

Regards,

Anthony Liguori

> vfio-pci updates include:
>  - Forgotten MSI affinity patch posted several months ago
>  - Lazy option ROM loading to delay load until after device/bus resets
>  - Error reporting cleanups
>  - PCI hot reset support introduced with Linux v3.12 development kernels
>  - Debug build fix for int128
>
> The lazy ROM loading and hot reset should help VGA assignment as we can
> now do a bus reset when there are multiple devices on the bus, ex.
> multi-function graphics and audio cards.  The known remaining part for
> VGA is the KVM-VFIO device and matching QEMU support to properly handle
> devices that make use of No-Snoop transactions, particularly on Intel
> host systems.
>
> 
> Alex Williamson (5):
>   vfio-pci: Add support for MSI affinity
>   vfio-pci: Test device reset capabilities
>   vfio-pci: Lazy PCI option ROM loading
>   vfio-pci: Cleanup error_reports
>   vfio-pci: Implement PCI hot reset
>
> Alexey Kardashevskiy (1):
>   vfio: Fix debug output for int128 values
>
>  hw/misc/vfio.c | 621 +++--
>  1 file changed, 512 insertions(+), 109 deletions(-)


Re: Is fallback vhost_net to qemu for live migrate available?

2013-08-29 Thread Anthony Liguori
Hi Qin,

On Mon, Aug 26, 2013 at 10:32 PM, Qin Chuanyu  wrote:
> Hi all
>
> I am participating in a project which tries to port vhost_net to Xen.

Neat!

> By changing the memory copy and notify mechanisms, virtio-net with
> vhost_net can currently run on Xen with good performance.

I think the key in doing this would be to implement a proper
ioeventfd and irqfd interface in the driver domain kernel.  Just
hacking vhost_net with Xen-specific knowledge would be pretty nasty
IMHO.

Did you modify the front end driver to do grant table mapping or is
this all being done by mapping the domain's memory?

> TCP receive throughput of a single vNIC went from 2.77 Gbps up to 6 Gbps.
> On the VM receive side, I replaced grant_copy with grant_map + memcpy,
> which efficiently reduces the cost of the grant_table spin_lock in dom0,
> so the whole server's TCP performance went from 5.33 Gbps up to 9.5 Gbps.
>
> Now I am considering live migration of vhost_net on Xen. vhost_net uses
> vhost_log for live migration on KVM, but QEMU on Xen doesn't manage the
> whole memory of the VM, so I am trying to fall back the datapath from
> vhost_net to qemu when doing live migration, and switch the datapath from
> qemu back to vhost_net again after the VM migrates to the new server.

KVM and Xen represent memory in a very different way.  KVM can only
track when guest mode code dirties memory.  It relies on QEMU to track
when guest memory is dirtied by QEMU.  Since vhost is running outside
of QEMU, vhost also needs to tell QEMU when it has dirtied memory.

I don't think this is a problem with Xen though.  I believe (although I
could be wrong) that Xen is able to track when either the domain or
dom0 dirties memory.

So I think you can simply ignore the dirty logging with vhost and it
should Just Work.

>
> My questions are:
> Why didn't vhost_net do the same fallback operation for live migration
> on KVM, instead of using vhost_log to mark the dirty pages?
> Is there any mechanism flaw in the idea of falling back the datapath
> from vhost_net to qemu for live migration?

No, we don't have a mechanism to fall back to QEMU for the datapath.
It would be possible, but I think it's a bad idea to mix and match the
two.
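To make the difference concrete, here is a hedged Python sketch of what vhost_log buys KVM: a dirty-page log that writers outside QEMU mark, and that the migration loop fetches and clears each round. All names here are invented for illustration, not the actual kernel interface:

```python
class DirtyLog:
    """Toy model of a dirty-page log: writers mark pages, the
    migration loop atomically fetches-and-clears the set."""
    def __init__(self):
        self.dirty = set()

    def mark(self, page):
        self.dirty.add(page)

    def fetch_and_clear(self):
        pages, self.dirty = self.dirty, set()
        return pages

def guest_write(ram, log, page, val):
    # Every writer outside QEMU (e.g. vhost) must log what it dirties.
    ram[page] = val
    log.mark(page)

ram = [0, 0, 0, 0]
log = DirtyLog()

# Pre-copy pass 1: stream every page to the destination.
sent = {p: ram[p] for p in range(len(ram))}

# The guest keeps running and dirties page 2 during the pass.
guest_write(ram, log, 2, 7)

# Pass 2 resends only what the log reports.
for p in log.fetch_and_clear():
    sent[p] = ram[p]

assert sent == {0: 0, 1: 0, 2: 7, 3: 0}
assert not log.dirty  # converged; the final stop-and-copy would happen here
```

On Xen, if the hypervisor already tracks dirtying by both the domain and dom0, this vhost-side log becomes unnecessary, which is why the fallback dance shouldn't be needed.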

Regards,

Anthony Liguori

> Any questions about the details of vhost_net on Xen are welcome.
>
> Thanks
>
>


Re: Are there plans to achieve ram live Snapshot feature?

2013-08-09 Thread Anthony Liguori
Chijianchun  writes:

> Now in KVM, when taking a RAM snapshot, the vCPUs need to be stopped,
> which is an unfriendly restriction on users.
>
> Are there plans to achieve ram live Snapshot feature?

I think you mean a live version of the savevm command.

You can approximate it by live migrating to a file, creating an external
disk snapshot, then resuming the guest.

Regards,

Anthony Liguori

>
> In my mind, snapshots should not occupy too much additional memory, so
> when a memory page needs to be changed, the old page must be flushed to
> the file first.  But flushing to a file is much slower than memory, and
> while flushing, the vCPU or VM needs to be paused until the flush
> finishes, so pause...resume...pause...resume, getting slower and slower.
>
> Is this idea feasible? Are there any other thoughts?
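The copy-on-write idea in the question (save the old page before it changes) can be sketched in Python. This toy keeps saved pages in memory rather than flushing to a file, and all names are illustrative:

```python
class CowSnapshot:
    """Copy-on-write RAM snapshot: before a page is modified after
    the snapshot point, its old contents are saved exactly once."""
    def __init__(self, ram):
        self.ram = ram
        self.saved = {}  # page index -> contents at snapshot time

    def write(self, page, val):
        if page not in self.saved:
            # "Flush" the old page before letting the guest touch it;
            # a real implementation would write this to the snapshot
            # file, and that write is where the stall the poster
            # worries about comes from.
            self.saved[page] = self.ram[page]
        self.ram[page] = val

    def snapshot_view(self):
        # Reconstruct RAM exactly as it was at the snapshot point.
        return [self.saved.get(p, v) for p, v in enumerate(self.ram)]

ram = [1, 2, 3]
snap = CowSnapshot(ram)
snap.write(1, 9)          # guest keeps running and writes
snap.write(1, 8)          # second write: old page already saved, no flush

assert ram == [1, 8, 3]                   # live RAM moves on
assert snap.snapshot_view() == [1, 2, 3]  # snapshot stays consistent
```

Note that each page is flushed at most once, so the per-write pause is bounded; the migrate-to-file approach above sidesteps the problem entirely by reusing the existing pre-copy machinery.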



[ANNOUNCE] Key Signing Party at KVM Forum 2013

2013-07-24 Thread Anthony Liguori

I will be hosting a key signing party at this year's KVM Forum.

http://wiki.qemu.org/KeySigningParty2013

Starting with the 1.7 release (which begins in December), I will only
accept signed pull requests, so please try to attend this event or make
alternative arrangements to have someone who will attend the event sign
your key.

I will also be attending LinuxCon/CloudOpen/Plumbers North America if
anyone wants to have another key signing party at that event and cannot
attend KVM Forum.

Regards,

Anthony Liguori


Re: VU#976534 - How to submit security bugs?

2013-07-24 Thread Anthony Liguori
"CERT(R) Coordination Center"  writes:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Greetings,
>   My name is Adam Rauf and I work for the CERT Coordination Center.  We
> have a report that may affect KVM/QEMU.  How can we securely send it over to
> you?  Thanks so much!

For QEMU bugs, please file a bug in Launchpad and mark it as a security
bug.  That will appropriately limit visibility.

http://launchpad.net/qemu

If you want to contact me directly, my public key is:

http://www.codemonkey.ws/files/aliguori.pub

You can verify that this key is what is used to sign QEMU releases at:

http://wiki.qemu.org/Download

Regards,

Anthony Liguori

>
> Adam Rauf
> Software Engineering Institute
> CERT Vulnerability Analysis Team
>
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.5 (GNU/Linux)
>
> iQEVAwUBUe2DstXCAanP4MNyAQI8nwf/eTb1Qox5lmgMHifDKRjj69E37FW+o5Jp
> KMIP6+IgKdWQizPctXk2Gae50a+ioaXgkCGZZ7SwNJ9iE/AX2I32QvX6pZrDCBGw
> l5Ht6UiwOLUTP3sKWO9AIYcgTDABzyNE2+bCGvDz8aqwLB8NNVqQ50f46TrQNlmB
> oiG+XzskRG0BAxKTwWc8f4v+1hdqMtp811I7XmxXkAdtlmTWPHZfPiFs0dS++Puh
> T0uLuC4nDo83hP6Yv8seMZKZZApFGfR+q4qKx7f6riNsa5v1zGgW2if++u+zRKvg
> DvjLxjtRfE9JGmCZMBcFmRJ5y4Wx/m/2wtj2+a7D/D2Hd9L5LRB0lA==
> =npI1
> -END PGP SIGNATURE-


KVM Forum 2013 Call for Participation - Extended to August 4th

2013-07-23 Thread Anthony Liguori

We have received numerous requests to extend the CFP deadline and so
we are happy to announce that the CFP deadline has been moved by two
weeks to August 4th.

=
KVM Forum 2013: Call For Participation
October 21-23, 2013 - Edinburgh International Conference Centre - Edinburgh, UK

(All submissions must be received before midnight July 21, 2013)
=

KVM is an industry leading open source hypervisor that provides an ideal
platform for datacenter virtualization, virtual desktop infrastructure,
and cloud computing.  Once again, it's time to bring together the
community of developers and users that define the KVM ecosystem for
our annual technical conference.  We will discuss the current state of
affairs and plan for the future of KVM, its surrounding infrastructure,
and management tools.  The oVirt Workshop will run in parallel with the
KVM Forum again, bringing in a community focused on enterprise datacenter
virtualization management built on KVM.  For topics which overlap we will
have shared sessions.  So mark your calendar and join us in advancing KVM.

http://events.linuxfoundation.org/events/kvm-forum/

Once again we are colocated with The Linux Foundation's LinuxCon Europe.
KVM Forum attendees will be able to attend oVirt Workshop sessions and
are eligible to attend LinuxCon Europe for a discounted rate.

http://events.linuxfoundation.org/events/kvm-forum/register

We invite you to lead part of the discussion by submitting a speaking
proposal for KVM Forum 2013.

http://events.linuxfoundation.org/cfp

Suggested topics:

 KVM/Kernel
 - Scaling and performance
 - Nested virtualization
 - I/O improvements
 - VFIO, device assignment, SR-IOV
 - Driver domains
 - Time keeping
 - Resource management (cpu, memory, i/o)
 - Memory management (page sharing, swapping, huge pages, etc)
 - Network virtualization
 - Security
 - Architecture ports

 QEMU
 - Device model improvements
 - New devices and chipsets
 - Scaling and performance
 - Desktop virtualization
 - Spice
 - Increasing robustness and hardening
 - Security model
 - Management interfaces
 - QMP protocol and implementation
 - Image formats
 - Firmware (SeaBIOS, OVMF, UEFI, etc)
 - Live migration
 - Live snapshots and merging
 - Fault tolerance, high availability, continuous backup
 - Real-time guest support

 Virtio
 - Speeding up existing devices
 - Alternatives
 - Virtio on non-Linux or non-virtualized platforms

 Management infrastructure
 - oVirt (shared track w/ oVirt Workshop)
 - Libvirt
 - KVM autotest
 - OpenStack
 - Network virtualization management
 - Enterprise storage management

 Cloud computing
 - Scalable storage
 - Virtual networking
 - Security
 - Provisioning

SUBMISSION REQUIREMENTS

Abstracts due: August 4, 2013
Notification: August 1, 2013

Please submit a short abstract (~150 words) describing your presentation
proposal.  In your submission please note how long your talk will take.
Slots vary in length up to 45 minutes.  Also include in your proposal
the proposal type -- one of:

- technical talk
- end-user talk
- birds of a feather (BOF) session

Submit your proposal here:

http://events.linuxfoundation.org/cfp

You will receive a notification whether or not your presentation proposal
was accepted by Aug 1st.

END-USER COLLABORATION

One of our big challenges as developers is knowing what, where, and how
people actually use our software.  We will reserve a few slots for end
users talking about their deployment challenges and achievements.

If you are using KVM in production, you are encouraged to submit a speaking
proposal.  Simply mark it as an end-user collaboration proposal.  As an
end user, this is a unique opportunity to get your input to developers.

BOF SESSION

We will reserve some slots in the evening after the main conference
tracks for birds of a feather (BOF) sessions. These sessions will be
less formal than presentation tracks and targeted at people who would
like to discuss specific issues with other developers and/or users.
If you are interested in getting developers and/or users together to
discuss a specific problem, please submit a BOF proposal.

HOTEL / TRAVEL

The KVM Forum 2013 will be held in Edinburgh, UK at the Edinburgh
International Conference Centre.

http://events.linuxfoundation.org/events/kvm-forum/hotel

Thank you for your interest in KVM.  We're looking forward to your
submissions and seeing you at the KVM Forum 2013 in October!

Thanks,
-your KVM Forum 2013 Program Committee

Please contact us with any questions or comments.
kvm-forum-2013...@redhat.com



Re: KVM call agenda for 2013-06-11

2013-06-11 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Tue, Jun 04, 2013 at 04:24:31PM +0300, Michael S. Tsirkin wrote:
>> Juan is not available now, and Anthony asked for
>> agenda to be sent early.
>> So here comes:
>> 
>> Agenda for the meeting Tue, June 11:
>>  
>> - Generating acpi tables, redux
>
> Not so much notes as a quick summary of the call:
>
> There are the following reasons to generate ACPI tables in QEMU:
>
> - sharing code with e.g. ovmf
>   Anthony thinks this is not a valid argument
>
> - so we can make tables more dynamic and move away from iasl
>   Anthony thinks this is not a valid reason either,
>   since qemu and seabios have access to the same info.
>   MST noted several pieces of info are not accessible to the bios.
>   Anthony said they can be added, e.g. by exposing
>   QOM to the bios.
>
> - even though most tables are static, hardcoded
>   they are likely to change over time
>   Anthony sees this as justified
>
> To summarize, there's a consensus now that generating ACPI
> tables in QEMU is a good idea.

I would say best worst idea ;-)

I am deeply concerned about the complexity it introduces but I don't see
many other options.

>
> Two issues that need to be addressed:
> - original patches break cross-version migration. Need to fix that.
>
> - Anthony requested that the patchset be merged together with
>   some new feature. I'm not sure the reasoning is clear:
>   the current version intentionally generates tables
>   that are bug-for-bug compatible with seabios,
>   to simplify testing.

I expect that there will be additional issues that need to be worked out
and want to see a feature that actually uses the infrastructure before
we add it.

>   It seems clear we have users for this such as
>   hotplug of devices behind pci bridges, so
>   why keep the infrastructure out of tree?

It's hard to evaluate the infrastructure without a user.

>   Looking for something additional and smaller, as the hotplug patch
>   is a bit big and so might delay merging.
>
>
> Going forward - would we want to move
> smbios as well? Everyone seems to think it's a
> good idea.

Yes, independent of ACPI, I think QEMU should be generating the SMBIOS
tables.

Regards,

Anthony Liguori

> -- 
> MST


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread Anthony Liguori
Gleb Natapov  writes:

> On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
>> "H. Peter Anvin"  writes:
>> 
>> > On 06/05/2013 03:08 PM, Anthony Liguori wrote:
>> >>>
>> >>> Definitely an option.  However, we want to be able to boot from native
>> >>> devices, too, so having an I/O BAR (which would not be used by the OS
>> >>> driver) should still at the very least be an option.
>> >> 
>> >> What makes it so difficult to work with an MMIO bar for PCI-e?
>> >> 
>> >> With legacy PCI, tracking allocation of MMIO vs. PIO is pretty
>> >> straightforward.  Is there something special about PCI-e here?
>> >> 
>> >
>> > It's not tracking allocation.  It is that accessing memory above 1 MiB
>> > is incredibly painful in the BIOS environment, which basically means
>> > MMIO is inaccessible.
>> 
>> Oh, you mean in real mode.
>> 
>> SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
>> There are loads of ASSERT32FLAT()s in the code to make sure of this.
>> 
> Well, not exactly. Initialization is done in 32-bit mode, but disk
> reads/writes are done in 16-bit mode since they must work from the int13
> interrupt handler. The only way I know to access MMIO bars from 16-bit
> code is to use SMM, which we do not have in KVM.

Ah, if it's just the dataplane operations then there's another solution.

We can introduce a virtqueue flag that asks the backend to poll for new
requests.  Then SeaBIOS can add the request to the queue and not worry
about kicking or reading the ISR.

SeaBIOS is polling for completion anyway.

Regards,

Anthony Liguori

>
> --
>   Gleb.



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-06 Thread Anthony Liguori

Hi Rusty,

Rusty Russell  writes:

> Anthony Liguori  writes:
>> 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give
>>it a new device/vendor ID.   Continue to use virtio-pci for existing
>>devices potentially adding virtio-{net,blk,...}-pcie variants for
>>people that care to use them.
>
> Now you have a different compatibility problem; how do you know the
> guest supports the new virtio-pcie net?

We don't care.

We would still use virtio-pci for existing devices.  Only new devices
would use virtio-pcie.

> If you put a virtio-pci card behind a PCI-e bridge today, it's not
> compliant, but AFAICT it will Just Work.  (Modulo the 16-dev limit).

I believe you can put it in legacy mode and then there isn't the 16-dev
limit.  I believe the only advantage of putting it in native mode is
that then you can do native hotplug (as opposed to ACPI hotplug).

So sticking with virtio-pci seems reasonable to me.

> I've been assuming we'd avoid a "flag day" change; that devices would
> look like existing virtio-pci with capabilities indicating the new
> config layout.

I don't think that's feasible.  Maybe 5 or 10 years from now, we switch
the default adapter to virtio-pcie.

>> I think 4 is the best path forward.  It's better for users (guests
>> continue to work as they always have).  There's less confusion about
>> enabling PCI-e support--you must ask for the virtio-pcie variant and you
>> must have a virtio-pcie driver.  It's easy to explain.
>
> Removing both forward and backward compatibility is easy to explain, but
> I think it'll be harder to deploy.  This is your area though, so perhaps
> I'm wrong.

My concern is that it's not real backwards compatibility.

>> It also maps to what regular hardware does.  I highly doubt that there
>> are any real PCI cards that made the shift from PCI to PCI-e without
>> bumping at least a revision ID.
>
> No one expected the new cards to Just Work with old OSes: a new machine
> meant a new OS and new drivers.  Hardware vendors like that.

Yup.

> Since virtualization often involves legacy, our priorities might be
> different.

So realistically, I think if we introduce virtio-pcie with a different
vendor ID, it will be adopted fairly quickly.  The drivers will show up
in distros quickly and get backported.

New devices can be limited to supporting virtio-pcie and we'll certainly
provide a way to use old devices with virtio-pcie too.  But for
practical reasons, I think we have to continue using virtio-pci by
default.

Regards,

Anthony Liguori

>
> Cheers,
> Rusty.



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
"H. Peter Anvin"  writes:

> On 06/05/2013 03:08 PM, Anthony Liguori wrote:
>>>
>>> Definitely an option.  However, we want to be able to boot from native
>>> devices, too, so having an I/O BAR (which would not be used by the OS
>>> driver) should still at the very least be an option.
>> 
>> What makes it so difficult to work with an MMIO bar for PCI-e?
>> 
>> With legacy PCI, tracking allocation of MMIO vs. PIO is pretty
>> straightforward.  Is there something special about PCI-e here?
>> 
>
> It's not tracking allocation.  It is that accessing memory above 1 MiB
> is incredibly painful in the BIOS environment, which basically means
> MMIO is inaccessible.

Oh, you mean in real mode.

SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
There are loads of ASSERT32FLAT()s in the code to make sure of this.

Regards,

Anthony Liguori

>
>   -hpa
>
>



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
Benjamin Herrenschmidt  writes:

> On Wed, 2013-06-05 at 16:53 -0500, Anthony Liguori wrote:
>
>> A smart BIOS can also use MMIO to program virtio.
>
> Indeed :-)
>
> I see no reason why not providing both access path though. Have the PIO
> BAR there for compatibility/legacy/BIOS/x86 purposes and *also* have the
> MMIO window which I'd be happy to favor on power.
>
> We could even put somewhere in there a feature bit set by qemu to
> indicate whether it thinks PIO or MMIO is faster on a given platform if
> you really think that's worth it (I don't).

That's okay, but what I'm most concerned about is compatibility.

A virtio PCI device that's a "native endpoint" needs to have a different
device ID than one that is a "legacy endpoint".  The current drivers
have no hope of working (well) with virtio PCI devices exposed as native
endpoints.

I don't care if the native PCI endpoint also has a PIO bar.  But it
seems silly (and confusing) to me to make that layout be the "legacy"
layout versus a straight mirror of the new layout if we're already
changing the device ID.

In addition, it doesn't seem at all necessary to have an MMIO bar to the
legacy device.  If the reason you want MMIO is to avoid using PIO, then
you break existing drivers because they assume PIO.  If you are breaking
existing drivers then you should change the device ID.

If strictly speaking it's just that MMIO is a bit faster, I'm not sure
that complexity is worth it without seeing performance numbers first.

Regards,

Anthony Liguori

>
> Cheers,
> Ben.
>
>



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
"H. Peter Anvin"  writes:

> On 06/05/2013 02:50 PM, Anthony Liguori wrote:
>> "H. Peter Anvin"  writes:
>> 
>>> On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:
>>>>
>>>> Spec says IO and memory can be enabled/disabled, separately.
>>>> PCI Express spec says devices should work without IO.
>>>>
>>>
>>> For "native endpoints".  Currently virtio would be a "legacy endpoint"
>>> which is quite correct -- it is compatible with a legacy interface.
>> 
>> Do legacy endpoints also use 4k for BARs?
>
> There are no 4K BARs.  In fact, I/O BARs are restricted by spec (there
> is no technical enforcement, however) to 256 bytes.
>
> The 4K come from the upstream bridge windows, which are only 4K granular
> (historic stuff from when bridges were assumed rare.)  However, there
> can be multiple devices, functions, and BARs inside that window.

Got it.

>
> The issue with PCIe is that each PCIe port is a bridge, so in reality
> there is only one real device per bus number.
>
>> If not, can't we use a new device id for native endpoints and call it a
>> day?  Legacy endpoints would continue using the existing BAR layout.
>
> Definitely an option.  However, we want to be able to boot from native
> devices, too, so having an I/O BAR (which would not be used by the OS
> driver) should still at the very least be an option.

What makes it so difficult to work with an MMIO bar for PCI-e?

With legacy PCI, tracking allocation of MMIO vs. PIO is pretty
straightforward.  Is there something special about PCI-e here?

Regards,

Anthony Liguori

>
>   -hpa
>



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Wed, Jun 05, 2013 at 03:42:57PM -0500, Anthony Liguori wrote:
>> "Michael S. Tsirkin"  writes:
>> 
>> Can you explain?  I thought the whole trick with separating out the
>> virtqueue notification register was to regain the performance?
>
> Yes but this trick only works well with NPT (it's still a bit
> slower than PIO but not so drastically).
> Without NPT you still need a page walk so it will be slow.

Do you mean NPT/EPT?

If your concern is shadow paging, then I think you're concerned about
hardware that is so slow to start with that it's not worth considering.

>> >> It also maps to what regular hardware does.  I highly doubt that there
>> >> are any real PCI cards that made the shift from PCI to PCI-e without
>> >> bumping at least a revision ID.
>> >
>> > Only because the chance it's 100% compatible on the software level is 0.
>> > It always has some hardware specific quirks.
>> > No such excuse here.
>> >
>> >> It also means we don't need to play games about sometimes enabling IO
>> >> bars and sometimes not.
>> >
>> > This last paragraph is wrong, it ignores the issues 3) to 5) 
>> > I added above.
>> >
>> > If you do take them into account:
>> >- there are reasons to add MMIO BAR to PCI,
>> >  even without PCI express
>> 
>> So far, the only reason you've provided is "it doesn't work on some
>> architectures."  Which architectures?
>
> PowerPC wants this.

Existing PowerPC remaps PIO to MMIO so it works fine today.

Future platforms may not do this but future platforms can use a
different device.  They certainly won't be able to use the existing
drivers anyway.

Ben, am I wrong here?

>> >- we won't be able to drop IO BAR from virtio
>> 
>> An IO BAR is useless if it means we can't have more than 12 devices.
>
>
> It's not useless. A smart BIOS can enable devices one by one as
> it tries to boot from them.

A smart BIOS can also use MMIO to program virtio.

Regards,

Anthony Liguori



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
"H. Peter Anvin"  writes:

> On 06/05/2013 09:20 AM, Michael S. Tsirkin wrote:
>> 
>> Spec says IO and memory can be enabled/disabled, separately.
>> PCI Express spec says devices should work without IO.
>> 
>
> For "native endpoints".  Currently virtio would be a "legacy endpoint"
> which is quite correct -- it is compatible with a legacy interface.

Do legacy endpoints also use 4k for BARs?

If not, can't we use a new device id for native endpoints and call it a
day?  Legacy endpoints would continue using the existing BAR layout.

Regards,

Anthony Liguori

>
>   -hpa
>



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Wed, Jun 05, 2013 at 10:43:17PM +0300, Michael S. Tsirkin wrote:
>> On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
>> > "Michael S. Tsirkin"  writes:
>> > 
>> > > On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
>> > >> Look, it's very simple.
>> > > We only need to do it if we do a change that breaks guests.
>> > >
>> > > Please find a guest that is broken by the patches. You won't find any.
>> > 
>> > I think the problem in this whole discussion is that we're talking past
>> > each other.
>> > 
>> > Here is my understanding:
>> > 
>> > 1) PCI-e says that you must be able to disable IO bars and still have a
>> > functioning device.
>> > 
>> > 2) It says (1) because you must size IO bars to 4096 which means that
>> > practically speaking, once you enable a dozen or so PIO bars, you run
>> > out of PIO space (16 * 4k == 64k and not all that space can be used).
>> 
>> 
>> Let me add 3 other issues which I mentioned and you seem to miss:
>> 
>> 3) architectures which don't have fast access to IO ports exist;
>>    virtio does not work there ATM
>> 
>> 4) setups with many PCI bridges exist and have the same issue
>>as PCI express. virtio does not work there ATM
>> 
>> 5) On x86, even with nested page tables, firmware only decodes
>>the page address on an invalid PTE, not the data. You need to
>>emulate the guest to get at the data. Without
>>nested page tables, we have to do page table walk and emulate
>>to get both address and data. Since this is how MMIO
>>is implemented in kvm on x86, MMIO is much slower than PIO
>>(with nested page tables by a factor of >2, did not test without).
>
> Oh I forgot:
>
> 6) access to MMIO BARs is painful in the BIOS environment
>so BIOS would typically need to enable IO for the boot device.

But if you want to boot from the 16th device, the BIOS needs to solve
this problem anyway.

Regards,

Anthony Liguori



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
>> "Michael S. Tsirkin"  writes:
>> 
>> > On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
>> >> Look, it's very simple.
>> > We only need to do it if we do a change that breaks guests.
>> >
>> > Please find a guest that is broken by the patches. You won't find any.
>> 
>> I think the problem in this whole discussion is that we're talking past
>> each other.
>> 
>> Here is my understanding:
>> 
>> 1) PCI-e says that you must be able to disable IO bars and still have a
>> functioning device.
>> 
>> 2) It says (1) because you must size IO bars to 4096 which means that
>> practically speaking, once you enable a dozen or so PIO bars, you run
>> out of PIO space (16 * 4k == 64k and not all that space can be used).
>
>
> Let me add 3 other issues which I mentioned and you seem to miss:
>
> 3) architectures which don't have fast access to IO ports exist;
>    virtio does not work there ATM

Which architectures have PCI but no IO ports?

> 4) setups with many PCI bridges exist and have the same issue
>as PCI express. virtio does not work there ATM

This is not virtio specific.  This is true for all devices that use IO.

> 5) On x86, even with nested page tables, firmware only decodes
>the page address on an invalid PTE, not the data. You need to
>emulate the guest to get at the data. Without
>nested page tables, we have to do page table walk and emulate
>to get both address and data. Since this is how MMIO
>is implemented in kvm on x86, MMIO is much slower than PIO
>(with nested page tables by a factor of >2, did not test without).

Am well aware of this, this is why we use PIO.

I fully agree with you that when we do MMIO, we should switch the
notification mechanism to avoid encoding anything meaningful as data.

>> virtio-pci uses IO BARs exclusively today.  Existing guest drivers
>> assume that there is an IO bar that contains the virtio-pci registers.
>> So let's consider the following scenarios:
>> 
>> QEMU of today:
>> 
>> 1) qemu -drive file=ubuntu-13.04.img,if=virtio
>> 
>> This works today.  Does adding an MMIO bar at BAR1 break this?
>> Certainly not if the device is behind a PCI bus...
>> 
>> But are we going to put devices behind a PCI-e bus by default?  Are we
>> going to ask the user to choose whether devices are put behind a legacy
>> bus or the express bus?
>> 
>> What happens if we put the device behind a PCI-e bus by default?  Well,
>> it can still work.  That is, until we do something like this:
>> 
>> 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
>> -device virtio-balloon..
>> 
>> Such that we have more than a dozen or so devices.  This works
>> perfectly fine today.  It works fine because we've designed virtio to
>> make sure it works fine.  Quoting the spec:
>> 
>> "Configuration space is generally used for rarely-changing or
>>  initialization-time parameters. But it is a limited resource, so it
>>  might be better to use a virtqueue to update configuration information
>>  (the network device does this for filtering, otherwise the table in the
>>  config space could potentially be very large)."
>> 
>> In fact, we can have 100s of PCI devices today without running out of IO
>> space because we're so careful about this.
>> 
>> So if we switch to using PCI-e by default *and* we keep virtio-pci
>> without modifying the device IDs, then very frequently we are going to
>> break existing guests because the drivers they already have no longer
>> work.
>> 
>> A few virtio-serial channels, a few block devices, a couple of network
>> adapters, the balloon and RNG driver, and we hit the IO space limit
>> pretty damn quickly so this is not a contrived scenario at all.  I would
>> expect that we frequently run into this if we don't address this problem.
>> 
>> So we have a few options:
>> 1) Punt all of this complexity to libvirt et al and watch people make
>>the wrong decisions about when to use PCI-e.  This will become yet
>>another example of KVM being too hard to configure.
>> 
>> 2) Enable PCI-e by default and just force people to upgrade their
>>drivers.
>> 
>> 3) Don't use PCI-e by default but still add BAR1 to virtio-pci
>> 
>> 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),
>
> We can't do this - it wil

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
>> Look, it's very simple.
> We only need to do it if we do a change that breaks guests.
>
> Please find a guest that is broken by the patches. You won't find any.

I think the problem in this whole discussion is that we're talking past
each other.

Here is my understanding:

1) PCI-e says that you must be able to disable IO bars and still have a
functioning device.

2) It says (1) because you must size IO bars to 4096 which means that
practically speaking, once you enable a dozen or so PIO bars, you run
out of PIO space (16 * 4k == 64k and not all that space can be used).

virtio-pci uses IO BARs exclusively today.  Existing guest drivers
assume that there is an IO bar that contains the virtio-pci registers.

So let's consider the following scenarios:

QEMU of today:

1) qemu -drive file=ubuntu-13.04.img,if=virtio

This works today.  Does adding an MMIO bar at BAR1 break this?
Certainly not if the device is behind a PCI bus...

But are we going to put devices behind a PCI-e bus by default?  Are we
going to ask the user to choose whether devices are put behind a legacy
bus or the express bus?

What happens if we put the device behind a PCI-e bus by default?  Well,
it can still work.  That is, until we do something like this:

2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
-device virtio-balloon..

Such that we have more than a dozen or so devices.  This works
perfectly fine today.  It works fine because we've designed virtio to
make sure it works fine.  Quoting the spec:

"Configuration space is generally used for rarely-changing or
 initialization-time parameters. But it is a limited resource, so it
 might be better to use a virtqueue to update configuration information
 (the network device does this for filtering, otherwise the table in the
 config space could potentially be very large)."

In fact, we can have 100s of PCI devices today without running out of IO
space because we're so careful about this.

So if we switch to using PCI-e by default *and* we keep virtio-pci
without modifying the device IDs, then very frequently we are going to
break existing guests because the drivers they already have no longer
work.

A few virtio-serial channels, a few block devices, a couple of network
adapters, the balloon and RNG driver, and we hit the IO space limit
pretty damn quickly so this is not a contrived scenario at all.  I would
expect that we frequently run into this if we don't address this problem.

So we have a few options:

1) Punt all of this complexity to libvirt et al and watch people make
   the wrong decisions about when to use PCI-e.  This will become yet
   another example of KVM being too hard to configure.

2) Enable PCI-e by default and just force people to upgrade their
   drivers.

3) Don't use PCI-e by default but still add BAR1 to virtio-pci

4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely), give
   it a new device/vendor ID.   Continue to use virtio-pci for existing
   devices potentially adding virtio-{net,blk,...}-pcie variants for
   people that care to use them.

I think 1 == 2 == 3 and I view 2 as an ABI breaker.  libvirt does like
policy so they're going to make a simple decision and always use the
same bus by default.  I suspect if we made PCI the default, they might
just always set the PCI-e flag just because.

There are hundreds of thousands if not millions of guests with existing
virtio-pci drivers.  Forcing them to upgrade better have an extremely
good justification.

I think 4 is the best path forward.  It's better for users (guests
continue to work as they always have).  There's less confusion about
enabling PCI-e support--you must ask for the virtio-pcie variant and you
must have a virtio-pcie driver.  It's easy to explain.

It also maps to what regular hardware does.  I highly doubt that there
are any real PCI cards that made the shift from PCI to PCI-e without
bumping at least a revision ID.

It also means we don't need to play games about sometimes enabling IO
bars and sometimes not.

Regards,

Anthony Liguori

>
>
> -- 
> MST



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Wed, Jun 05, 2013 at 10:08:37AM -0500, Anthony Liguori wrote:
>> "Michael S. Tsirkin"  writes:
>> 
>> > On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
>> >> "Michael S. Tsirkin"  writes:
>> >> 
>> >> > On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
>> >> > You mean make BAR0 an MMIO BAR?
>> >> > Yes, it would break current windows guests.
>> >> > Further, as long as we use same address to notify all queues,
>> >> > we would also need to decode the instruction on x86 and that's
>> >> > measureably slower than PIO.
>> >> > We could go back to discussing hypercall use for notifications,
>> >> > but that has its own set of issues...
>> >> 
>> >> So... does "violating the PCI-e" spec really matter?  Is it preventing
>> >> any guest from working properly?
>> >
>> > Yes, absolutely, this wording in spec is not there without reason.
>> >
>> > Existing guests allocate io space for PCI express ports in
>> > multiples on 4K.
>> >
>> > Since each express device is behind such a port, this means
>> > at most 15 such devices can use IO ports in a system.
>> >
>> > That's why to make a pci express virtio device,
>> > we must allow MMIO and/or some other communication
>> > mechanism as the spec requires.
>> 
>> This is precisely why this is an ABI breaker.
>> 
>> If you disable IO bars in the BIOS, than the interface that the OS sees
>> will *not have an IO bar*.
>> 
>> This *breaks existing guests*.
>> Any time the programming interface changes on a PCI device, the
>> revision ID and/or device ID must change.  The spec is very clear about
>> this.
>> 
>> We cannot disable the IO BAR without changing revision ID/device ID.
>> 
>
> But it's a bios/PC issue. It's not a device issue.
>
> Anyway, let's put express aside.
>
> It's easy to create non-working setups with pci, today:
>
> - create 16 pci bridges
> - put one virtio device behind each
>
> boom
>
> Try it.
>
> I want to fix that.
>
>
>> > That's on x86.
>> >
>> > Besides x86, there are architectures where IO is unavailable or very slow.
>> >
>> >> I don't think we should rush an ABI breakage if the only benefit is
>> >> claiming spec compliance.
>> >> 
>> >> Regards,
>> >> 
>> >> Anthony Liguori
>> >
>> > Why do you bring this up? No one advocates any ABI breakage,
>> > I only suggest extensions.
>> 
>> It's an ABI breakage.  You're claiming that the guests you tested
>> handle the breakage reasonably but it is unquestionably an ABI breakage.
>> 
>> Regards,
>> 
>> Anthony Liguori
>
> Adding BAR is not an ABI breakage, do we agree on that?
>
> Disabling IO would be but I am not proposing disabling IO.
>
> Guests might disable IO.
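The repro Michael describes can be sketched as a qemu command line (illustrative only; the -device names match qemu's syntax, but the exact invocation is untested and options vary by qemu version):

```shell
# Sketch of the failing setup: 16 PCI bridges with one virtio device
# behind each.  Device names are illustrative, not a tested recipe.
args=""
for i in $(seq 1 16); do
    args="$args -device pci-bridge,chassis_nr=$i,id=bridge$i"
    args="$args -device virtio-net-pci,bus=bridge$i"
done
# With IO space reserved in 4K multiples per bridge window, the 64K x86
# IO space is exhausted before the 16th device gets its IO BAR:
echo qemu-system-x86_64 -M pc $args
```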

Look, it's very simple.

If the failure in the guest is that BAR0 mapping fails because the
device is enabled but the BAR is disabled, then you've broken the ABI.

And what's worse is that this isn't for an obscure scenario (like having
15 PCI bridges) but for something that would become the standard
scenario (using a PCI-e bus).

We need to either bump the revision ID or the device ID if we do this.

Regards,

Anthony Liguori



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Wed, Jun 05, 2013 at 07:59:33AM -0500, Anthony Liguori wrote:
>> "Michael S. Tsirkin"  writes:
>> 
>> > On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
>> > You mean make BAR0 an MMIO BAR?
>> > Yes, it would break current windows guests.
>> > Further, as long as we use the same address to notify all queues,
>> > we would also need to decode the instruction on x86 and that's
>> > measurably slower than PIO.
>> > We could go back to discussing hypercall use for notifications,
>> > but that has its own set of issues...
>> 
>> So... does "violating the PCI-e spec" really matter?  Is it preventing
>> any guest from working properly?
>
> Yes, absolutely, this wording in spec is not there without reason.
>
> Existing guests allocate io space for PCI express ports in
> multiples of 4K.
>
> Since each express device is behind such a port, this means
> at most 15 such devices can use IO ports in a system.
>
> That's why to make a pci express virtio device,
> we must allow MMIO and/or some other communication
> mechanism as the spec requires.

This is precisely why this is an ABI breaker.

If you disable IO bars in the BIOS, then the interface that the OS sees
will *not have an IO bar*.

This *breaks existing guests*.

Any time the programming interface changes on a PCI device, the
revision ID and/or device ID must change.  The spec is very clear about
this.

We cannot disable the IO BAR without changing revision ID/device ID.

> That's on x86.
>
> Besides x86, there are architectures where IO is unavailable or very slow.
>
>> I don't think we should rush an ABI breakage if the only benefit is
>> claiming spec compliance.
>> 
>> Regards,
>> 
>> Anthony Liguori
>
> Why do you bring this up? No one advocates any ABI breakage,
> I only suggest extensions.

It's an ABI breakage.  You're claiming that the guests you tested
handle the breakage reasonably but it is unquestionably an ABI breakage.

Regards,

Anthony Liguori

>
>
>> >
>> > -- 
>> > MST



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-05 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
> You mean make BAR0 an MMIO BAR?
> Yes, it would break current windows guests.
> Further, as long as we use the same address to notify all queues,
> we would also need to decode the instruction on x86 and that's
> measurably slower than PIO.
> We could go back to discussing hypercall use for notifications,
> but that has its own set of issues...

So... does "violating the PCI-e spec" really matter?  Is it preventing
any guest from working properly?

I don't think we should rush an ABI breakage if the only benefit is
claiming spec compliance.

Regards,

Anthony Liguori

>
> -- 
> MST



Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Jordan Justen  writes:

> On Fri, May 31, 2013 at 11:35 AM, Anthony Liguori  
> wrote:
>> As I think more about it, I think forking edk2 is inevitable.  We need a
>> clean repo that doesn't include the proprietary binaries.  I doubt
>> upstream edk2 is willing to remove the binaries.
>
> No, probably not unless a BSD licensed alternative was available. :)
>
> But, in thinking about what might make sense for EDK II with git, one
> option that should be considered is breaking the top-level 'packages'
> into separate sub-modules. I had gone so far as to start pushing repos
> as sub-modules.
>
> But, as the effort to convert EDK II to git has stalled (actually
> never even thought about leaving the ground), I abandoned that
> approach and went back to just mirroring the one EDK II tree.
>
> I could fairly easily re-enable mirroring the sub-set of packages needed
> for OVMF. So, in that case, the FatBinPkg sub-module could easily be
> dropped from a tree.
>
>> But this can be quite simple using a combination of git-svn and a
>> rewriting script.  We did exactly this to pull out the VGABios from
>> Bochs and remove the binaries associated with it.  It's 100% automated
>> and can be kept in sync via a script on qemu.org.
>
> I would love to mirror the BaseTools as a sub-package without all the
> silly windows binaries... What script did you guys use?

We did this in git pre-history, now git has a fancy git-filter-branch
command that makes it a breeze:

http://git-scm.com/book/ch6-4.html
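A minimal sketch of that kind of history rewrite, on a throwaway repo (the "binaries/" path is a placeholder; edk2's real blobs live elsewhere):

```shell
# Sketch: strip bundled binaries from an imported history, the way the
# VGABios was pulled out of Bochs.  All names here are placeholders.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1
cd "$(mktemp -d)"
git init -q repo && cd repo
mkdir binaries
printf 'blob' > binaries/fat.efi          # stand-in for a proprietary blob
printf 'code' > driver.c                  # ordinary source file
git add -A && git commit -qm "import"
# Rewrite every commit, dropping the binaries/ tree from the index:
git filter-branch -f --index-filter \
    'git rm -r --cached -q --ignore-unmatch binaries' HEAD
git ls-files                               # only driver.c survives
```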

Regards,

Anthony Liguori

>
> -Jordan


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Jordan Justen  writes:

> On Fri, May 31, 2013 at 7:38 AM, Anthony Liguori  
> wrote:
>> In terms of creating a FAT module, the most likely source would seem to
>> be the kernel code and since that's GPL, I don't think it's terribly
>> avoidable to end up with a GPL'd uefi implementation.
>
> Why would OpenBSD not be a potential source?
>
> http://www.openbsd.org/cgi-bin/cvsweb/src/sys/msdosfs/

If someone is going to do it, that's fine.

But if me, it's going to be a GPL base.  Actually, enabling GPL
contributions to OVMF is a major motivating factor for me in this whole
discussion.

Regards,

Anthony Liguori

>
> We have a half-done ext2 fs from GSoC2011 that started with OpenBSD.
>
> https://github.com/the-ridikulus-rat/Tianocore_Ext2Pkg
>
>> If that's inevitable, then we're wasting effort by rewriting stuff under
>> a BSD license.
>>
>> Regards,
>>
>> Anthony Liguori


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Paolo Bonzini  writes:

> On 31/05/2013 19:06, Anthony Liguori wrote:
>> David Woodhouse  writes:
>> 
>>> On Fri, 2013-05-31 at 10:43 -0500, Anthony Liguori wrote:
>>>> It's even more fundamental.  OVMF as a whole (at least in its usable
>>>> form) is not Open Source. 
>>>
>>> The FAT module is required to make EDK2 usable, and yes, that's not Open
>>> Source. So in a sense you're right.
>>>
>>> But we're talking here about *replacing* the FAT module with something
>>> that *is* open source. And the FAT module isn't a fundamental part of
>>> EDK2; it's just an optional module that happens to be bundled with the
>>> repository.
>> 
>> So *if* we replace the FAT module *and* that replacement was GPL, would
>> there be any objections to having more GPL modules for things like virtio,
>> ACPI, etc?
>> 
>> And would that be doable in the context of OVMF or would another project
>> need to exist for this purpose?
>
> I don't think it would be doable in TianoCore.  I think it would end up
> either in distros, or in QEMU.

As I think more about it, I think forking edk2 is inevitable.  We need a
clean repo that doesn't include the proprietary binaries.  I doubt
upstream edk2 is willing to remove the binaries.

But this can be quite simple using a combination of git-svn and a
rewriting script.  We did exactly this to pull out the VGABios from
Bochs and remove the binaries associated with it.  It's 100% automated
and can be kept in sync via a script on qemu.org.

> A separate question is whether OVMF makes more sense as part of
> TianoCore or rather as part of QEMU.

I'm not sure if qemu.git is the right location, but we can certainly
host an ovmf.git on qemu.org that embeds the scrubbed version of
edk2.git.

Of course, this would enable us to add GPL code (including a FAT module)
to ovmf.git without any impact on upstream edk2.

> With 75% of the free hypervisors
> now reunited under the same source repository, the balance is
> tilting...

 :-)

Regards,

Anthony Liguori

>
> Paolo


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Laszlo Ersek  writes:

> On 05/31/13 16:38, Anthony Liguori wrote:
>
>> It's either Open Source or it's not.  It's currently not.
>
> I disagree with this binary representation of Open Source or Not. If it
> weren't (mostly) Open Source, how could we fork (most of) it as you're
> suggesting (from the soapbox :))?
>
>> I have a hard
>> time sympathizing with trying to work with a proprietary upstream.
>
> My experience has been positive.
>
> First of all, whether UEFI is a good thing or not is controversial. I
> won't try to address that.
>
> However UEFI is here to stay, machines are being shipped with it, Linux
> and other OSen try to support it. Developing (or running) an OS in
> combination with a specific firmware is sometimes easier / more economic
> in a virtual environment, hence there should be support for qemu + UEFI.
> It is this mindset that I operate in. (Oh, I also forgot to mention that
> this task has been assigned to me by my superiors as well :))
>
> Jordan, the OvmfPkg maintainer is responsive and progressive in the true
> FLOSS manner (*), which was a nice surprise for a project whose coding
> standards for example are made 100% after Windows source code, and whose
> mailing list is mostly subscribed to by proprietary vendors. Really when
> it comes to OvmfPkg patches the process follows the "normal" FLOSS
> development model.
>
> (*) Jordan, I hope this will prompt you to merge VirtioNetDxe v4 real
> soon now :)

(Removing seabios from the CC as we've moved far away from seabios as a topic)

Just so no one gets the wrong idea, the OVMF team is now a victim of
their own success.  I had hoped that no one would do the work necessary
to get us to the point where we had to seriously think about UEFI
support but that's where we are now :-)

> Thus far we've been talking copyright rather than patents, but there's
> also this:
>
> http://en.wikipedia.org/wiki/FAT_filesystem#Challenge
> http://en.wikipedia.org/wiki/FAT_filesystem#Patent_infringement_lawsuits
>
> It almost doesn't matter who prevails in such a lawsuit; the
> *possibility* of such a lawsuit gives people cold feet. Blame the
> USPTO.

Just to say it once so I don't have to ever say it again.

I'm not going to discuss anything relating to patents and FAT publicly.
Everyone should consult with their respective lawyers on such issues.

Copyright is straightforward.  Patents are not.

Regards,

Anthony Liguori

>
> Laszlo


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
David Woodhouse  writes:

> On Fri, 2013-05-31 at 10:43 -0500, Anthony Liguori wrote:
>> It's even more fundamental.  OVMF as a whole (at least in its usable
>> form) is not Open Source. 
>
> The FAT module is required to make EDK2 usable, and yes, that's not Open
> Source. So in a sense you're right.
>
> But we're talking here about *replacing* the FAT module with something
> that *is* open source. And the FAT module isn't a fundamental part of
> EDK2; it's just an optional module that happens to be bundled with the
> repository.

So *if* we replace the FAT module *and* that replacement was GPL, would
there be any objections to having more GPL modules for things like virtio,
ACPI, etc?

And would that be doable in the context of OVMF or would another project
need to exist for this purpose?

> So I think you're massively overstating the issue. OVMF/EDK2 *is* Open
> Source, and replacing the FAT module really isn't that hard.
>
> We can only bury our heads in the sand and ship qemu with
> non-EFI-capable firmware for so long...

Which is why I think we need to solve the real problem here.

> I *know* there's more work to be done. We have SeaBIOS-as-CSM, Jordan
> has mostly sorted out the NV variable storage, and now the FAT issue is
> coming up to the top of the pile. But we aren't far from the point where
> we can realistically say that we want the Open Source OVMF to be the
> default firmware shipped with qemu.

Yes, that's why I'm raising this now.  We all knew that we'd have to
talk about this eventually.

Regards,

Anthony Liguori

>
> -- 
> dwmw2


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
David Woodhouse  writes:

> On Fri, 2013-05-31 at 08:04 -0500, Anthony Liguori wrote:
>> 
>> 
>> 
>> Fork OVMF, drop the fat module, and just add GPL code.  It's an easily
>> solvable problem.
>
> Heh. Actually it doesn't need to be a fork. It's modular, and the FAT
> driver is just a single module. Which is actually included in *binary*
> form in the EDK2 repository, I believe, and its source code is
> elsewhere.
>
> We could happily make a GPL¹ or LGPL implementation of a FAT module and
> build our OVMF with that instead, and we wouldn't need to fork OVMF at
> all.

So can't we have GPL virtio modules too?  I don't think there's any
problem there except for the FAT module.

I would propose more of a virtual fork.  It could consist of a git repo with
the GPL modules + a submodule for edk2.  Ideally, there would be no need
to actually fork edk2.
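Such a layout might look like the following sketch, where a throwaway local repo stands in for the real edk2 mirror and all names (ovmf-gpl, GplModulePkg) are hypothetical:

```shell
# Sketch of the "virtual fork": GPL modules live in a small top-level
# repo, and pristine upstream edk2 is only referenced as a submodule.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
base="$(mktemp -d)"; cd "$base"
git init -q edk2-mirror                      # stand-in for upstream edk2
(cd edk2-mirror && git commit -q --allow-empty -m "edk2 snapshot")
git init -q ovmf-gpl && cd ovmf-gpl
mkdir GplModulePkg                           # hypothetical GPL module tree
printf 'GPL FAT module would go here\n' > GplModulePkg/README
git -c protocol.file.allow=always \
    submodule add "$base/edk2-mirror" edk2   # upstream stays unforked
git add -A && git commit -qm "virtual fork skeleton"
```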

My assumption is that edk2 won't take GPL code.  But does ovmf really
need to live in the edk2 tree?

If we're going to get serious about supporting OVMF, it we need
something that isn't proprietary.

> -- 
> dwmw2
>
> ¹ If it's GPL, of course, then we mustn't include any *other* binary
> blobs in our OVMF build. But the whole point in this conversation is
> that we don't *want* to do that. So that's fine.

It's even more fundamental.  OVMF as a whole (at least in its usable
form) is not Open Source.  Without even tackling the issue of GPL code
sharing, that is a fundamental problem that needs to be solved if we're
going to serious about making changes to QEMU to support it.

I think solving the general problem will also enable GPL code sharing
though.

Regards,

Anthony Liguori


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Laszlo Ersek  writes:

> On 05/31/13 15:04, Anthony Liguori wrote:
>> Laszlo Ersek  writes:
>> 
>>> On 05/31/13 09:09, Jordan Justen wrote:
>>>
>>> Due to licensing differences I can't just port code from SeaBIOS to
>>> OVMF
>> 
>> 
>
> :)
>
>> Fork OVMF, drop the fat module, and just add GPL code.  It's an easily
>> solvable problem.
>
> It's not optimal for the "upstream first" principle;



OVMF is not Open Source so "upstream first" doesn't apply.  At least,
the FAT module is not Open Source.

Bullet 8 from the Open Source Definition[1]

"8. License Must Not Be Specific to a Product

The rights attached to the program must not depend on the program's
being part of a particular software distribution. If the program is
extracted from that distribution and used or distributed within the
terms of the program's license, all parties to whom the program is
redistributed should have the same rights as those that are granted in
conjunction with the original software distribution."

License from OVMF FAT module[2]:

"Additional terms: In addition to the forgoing, redistribution and use
of the code is conditioned upon the FAT 32 File System Driver and all
derivative works thereof being used for and designed only to read and/or
write to a file system that is directly managed by: Intel’s Extensible
Firmware Initiative (EFI) Specification v. 1.0 and later and/or the
Unified Extensible Firmware Interface (UEFI) Forum’s UEFI Specifications
v.2.0 and later (together the “UEFI Specifications”); only as necessary
to emulate an implementation of the UEFI Specifications; and to create
firmware, applications, utilities and/or drivers."

[1] http://opensource.org/osd-annotated
[2] http://sourceforge.net/apps/mediawiki/tianocore/index.php?title=Edk2-fat-driver

AFAIK, for the systems that we'd actually want to use OVMF for, a FAT
module is a hard requirement.

> we'd have to
> backport upstream edk2 patches forever (there's a whole lot of edk2
> modules outside of direct OvmfPkg that get built into OVMF.fd -- OvmfPkg
> "only" customizes / cherry-picks the full edk2 tree for virtual
> machines), or to periodically rebase an ever-increasing set of patches.
>
> Independently, we need *some* FAT driver (otherwise you can't even boot
> most installer media), which is where the already discussed worries lie.
> Whatever solves this aspect is independent of forking all of edk2.

It's either Open Source or it's not.  It's currently not.  I have a hard
time sympathizing with trying to work with a proprietary upstream.

>> Rewriting BSD implementations of everything is silly.  Every other
>> vendor that uses TianoCore has a proprietary fork.
>
> Correct, but they (presumably) keep rebasing their ever accumulating
> stuff at least on the periodically refreshed "stable edk2 subset"
> (UDK2010, which BTW doesn't include OvmfPkg). This must be horrible for
> them, but in exchange they get to remain proprietary (which may benefit
> them commercially).
>
>> Maintaining a GPL
>> fork seems just as reasonable.
>
> Perhaps; diverging from "upstream first" would hurt for certain.

Well I'm suggesting creating a real upstream (that is actually Open
Source).  Then I'm all for upstream first.

In terms of creating a FAT module, the most likely source would seem to
be the kernel code and since that's GPL, I don't think it's terribly
avoidable to end up with a GPL'd uefi implementation.

If that's inevitable, then we're wasting effort by rewriting stuff under
a BSD license.

Regards,

Anthony Liguori

>
>> 
>
> Thanks for the suggestion :)
> Laszlo


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Laszlo Ersek  writes:

> On 05/31/13 09:09, Jordan Justen wrote:
>
> Due to licensing differences I can't just port code from SeaBIOS to
> OVMF



Fork OVMF, drop the fat module, and just add GPL code.  It's an easily
solvable problem.

Rewriting BSD implementations of everything is silly.  Every other
vendor that uses TianoCore has a proprietary fork.  Maintaining a GPL
fork seems just as reasonable.



Regards,

Anthony Liguori

> (and I never have without explicit permission), so it's been a lot of
> back and forth with acpidump / iasl -d in guests (massage OVMF, boot
> guest, check guest dmesg / lspci, dump tables, compare, repeat), brain
> picking colleagues, the ACPI and PIIX specs and so on. I have a page on
> the RH intranet dedicated to this. When something around these parts is
> being changed (or looks like it could be changed) in SeaBIOS, or between
> qemu and SeaBIOS, I always must be alert and consider reimplementing it
> in, or porting it with permission to, OVMF. (Most recent example:
> pvpanic device -- currently only in SeaBIOS.)
>
> It worries me that if I slack off, or am busy with something else, or
> simply don't notice, then the gap will widen again. I appreciate
> learning a bunch about ACPI, and don't mind the days of work that went
> into some of my simple-looking ACPI patches for OVMF, but had the tables
> come from a common (programmatic) source, none of this would have been
> an issue, and I wouldn't have felt even occasionally that ACPI patches
> for OVMF were both duplicate work *and* futile (considering how much
> ahead SeaBIOS was).
>
> I don't mind reimplementing stuff, or porting it with permission, going
> forward, but the sophisticated parts in SeaBIOS are a hard nut. For
> example I'll never be able to auto-extract offsets from generated AML
> and patch the AML using those offsets; the edk2 build tools (a project
> separate from edk2) don't support this, and it takes several months to
> get a thing as simple as gcc-47 build flags into edk2-buildtools.
>
> Instead I have to write template ASL, compile it to AML, hexdump the
> result, verify it against the AML grammar in the ACPI spec (offsets
> aren't obvious, BytePrefix and friends are a joy), define & initialize a
> packed struct or array in OVMF, and patch the template AML using fixed
> field names or array subscripts. Workable, but dog slow. If the ACPI
> payload came from up above, we might be as well provided with a list of
> (canonical name, offset, size) triplets, and could perhaps blindly patch
> the contents. (Not unlike Michael's linker code for connecting tables
> into a hierarchy.)
>
> AFAIK most recently iasl got built-in support for offset extraction (and
> in the process the current SeaBIOS build method was broken...), so that
> part might get easier in the future.
>
> Oh well it's Friday, sorry about this rant! :) I'll happily do what I
> can in the current status quo, but frequently, it won't amount to much.
>
> Thanks,
> Laszlo


Re: KVM call agenda for 2013-05-28

2013-05-31 Thread Anthony Liguori
Kevin O'Connor  writes:

> On Tue, May 28, 2013 at 07:53:09PM -0400, Kevin O'Connor wrote:
>> There were discussions on potentially introducing a middle component
>> to generate the tables.  Coreboot was raised as a possibility, and
>> David thought it would be okay to use coreboot for both OVMF and
>> SeaBIOS.  The possibility was also raised of a "rom" that lives in the
>> qemu repo, is run in the guest, and generates the tables (which is
>> similar to the hvmloader approach that Xen uses).
>
> Given the objections to implementing ACPI directly in QEMU, one
> possible way forward would be to split the current SeaBIOS rom into
> two roms: "qvmloader" and "seabios".  The "qvmloader" would do the
> qemu specific platform init (pci init, smm init, mtrr init, bios
> tables) and then load and run the regular seabios rom.  With this
> split, qvmloader could be committed into the QEMU repo and maintained
> there.  This would be analogous to Xen's hvmloader with the seabios
> code used as a starting point to implement it.

What about a small change to the SeaBIOS build system to allow ACPI
table generation to be done via a "plugin".

This could be as simple as moving acpi.c and *.dsl into the QEMU build
tree and then having a way to point the SeaBIOS makefiles to our copy of
it.
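A toy sketch of that build hook (ACPI_DIR is a made-up variable, not a real SeaBIOS makefile knob; the stub Makefile only illustrates the override mechanism):

```shell
# Toy sketch of the plugin idea: the firmware build takes its ACPI
# sources from a directory named on the make command line.
set -e
cd "$(mktemp -d)"
mkdir qemu-acpi seabios
printf '/* ACPI table generation, maintained in the QEMU tree */\n' \
    > qemu-acpi/acpi.c
printf 'ACPI_DIR ?= src\nall:\n\t@echo "using ACPI sources in $(ACPI_DIR)"\n' \
    > seabios/Makefile
make -s --no-print-directory -C seabios ACPI_DIR=../qemu-acpi
# -> using ACPI sources in ../qemu-acpi
```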

Then the logic stays maintained in firmware, but the churn happens in
the QEMU tree instead of the SeaBIOS tree.

Regards,

Anthony Liguori

>
> With both the hardware implementation and acpi descriptions for that
> hardware in the same source code repository, it would be possible to
> implement changes to both in a single patch series.  The fwcfg entries
> used to pass data between qemu and qvmloader could also be changed in
> a single patch and thus those fwcfg entries would not need to be
> considered a stable interface.  The qvmloader code also wouldn't need
> the 16bit handlers that seabios requires and thus wouldn't need the
> full complexity of the seabios build.  Finally, it's possible that
> both ovmf and seabios could use a single qvmloader implementation.
>
> On the down side, reboots can be a bit goofy today in kvm, and that
> would need to be settled before something like qvmloader could be
> implemented.  Also, it may be problematic to support passing of bios
> tables from qvmloader to seabios for guests with only 1 meg of ram.
>
> Thoughts?
> -Kevin


Re: updated: kvm networking todo wiki

2013-05-30 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Thu, May 30, 2013 at 08:40:47AM -0500, Anthony Liguori wrote:
>> Stefan Hajnoczi  writes:
>> 
>> > On Thu, May 30, 2013 at 7:23 AM, Rusty Russell  
>> > wrote:
>> >> Anthony Liguori  writes:
>> >>> Rusty Russell  writes:
>> >>>> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
>> >>>>> FWIW, I think what's more interesting is using vhost-net as a 
>> >>>>> networking
>> >>>>> backend with virtio-net in QEMU being what's guest facing.
>> >>>>>
>> >>>>> In theory, this gives you the best of both worlds: QEMU acts as a first
>> >>>>> line of defense against a malicious guest while still getting the
>> >>>>> performance advantages of vhost-net (zero-copy).
>> >>>>>
>> >>>> It would be an interesting idea if we didn't already have the vhost
>> >>>> model where we don't need the userspace bounce.
>> >>>
>> >>> The model is very interesting for QEMU because then we can use vhost as
>> >>> a backend for other types of network adapters (like vmxnet3 or even
>> >>> e1000).
>> >>>
>> >>> It also helps for things like fault tolerance where we need to be able
>> >>> to control packet flow within QEMU.
>> >>
>> >> (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts).
>> >>
>> >> Then I'm really confused as to what this would look like.  A zero copy
>> >> sendmsg?  We should be able to implement that today.
>> >>
>> >> On the receive side, what can we do better than readv?  If we need to
>> >> return to userspace to tell the guest that we've got a new packet, we
>> >> don't win on latency.  We might reduce syscall overhead with a
>> >> multi-dimensional readv to read multiple packets at once?
>> >
>> > Sounds like recvmmsg(2).
>> 
>> Could we map this to mergeable rx buffers though?
>> 
>> Regards,
>> 
>> Anthony Liguori
>
> Yes because we don't have to complete buffers in order.

What I meant though was for GRO, we don't know how large the received
packet is going to be.  Mergeable rx buffers let us allocate a pool of
data for all incoming packets instead of allocating max packet size *
max packets.

recvmmsg expects an array of msghdrs and I presume each needs to be
given a fixed size.  So this seems incompatible with mergeable rx
buffers.

Regards,

Anthony Liguori

>
>> >
>> > Stefan


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-30 Thread Anthony Liguori
Rusty Russell  writes:

> Anthony Liguori  writes:
>> Forcing a guest driver change is a really big
>> deal and I see no reason to do that unless there's a compelling reason
>> to.
>>
>> So we're stuck with the 1.0 config layout for a very long time.
>
> We definitely must not force a guest change.  The explicit aim of the
> standard is that "legacy" and 1.0 be backward compatible.  One
> deliverable is a document detailing how this is done (effectively a
> summary of changes between what we have and 1.0).

If 2.0 is fully backwards compatible, great.  The differences seem large
enough that that would be impossible, but I need to investigate
further.

Regards,

Anthony Liguori

>
> It's a delicate balancing act.  My plan is to accompany any changes in
> the standard with a qemu implementation, so we can see how painful those
> changes are.  And if there are performance implications, measure them.
>
> Cheers,
> Rusty.



Re: updated: kvm networking todo wiki

2013-05-30 Thread Anthony Liguori
Stefan Hajnoczi  writes:

> On Thu, May 30, 2013 at 7:23 AM, Rusty Russell  wrote:
>> Anthony Liguori  writes:
>>> Rusty Russell  writes:
>>>> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
>>>>> FWIW, I think what's more interesting is using vhost-net as a networking
>>>>> backend with virtio-net in QEMU being what's guest facing.
>>>>>
>>>>> In theory, this gives you the best of both worlds: QEMU acts as a first
>>>>> line of defense against a malicious guest while still getting the
>>>>> performance advantages of vhost-net (zero-copy).
>>>>>
>>>> It would be an interesting idea if we didn't already have the vhost
>>>> model where we don't need the userspace bounce.
>>>
>>> The model is very interesting for QEMU because then we can use vhost as
>>> a backend for other types of network adapters (like vmxnet3 or even
>>> e1000).
>>>
>>> It also helps for things like fault tolerance where we need to be able
>>> to control packet flow within QEMU.
>>
>> (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts).
>>
>> Then I'm really confused as to what this would look like.  A zero copy
>> sendmsg?  We should be able to implement that today.
>>
>> On the receive side, what can we do better than readv?  If we need to
>> return to userspace to tell the guest that we've got a new packet, we
>> don't win on latency.  We might reduce syscall overhead with a
>> multi-dimensional readv to read multiple packets at once?
>
> Sounds like recvmmsg(2).

Could we map this to mergeable rx buffers, though?

Regards,

Anthony Liguori

>
> Stefan


Re: updated: kvm networking todo wiki

2013-05-30 Thread Anthony Liguori
Rusty Russell  writes:

> Anthony Liguori  writes:
>> Rusty Russell  writes:
>>> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
>>>> FWIW, I think what's more interesting is using vhost-net as a networking
>>>> backend with virtio-net in QEMU being what's guest facing.
>>>> 
>>>> In theory, this gives you the best of both worlds: QEMU acts as a first
>>>> line of defense against a malicious guest while still getting the
>>>> performance advantages of vhost-net (zero-copy).
>>>>
>>> It would be an interesting idea if we didn't already have the vhost
>>> model where we don't need the userspace bounce.
>>
>> The model is very interesting for QEMU because then we can use vhost as
>> a backend for other types of network adapters (like vmxnet3 or even
>> e1000).
>>
>> It also helps for things like fault tolerance where we need to be able
>> to control packet flow within QEMU.
>
> (CC's reduced, context added, Dmitry Fleytman added for vmxnet3 thoughts).
>
> Then I'm really confused as to what this would look like.  A zero copy
> sendmsg?  We should be able to implement that today.

The only trouble with sendmsg would be doing batch submission and
asynchronous completion.

A thread pool could certainly be used for this I guess.

Regards,

Anthony Liguori

> On the receive side, what can we do better than readv?  If we need to
> return to userspace to tell the guest that we've got a new packet, we
> don't win on latency.  We might reduce syscall overhead with a
> multi-dimensional readv to read multiple packets at once?
>
> Confused,
> Rusty.


Re: [SeaBIOS] KVM call agenda for 2013-05-28

2013-05-29 Thread Anthony Liguori
Gerd Hoffmann  writes:

> On 05/29/13 01:53, Kevin O'Connor wrote:
>> On Thu, May 23, 2013 at 03:41:32PM +0300, Michael S. Tsirkin wrote:
>>> Juan is not available now, and Anthony asked for
>>> agenda to be sent early.
>>> So here comes:
>>>
>>> Agenda for the meeting Tue, May 28:
>>>
>>> - Generating acpi tables
>> 
>> I didn't see any meeting notes, but I thought it would be worthwhile
>> to summarize the call.  This is from memory so correct me if I got
>> anything wrong.
>> 
>> Anthony believes that the generation of ACPI tables is the task of the
>> firmware.  Reasons cited include security implications of running more
>> code in qemu vs the guest context,
>
> I fail to see the security issues here.  It's not like the acpi table
> generation code operates on untrusted input from the guest ...

But possibly untrusted input from a malicious user.  You can imagine
something like an IaaS provider that lets a user input arbitrary values
for memory, number of nics, etc.

It's a bit of a stretch as an example, I agree, but I think the general
principle is sound: we should push as much work as possible to the least
privileged part of the stack.  In this case, firmware has far fewer
privileges than QEMU.

>> complexities in running iasl on
>> big-endian machines,
>
> We already have a bunch of prebuilt blobs in the qemu repo for similar
> reasons, we can do that with iasl output too.
>
>> possible complexity of having to regenerate
>> tables on a vm reboot,
>
> Why should tables be regenerated at reboot?  I remember hotplug being
> mentioned in the call.  Hmm?  Which hotplugged component needs acpi
> table updates to work properly?  And what is the point of hotplugging if
> you must reboot the guest anyway to get the acpi updates needed?
> Details please.

See my response to Michael.

> Also mentioned in the call: "architectural reasons", which I understand
> as "real hardware works that way".  Correct.  But qemu's virtual
> hardware is configurable in more ways than real hardware, so we have
> different needs.  For example: pci slots can or can't be hotpluggable.
> On real hardware this is fixed.  IIRC this is one of the reasons why we
> have to patch acpi tables.

It's not really fixed.  Hardware supports PCI expansion chassis.
Multi-node NUMA systems also affect the ACPI tables.

>> overall sloppiness of doing it in QEMU.
>
> /me gets the feeling that this is the *main* reason, given that the
> other ones don't look very convincing to me.
>
>> Raised
>> that QOM interface should be sufficient.
>
> Agree on this one.  Ideally the acpi table generation code should be
> able to gather all information it needs from the qom tree, so it can be
> a standalone C file instead of being scattered over all qemu.

Ack.  So my basic argument is why not expose the QOM interfaces to
firmware and move the generation code there?  Seems like it would be
more or less a copy/paste once we had a proper implementation in QEMU.

>> There were discussions on potentially introducing a middle component
>> to generate the tables.  Coreboot was raised as a possibility, and
>> David thought it would be okay to use coreboot for both OVMF and
>> SeaBIOS.
>
> Certainly an option, but that is a long-term project.

Out of curiosity, are there other benefits to using coreboot as a core
firmware in QEMU?

Is there a payload we would ever plausibly use besides OVMF and SeaBIOS?

Regards,

Anthony Liguori


Re: KVM call agenda for 2013-05-28

2013-05-29 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Tue, May 28, 2013 at 07:53:09PM -0400, Kevin O'Connor wrote:
>> On Thu, May 23, 2013 at 03:41:32PM +0300, Michael S. Tsirkin wrote:
>> > Juan is not available now, and Anthony asked for
>> > agenda to be sent early.
>> > So here comes:
>> > 
>> > Agenda for the meeting Tue, May 28:
>> > 
>> > - Generating acpi tables
>> 
>> I didn't see any meeting notes, but I thought it would be worthwhile
>> to summarize the call.  This is from memory so correct me if I got
>> anything wrong.
>> 
>> Anthony believes that the generation of ACPI tables is the task of the
>> firmware.  Reasons cited include security implications of running more
>> code in qemu vs the guest context, complexities in running iasl on
>> big-endian machines, possible complexity of having to regenerate
>> tables on a vm reboot, overall sloppiness of doing it in QEMU.  Raised
>> that QOM interface should be sufficient.
>> 
>> Kevin believes that the bios table code should be moved up into QEMU.
>> Reasons cited include the churn rate in SeaBIOS for this QEMU feature
>> (15-20% of all SeaBIOS commits since integrating with QEMU have been
>> for bios tables; 20% of SeaBIOS commits in last year), complexity of
>> trying to pass all the content needed to generate the tables (eg,
>> device details, power tree, irq routing), complexity of scheduling
>> changes across different repos and synchronizing their rollout,
>> complexity of implementing the code in both OVMF and SeaBIOS.  Kevin
>> wasn't aware of a requirement to regenerate acpi tables on a vm
>> reboot.
>
> I think this last one is based on a misunderstanding: it's based
> on the assumption that when we change hardware by hotplug
> we should regenerate the tables to match.
> But there's no management tool that can take advantage of
> this.
> Two possible reasonable things we can tell management:
> - hotplug for device XXX is not supported: restart qemu
>   to make guest use the device
> - hotplug for device XXX is supported

This introduces an assumption: that the device model never radically
changes across resets.

Why should this be true?  Shouldn't we be allowed to increase the amount
of memory the guest has across reboots?  That's equivalent to adding
another DIMM after power off.

Not generating tables on reset does limit what we can do in a pretty
fundamental way.  Even if you can argue it in the short term, I don't
think it's viable in the long term.

Regards,

Anthony Liguori


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Wed, May 29, 2013 at 09:16:39AM -0500, Anthony Liguori wrote:
>> "Michael S. Tsirkin"  writes:
>> > I'm guessing any compiler that decides to waste memory in this way
>> > will quickly get dropped by users and then we won't worry
>> > about building QEMU with it.
>> 
>> There are other responses in the thread here and I don't really care to
>> bikeshed on this issue.
>
> Great. Let's make the bikeshed blue then?

It's fun to argue about stuff like this and I certainly have an opinion,
but I honestly don't care all that much about the offsetof thing.
However...


>
>> >> Well, given that virtio is widely deployed today, I would think the 1.0
>> >> standard should strictly reflect what's deployed today, no?
>> >> Any new config layout would be 2.0 material, right?
>> >
>> > Not as it's currently planned. Devices can choose
>> > to support a legacy layout in addition to the new one,
>> > and if you look at the patch you will see that that
>> > is exactly what it does.
>> 
>> Adding a new BAR most certainly requires bumping the revision ID or
>> changing the device ID, no?
>
> No, why would it?

If we change the programming interface for a device in a way that is
incompatible, we are required to change the revision ID and/or device
ID.

> If a device dropped BAR0, that would be a good reason
> to bump revision ID.
> We don't do this yet.

But we have to drop BAR0 to put it behind a PCI express bus, right?

If that's the case, then device that's exposed on the PCI express bus
must use a different device ID and/or revision ID.

That means a new driver is needed in the guest.

>> Didn't we run into this problem with the virtio-win drivers with just
>> the BAR size changing? 
>
> Because they had a bug: they validated BAR0 size. AFAIK they don't care
> what happens with other bars.

I think there's a grey area with respect to the assumptions a device can
make about the programming interface.

But very concretely, we cannot expose virtio-pci-net via PCI express
with BAR0 disabled because that will result in existing virtio-pci Linux
drivers breaking.

> Not we. The BIOS can disable IO BAR: it can do this already
> but the device won't be functional.

But the only way to expose the device over PCI express is to disable the
IO BAR, right?

Regards,

Anthony Liguori



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
Paolo Bonzini  writes:

> Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto:
>> You expect a compiler to pad this structure:
>> 
>> struct foo {
>>  uint8_t a;
>>  uint8_t b;
>>  uint16_t c;
>>  uint32_t d;
>> };
>> 
>> I'm guessing any compiler that decides to waste memory in this way
>> will quickly get dropped by users and then we won't worry
>> about building QEMU with it.
>
> You know the virtio-pci config structures are padded, but not all of
> them are.  For example, virtio_balloon_stat is not padded and indeed has
> an __attribute__((__packed__)) in the spec.

Note the difference, though: we store the config in structures like
these, so they are actually used to hold data.

The proposed structures only serve as a way to express offsets.  You
would never actually have a variable of this type.

Regards,

Anthony Liguori

>
> For this reason I prefer to have the attribute everywhere.  So people
> don't have to wonder why it's here and not there.
>
> Paolo



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
>> 1) C makes no guarantees about structure layout beyond the first
>>member.  Yes, if it's naturally aligned or has a packed attribute,
>>GCC does the right thing.  But this isn't kernel land anymore,
>>portability matters and there are more compilers than GCC.
>
> You expect a compiler to pad this structure:
>
> struct foo {
>   uint8_t a;
>   uint8_t b;
>   uint16_t c;
>   uint32_t d;
> };
>
> I'm guessing any compiler that decides to waste memory in this way
> will quickly get dropped by users and then we won't worry
> about building QEMU with it.

There are other responses in the thread here and I don't really care to
bikeshed on this issue.

>> Well, given that virtio is widely deployed today, I would think the 1.0
>> standard should strictly reflect what's deployed today, no?
>> Any new config layout would be 2.0 material, right?
>
> Not as it's currently planned. Devices can choose
> to support a legacy layout in addition to the new one,
> and if you look at the patch you will see that that
> is exactly what it does.

Adding a new BAR most certainly requires bumping the revision ID or
changing the device ID, no?

Didn't we run into this problem with the virtio-win drivers with just
the BAR size changing? 

>> Re: the new config layout, I don't think we would want to use it for
>> anything but new devices.  Forcing a guest driver change
>
> There's no forcing.
> If you look at the patches closely, you will see that
> we still support the old layout on BAR0.
>
>
>> is a really big
>> deal and I see no reason to do that unless there's a compelling reason
>> to.
>
> There are many a compelling reasons, and they are well known
> limitations of virtio PCI:
>
> - PCI spec compliance (mandates device operation with IO memory
> disabled).

The PCI express spec, you mean.  We are fully compliant with the PCI
spec.  And what's the user-visible advantage of putting an emulated
virtio device behind a PCI-e bus versus a legacy PCI bus?

This is a very good example, because if we have to disable BAR0 then
it's an ABI breaker, plain and simple.

> - support 64 bit addressing

We currently support 44-bit addressing for the ring.  While I agree we
need to bump it, there's no immediate problem with 44-bit addressing.

> - add more than 32 feature bits.
> - individually disable queues.
> - sanely support cross-endian systems.
> - support very small (<1 page) virtio rings.
> - support a separate page for each vq kick.
> - make each device place config at flexible offset.

None of these things are holding us back today.

I'm not saying we shouldn't introduce a new device.  But adoption of
that device will be slow and realistically will be limited to new
devices only.

We'll be supporting both devices for a very, very long time.

Compatibility is the fundamental value that we provide.  We need to go
out of our way to make sure that existing guests work and work as well
as possible.

Sticking virtio devices behind a PCI-e bus just for the hell of it isn't
a compelling reason to break existing guests.

Regards,

Anthony Liguori


> Addressing any one of these would cause us to add a substantially new
> way to operate virtio devices.
>
> And since it's a guest change anyway, it seemed like a
> good time to do the new layout and fix everything in one go.
>
> And they are needed like yesterday.
>
>
>> So we're stuck with the 1.0 config layout for a very long time.
>> 
>> Regards,
>> 
>> Anthony Liguori
>
> Absolutely. This patch let us support both which will allow for
> a gradual transition over the next 10 years or so.
>
>> > reason.  I suggest that's 2.0 material...
>> >
>> > Cheers,
>> > Rusty.
>> >



Re: updated: kvm networking todo wiki

2013-05-29 Thread Anthony Liguori
Rusty Russell  writes:

> "Michael S. Tsirkin"  writes:
>> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
>>> "Michael S. Tsirkin"  writes:
>>> 
>>> > On Fri, May 24, 2013 at 05:41:11PM +0800, Jason Wang wrote:
>>> >> On 05/23/2013 04:50 PM, Michael S. Tsirkin wrote:
>>> >> > Hey guys,
>>> >> > I've updated the kvm networking todo wiki with current projects.
>>> >> > Will try to keep it up to date more often.
>>> >> > Original announcement below.
>>> >> 
>>> >> Thanks a lot. I've added the tasks I'm currently working on to the wiki.
>>> >> 
>>> >> btw. I notice the virtio-net data plane were missed in the wiki. Is the
>>> >> project still being considered?
>>> >
>>> > It might have been interesting several years ago, but now that linux has
>>> > vhost-net in kernel, the only point seems to be to
>>> > speed up networking on non-linux hosts.
>>> 
>>> Data plane just means having a dedicated thread for virtqueue processing
>>> that doesn't hold qemu_mutex.
>>> 
>>> Of course we're going to do this in QEMU.  It's a no brainer.  But not
>>> as a separate device, just as an improvement to the existing userspace
>>> virtio-net.
>>> 
>>> > Since non-linux does not have kvm, I doubt virtio is a bottleneck.
>>> 
>>> FWIW, I think what's more interesting is using vhost-net as a networking
>>> backend with virtio-net in QEMU being what's guest facing.
>>> 
>>> In theory, this gives you the best of both worlds: QEMU acts as a first
>>> line of defense against a malicious guest while still getting the
>>> performance advantages of vhost-net (zero-copy).
>>
> Great idea, that sounds very interesting.
>>
>> I'll add it to the wiki.
>>
>> In fact a bit of complexity in vhost was put there in the vague hope to
>> support something like this: virtio rings are not translated through
>> regular memory tables, instead, vhost gets a pointer to ring address.
>>
>> This allows qemu to act as a man in the middle,
>> verifying the descriptors but not touching the
>>
>> Anyone interested in working on such a project?
>
> It would be an interesting idea if we didn't already have the vhost
> model where we don't need the userspace bounce.

The model is very interesting for QEMU because then we can use vhost as
a backend for other types of network adapters (like vmxnet3 or even
e1000).

It also helps for things like fault tolerance where we need to be able
to control packet flow within QEMU.

Regards,

Anthony Liguori

> We already have two
> sets of host side ring code in the kernel (vhost and vringh, though
> they're being unified).
>
> All an accelerator can offer on the tx side is zero copy and direct
> update of the used ring.  On rx userspace could register the buffers and
> the accelerator could fill them and update the used ring.  It still
> needs to deal with merged buffers, for example.
>
> You avoid the address translation in the kernel, but I'm not convinced
> that's a key problem.
>
> Cheers,
> Rusty.


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
Rusty Russell  writes:

> Anthony Liguori  writes:
>> "Michael S. Tsirkin"  writes:
>>> +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
>>> +return proxy->device_feature_select;
>>
>> Oh dear no...  Please use defines like the rest of QEMU.
>
> It is pretty ugly.

I think beauty is in the eye of the beholder here...

Pretty much every device we have has a switch statement like this.
Consistency wins when it comes to qualitative arguments like this.

> Yet the structure definitions are descriptive, capturing layout, size
> and endianness in natural a format readable by any C programmer.

From an API design point of view, here are the problems I see:

1) C makes no guarantees about structure layout beyond the first
   member.  Yes, if it's naturally aligned or has a packed attribute,
   GCC does the right thing.  But this isn't kernel land anymore,
   portability matters and there are more compilers than GCC.

2) If we ever introduce anything like latching, this doesn't work out
   so well anymore because it's hard to express in a single C structure
   the register layout at that point.  Perhaps a union could be used but
   padding may make it a bit challenging.

3) I suspect it's harder to review because a subtle change could more
   easily have broad impact.  If someone changed the type of a field
   from u32 to u16, it changes the offset of every other field.  That's
   not terribly obvious in the patch itself unless you understand how
   the structure is used elsewhere.

   This may not be a problem for virtio because we all understand that
   the structures are part of an ABI, but if we used this pattern more
   in QEMU, it would be a lot less obvious.

> So AFAICT the question is, do we put the required
>
> #define VIRTIO_PCI_CFG_FEATURE_SEL \
>  (offsetof(struct virtio_pci_common_cfg, device_feature_select))
>
> etc. in the kernel headers or qemu?

I'm pretty sure we would end up just having our own integer defines.  We
carry our own virtio headers today because we can't easily import the
kernel headers.

>> Haven't looked at the proposed new ring layout yet.
>
> No change, but there's an open question on whether we should nail it to
> little endian (or define the endian by the transport).
>
> Of course, I can't rule out that the 1.0 standard *may* decide to frob
> the ring layout somehow,

Well, given that virtio is widely deployed today, I would think the 1.0
standard should strictly reflect what's deployed today, no?

Any new config layout would be 2.0 material, right?

Re: the new config layout, I don't think we would want to use it for
anything but new devices.  Forcing a guest driver change is a really big
deal and I see no reason to do that unless there's a compelling reason
to.

So we're stuck with the 1.0 config layout for a very long time.

Regards,

Anthony Liguori

> but I'd think it would require a compelling
> reason.  I suggest that's 2.0 material...
>
> Cheers,
> Rusty.
>



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Tue, May 28, 2013 at 12:15:16PM -0500, Anthony Liguori wrote:
>> > @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, 
>> > hwaddr addr,
>> >  }
>> >  }
>> >  
>> > +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
>> > +  unsigned size)
>> > +{
>> > +VirtIOPCIProxy *proxy = opaque;
>> > +VirtIODevice *vdev = proxy->vdev;
>> > +
>> > +uint64_t low = 0xull;
>> > +
>> > +switch (addr) {
>> > +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
>> > +return proxy->device_feature_select;
>> 
>> Oh dear no...  Please use defines like the rest of QEMU.
>
> Any good reason not to use offsetof?
> I see about 138 examples in qemu.

There are exactly zero:

$ find . -name "*.c" -exec grep -l "case offset" {} \;
$

> Anyway, that's the way Rusty wrote it in the kernel header -
> I started with defines.
> If you convince Rusty to switch I can switch too,

We have 300+ devices in QEMU that use #defines.  We're not using this
kind of thing just because you want to copy code from the kernel.

>> https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659
>> 
>> And:
>> 
>> https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb
>> 
>> Which lets virtio-pci be subclassable and then remaps the config space to
>> BAR2.
>
>
> Interesting. Have the spec anywhere?

Not yet, but working on that.

> You are saying this is going to conflict because
> of BAR2 usage, yes?

No, this whole thing is flexible.  I had to use BAR2 because BAR0 has to
be the vram mapping.  It also had to be an MMIO bar.

The new layout might make it easier to implement a device like this.  I
shared it mainly because I wanted to show the subclassing idea vs. just
tacking an option onto the existing virtio-pci code in QEMU.

Regards,

Anthony Liguori

> So let's only do this virtio-fb only for new layout, so we don't need
> to maintain compatibility. In particular, we are working
> on making memory BAR access fast for virtio devices
> in a generic way. At the moment they are measurably slower
> than PIO on x86.
>
>
>> Haven't looked at the proposed new ring layout yet.
>> 
>> Regards,
>
> No new ring layout. It's new config layout.
>
>
> -- 
> MST



Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> This adds support for new config, and is designed to work with
> the new layout code in Rusty's new layout branch.
>
> At the moment all fields are in the same memory BAR (bar 2).
> This will be used to test performance and compare
> memory, io and hypercall latency.
>
> Compiles but does not work yet.
> Migration isn't handled yet.
>
> It's not clear what the queue_enable/queue_disable
> fields do; not yet implemented.
>
> Gateway for config access with config cycles
> not yet implemented.
>
> Sending out for early review/flames.
>
> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/virtio/virtio-pci.c | 393 
> +++--
>  hw/virtio/virtio-pci.h |  55 +++
>  hw/virtio/virtio.c |  20 +++
>  include/hw/virtio/virtio.h |   4 +
>  4 files changed, 458 insertions(+), 14 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 752991a..f4db224 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -259,6 +259,26 @@ static void virtio_pci_stop_ioeventfd(VirtIOPCIProxy 
> *proxy)
>  proxy->ioeventfd_started = false;
>  }
>  
> +static void virtio_pci_set_status(VirtIOPCIProxy *proxy, uint8_t val)
> +{
> +VirtIODevice *vdev = proxy->vdev;
> +
> +if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +virtio_pci_stop_ioeventfd(proxy);
> +}
> +
> +virtio_set_status(vdev, val & 0xFF);
> +
> +if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
> +virtio_pci_start_ioeventfd(proxy);
> +}
> +
> +if (vdev->status == 0) {
> +virtio_reset(proxy->vdev);
> +msix_unuse_all_vectors(&proxy->pci_dev);
> +}
> +}
> +
>  static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>  {
>  VirtIOPCIProxy *proxy = opaque;
> @@ -293,20 +313,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  }
>  break;
>  case VIRTIO_PCI_STATUS:
> -if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> -virtio_pci_stop_ioeventfd(proxy);
> -}
> -
> -virtio_set_status(vdev, val & 0xFF);
> -
> -if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
> -virtio_pci_start_ioeventfd(proxy);
> -}
> -
> -if (vdev->status == 0) {
> -virtio_reset(proxy->vdev);
> -msix_unuse_all_vectors(&proxy->pci_dev);
> -}
> +virtio_pci_set_status(proxy, val);
>  
>  /* Linux before 2.6.34 sets the device as OK without enabling
> the PCI device bus master bit. In this case we need to disable
> @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, 
> hwaddr addr,
>  }
>  }
>  
> +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
> +  unsigned size)
> +{
> +VirtIOPCIProxy *proxy = opaque;
> +VirtIODevice *vdev = proxy->vdev;
> +
> +uint64_t low = 0xull;
> +
> +switch (addr) {
> +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
> +return proxy->device_feature_select;

Oh dear no...  Please use defines like the rest of QEMU.

From a QEMU pov, take a look at:

https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659

And:

https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb

Which lets virtio-pci be subclassable and then remaps the config space to
BAR2.

Haven't looked at the proposed new ring layout yet.

Regards,

Anthony Liguori

> +case offsetof(struct virtio_pci_common_cfg, device_feature):
> +/* TODO: 64-bit features */
> + return proxy->device_feature_select ? 0 : proxy->host_features;
> +case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
> +return proxy->guest_feature_select;
> +case offsetof(struct virtio_pci_common_cfg, guest_feature):
> +/* TODO: 64-bit features */
> + return proxy->guest_feature_select ? 0 : vdev->guest_features;
> +case offsetof(struct virtio_pci_common_cfg, msix_config):
> + return vdev->config_vector;
> +case offsetof(struct virtio_pci_common_cfg, num_queues):
> +/* TODO: more exact limit? */
> + return VIRTIO_PCI_QUEUE_MAX;
> +case offsetof(struct virtio_pci_common_cfg, device_status):
> +return vdev->status;
> +
> + /* About a specific virtqueue. */
> +case offsetof(struct virtio_pci_common_cfg, queue_select):
> +return  vdev->queue_s

Re: updated: kvm networking todo wiki

2013-05-24 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Fri, May 24, 2013 at 05:41:11PM +0800, Jason Wang wrote:
>> On 05/23/2013 04:50 PM, Michael S. Tsirkin wrote:
>> > Hey guys,
>> > I've updated the kvm networking todo wiki with current projects.
>> > Will try to keep it up to date more often.
>> > Original announcement below.
>> 
>> Thanks a lot. I've added the tasks I'm currently working on to the wiki.
>> 
>> btw. I notice the virtio-net data plane were missed in the wiki. Is the
>> project still being considered?
>
> It might have been interesting several years ago, but now that linux has
> vhost-net in kernel, the only point seems to be to
> speed up networking on non-linux hosts.

Data plane just means having a dedicated thread for virtqueue processing
that doesn't hold qemu_mutex.

Of course we're going to do this in QEMU.  It's a no-brainer.  But not
as a separate device, just as an improvement to the existing userspace
virtio-net.

> Since non-linux does not have kvm, I doubt virtio is a bottleneck.

FWIW, I think what's more interesting is using vhost-net as a networking
backend with virtio-net in QEMU being what's guest facing.

In theory, this gives you the best of both worlds: QEMU acts as a first
line of defense against a malicious guest while still getting the
performance advantages of vhost-net (zero-copy).

> IMO yet another networking backend is a distraction,
> and confusing to users.
> In any case, I'd like to see virtio-blk dataplane replace
> non dataplane first. We don't want two copies of
> virtio-net in qemu.

100% agreed.

Regards,

Anthony Liguori

>
>> > 
>> >
>> > I've put up a wiki page with a kvm networking todo list,
>> > mainly to avoid effort duplication, but also in the hope
>> > to draw attention to what I think we should try addressing
>> > in KVM:
>> >
>> > http://www.linux-kvm.org/page/NetworkingTodo
>> >
>> > This page could cover all networking related activity in KVM,
>> > currently most info is related to virtio-net.
>> >
>> > Note: if there's no developer listed for an item,
>> > this just means I don't know of anyone actively working
>> > on an issue at the moment, not that no one intends to.
>> >
>> > I would appreciate it if others working on one of the items on this list
>> > would add their names so we can communicate better.  If others like this
>> > wiki page, please go ahead and add stuff you are working on if any.
>> >
>> > It would be especially nice to add autotest projects:
>> > there is just a short test matrix and a catch-all
>> > 'Cover test matrix with autotest', currently.
>> >
>> > Currently there are some links to Red Hat bugzilla entries,
>> > feel free to add links to other bugzillas.
>> >
>> > Thanks!
>> >
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM call agenda for 2013-05-21

2013-05-21 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Tue, May 21, 2013 at 09:29:07AM -0500, Anthony Liguori wrote:
>> "Michael S. Tsirkin"  writes:
>> 
>> > On Tue, May 21, 2013 at 07:18:58AM -0500, Anthony Liguori wrote:
>> >> "Michael S. Tsirkin"  writes:
>> >> 
>> >> > On Mon, May 20, 2013 at 12:57:47PM +0200, Juan Quintela wrote:
>> >> >> 
>> >> >> Hi
>> >> >> 
>> >> >> Please, send any topic that you are interested in covering.
>> >> >> 
>> >> >> Thanks, Juan.
>> >> >
>> >> > Generating acpi tables.
>> >> >
>> >> > Cc'd a bunch of people who might be interested in this topic.
>> >> 
>> >> Unfortunately I have a conflict this morning so I won't be able to
>> >> join.  I just saw Kevin's response here from last week and I'll respond
>> >> to it later this morning.
>> >
>> > Unfortunate.
>> > Let's talk about this on the next slot: next Tuesday, June 4 then.
>> > Could you keep your agenda clear on that day please?
>> 
>> Ack.
>> 
>> Perhaps we could move this call to bimonthly and cancel it less
>> frequently?  That will make it easier to reserve calendar time for it.
>
> I think you mean bi-weekly? If yes, ack.

I meant twice a month (or every other week).

Regards,

Anthony Liguori

>
>> >
>> >> Can we post the call for agenda for this call on Fridays in the future?
>> >> I need more than 24 hours to make sure to keep my calendar clear...
>> >> 
>> >> Regards,
>> >> 
>> >> Anthony Liguori
>> >
>> > We don't work on Fridays in Israel so that means we'll only be able to
>> > respond Sunday, and you'll only see it Monday anyway.
>> > Setting agenda Thursday is probably too aggressive?
>> 
>> Maybe we could use a wiki page to set up a rolling agenda?
>> 
>> Regards,
>> 
>> Anthony Liguori
>> 
>> >
>> >> >
>> >> > Kevin - could you join on Tuesday? There appears a disconnect
>> >> > between the seabios and qemu that a conf call
>> >> > might help resolve.
>> >> >
>> >> > -- 
>> >> > MST


Re: KVM call agenda for 2013-05-21

2013-05-21 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Tue, May 21, 2013 at 07:18:58AM -0500, Anthony Liguori wrote:
>> "Michael S. Tsirkin"  writes:
>> 
>> > On Mon, May 20, 2013 at 12:57:47PM +0200, Juan Quintela wrote:
>> >> 
>> >> Hi
>> >> 
>> >> Please, send any topic that you are interested in covering.
>> >> 
>> >> Thanks, Juan.
>> >
>> > Generating acpi tables.
>> >
>> > Cc'd a bunch of people who might be interested in this topic.
>> 
>> Unfortunately I have a conflict this morning so I won't be able to
>> join.  I just saw Kevin's response here from last week and I'll respond
>> to it later this morning.
>
> Unfortunate.
> Let's talk about this on the next slot: next Tuesday, June 4 then.
> Could you keep your agenda clear on that day please?

Ack.

Perhaps we could move this call to bimonthly and cancel it less
frequently?  That will make it easier to reserve calendar time for it.

>
>> Can we post the call for agenda for this call on Fridays in the future?
>> I need more than 24 hours to make sure to keep my calendar clear...
>> 
>> Regards,
>> 
>> Anthony Liguori
>
> We don't work on Fridays in Israel so that means we'll only be able to
> respond Sunday, and you'll only see it Monday anyway.
> Setting agenda Thursday is probably too aggressive?

Maybe we could use a wiki page to set up a rolling agenda?

Regards,

Anthony Liguori

>
>> >
>> > Kevin - could you join on Tuesday? There appears a disconnect
>> > between the seabios and qemu that a conf call
>> > might help resolve.
>> >
>> > -- 
>> > MST


Re: KVM call agenda for 2013-05-21

2013-05-21 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Mon, May 20, 2013 at 12:57:47PM +0200, Juan Quintela wrote:
>> 
>> Hi
>> 
>> Please, send any topic that you are interested in covering.
>> 
>> Thanks, Juan.
>
> Generating acpi tables.
>
> Cc'd a bunch of people who might be interested in this topic.

Unfortunately I have a conflict this morning so I won't be able to
join.  I just saw Kevin's response here from last week and I'll respond
to it later this morning.

Can we post the call for agenda for this call on Fridays in the future?
I need more than 24 hours to make sure to keep my calendar clear...

Regards,

Anthony Liguori

>
> Kevin - could you join on Tuesday? There appears a disconnect
> between the seabios and qemu that a conf call
> might help resolve.
>
> -- 
> MST


Re: KVM call minutes for 2013-04-23

2013-04-24 Thread Anthony Liguori
Eric Blake  writes:

> On 04/23/2013 08:45 AM, Juan Quintela wrote:
>> 
>> * 1.5 pending patches (paolo)
>>   anthony thinks nothing big is outstanding
>>   rdma: probably not for this release, too big a change to migration
>>   cpu-hotplug: andreas expect to get it for 1.5
>> 
>> 
>> * What can libvirt expect in 1.5 for introspection of command-line support?
>>   command extensions?  libvirt wants them
>> * What are the rules for adding optional parameters to existing QMP
>>   commands?  Would it help if we had introspection of QMP commands?
>>   what are the options that each command supports.
>> 
>>   command line could work for 1.5
>> if we got patches on the next 2 days we can get it.
>
> Goal is to provide a QMP command that provides JSON representation of
> command line options; I will help review whatever is posted to make sure
> we like the interface.  Anthony agreed the implementation should be
> relatively straightforward and okay to add after soft freeze (but must
> be before hard freeze).  Libvirt has some code that would like to make
> use of the new command-line introspection; Osier will probably be the
> first libvirt developer taking advantage of it - if we can swing it,
> we'd like libvirt 1.0.5 to use the new command (libvirt freezes this
> weekend for a May 2 release).
>
>>   rest of introspection need 1.6
>> it is "challenging"
>> we are interested in feature introspection
>> and command extensions?
>> one command to return the schema?
>
> Anthony was okay with the idea of a full JSON introspection of all QMP
> commands, but it is probably too big to squeeze into 1.5 timeframe.
> Furthermore, while the command will be useful, we should always be
> thinking about API - having to parse through JSON to see if a feature is
> present is not always the nicest interface; when adding a new feature,
> consider improving an existing query-* or adding a counterpart new
> query-* command that makes it much easier to tell if a feature is
> available, without having to resort to a QMP introspection.

Ack.

One of the problems with using schema introspection for feature
detection is that there isn't always a 1-1 mapping.  You can imagine
that we have an optional parameter that gets added to a structure and is
initially tied to a specific feature but later gets used by another
feature.

If a distro backports the later and not the former, but a management
tool uses this field to probe for the former feature, it will result in
a false positive.

That's why a more direct feature negotiation mechanism is better IMHO.

Regards,

Anthony Liguori


>
>>   if we change a command,  how we change the interface without
>>   changing the c-api?
>
> c-api is not yet a strong consideration (but see [1] below).  Also,
> there may be ways to design a C api that is robust to extensions (but
> that means designing it into the QMP up front when adding a new
> command); there has been some list traffic on this thought.
>
> More importantly, adding an optional parameter to an existing command is
> not okay unless something else is also available to tell whether the
> feature is usable - QMP introspection would solve this, but is not
> necessarily the most elegant way.  For now, while adding QMP
> introspection is a good idea, we still want case-by-case review of any
> command extensions.
>
>> 
>>   we can change "drive_mirror" to use a new command to see if there
>>   are the new features.
>
> drive-mirror changed in 1.4 to add optional buf-size parameter; right
> now, libvirt is forced to limit itself to 1.3 interface (no buf-size or
> granularity) because there is no introspection and no query-* command
> that witnesses that the feature is present.  Idea was that we need to
> add a new query-drive-mirror-capabilities (name subject to bikeshedding)
> command into 1.5 that would let libvirt know that buf-size/granularity
> is usable (done right, it would also prevent the situation of buf-size
> being a write-only interface where it is set when starting the mirror
> but can not be queried later to see what size is in use).
>
> Unclear whether anyone was signing up to tackle the addition of a query
> command counterpart for drive-mirror in time for 1.5.
>
>> 
>>   if we have a stable c-api we can do test cases that work. 
>
> Having such a testsuite would make a stable C API more important.
>
>> 
>> Eric will complete this with his understanding from the libvirt point of
>> view.
>
> Also under discussion: the existing QMP 'screendump' command is not
> ideal (not extensible, doesn't allow fd passing, hard-coded output

Re: [PATCH-v2 1/2] virtio-scsi: create VirtIOSCSICommon

2013-04-08 Thread Anthony Liguori
"Nicholas A. Bellinger"  writes:

> From: Paolo Bonzini 
>
> This patch refactors existing virtio-scsi code into VirtIOSCSICommon
> in order to allow virtio_scsi_init_common() to be used by both internal
> virtio_scsi_init() and external vhost-scsi-pci code.
>
> Changes in Patch-v2:
>- Move ->get_features() assignment to virtio_scsi_init() instead of
>  virtio_scsi_init_common()


Any reason we're not doing this as a QOM base class?

Similar to how the in-kernel PIT/PIC work using a common base class...

Regards,

Anthony Liguori

>
> Signed-off-by: Paolo Bonzini 
> Cc: Michael S. Tsirkin 
> Cc: Asias He 
> Signed-off-by: Nicholas Bellinger 
> ---
>  hw/virtio-scsi.c |  192 
> +-
>  hw/virtio-scsi.h |  130 --
>  include/qemu/osdep.h |4 +
>  3 files changed, 178 insertions(+), 148 deletions(-)
>
> diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
> index 8620712..c59e9c6 100644
> --- a/hw/virtio-scsi.c
> +++ b/hw/virtio-scsi.c
> @@ -18,118 +18,6 @@
>  #include 
>  #include 
>  
> -#define VIRTIO_SCSI_VQ_SIZE 128
> -#define VIRTIO_SCSI_CDB_SIZE32
> -#define VIRTIO_SCSI_SENSE_SIZE  96
> -#define VIRTIO_SCSI_MAX_CHANNEL 0
> -#define VIRTIO_SCSI_MAX_TARGET  255
> -#define VIRTIO_SCSI_MAX_LUN 16383
> -
> -/* Response codes */
> -#define VIRTIO_SCSI_S_OK   0
> -#define VIRTIO_SCSI_S_OVERRUN  1
> -#define VIRTIO_SCSI_S_ABORTED  2
> -#define VIRTIO_SCSI_S_BAD_TARGET   3
> -#define VIRTIO_SCSI_S_RESET4
> -#define VIRTIO_SCSI_S_BUSY 5
> -#define VIRTIO_SCSI_S_TRANSPORT_FAILURE6
> -#define VIRTIO_SCSI_S_TARGET_FAILURE   7
> -#define VIRTIO_SCSI_S_NEXUS_FAILURE8
> -#define VIRTIO_SCSI_S_FAILURE  9
> -#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED   10
> -#define VIRTIO_SCSI_S_FUNCTION_REJECTED11
> -#define VIRTIO_SCSI_S_INCORRECT_LUN12
> -
> -/* Controlq type codes.  */
> -#define VIRTIO_SCSI_T_TMF  0
> -#define VIRTIO_SCSI_T_AN_QUERY 1
> -#define VIRTIO_SCSI_T_AN_SUBSCRIBE 2
> -
> -/* Valid TMF subtypes.  */
> -#define VIRTIO_SCSI_T_TMF_ABORT_TASK   0
> -#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET   1
> -#define VIRTIO_SCSI_T_TMF_CLEAR_ACA2
> -#define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET   3
> -#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET  4
> -#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET   5
> -#define VIRTIO_SCSI_T_TMF_QUERY_TASK   6
> -#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET   7
> -
> -/* Events.  */
> -#define VIRTIO_SCSI_T_EVENTS_MISSED0x8000
> -#define VIRTIO_SCSI_T_NO_EVENT 0
> -#define VIRTIO_SCSI_T_TRANSPORT_RESET  1
> -#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2
> -#define VIRTIO_SCSI_T_PARAM_CHANGE 3
> -
> -/* Reasons for transport reset event */
> -#define VIRTIO_SCSI_EVT_RESET_HARD 0
> -#define VIRTIO_SCSI_EVT_RESET_RESCAN   1
> -#define VIRTIO_SCSI_EVT_RESET_REMOVED  2
> -
> -/* SCSI command request, followed by data-out */
> -typedef struct {
> -uint8_t lun[8];  /* Logical Unit Number */
> -uint64_t tag;/* Command identifier */
> -uint8_t task_attr;   /* Task attribute */
> -uint8_t prio;
> -uint8_t crn;
> -uint8_t cdb[];
> -} QEMU_PACKED VirtIOSCSICmdReq;
> -
> -/* Response, followed by sense data and data-in */
> -typedef struct {
> -uint32_t sense_len;  /* Sense data length */
> -uint32_t resid;  /* Residual bytes in data buffer */
> -uint16_t status_qualifier;   /* Status qualifier */
> -uint8_t status;  /* Command completion status */
> -uint8_t response;/* Response values */
> -uint8_t sense[];
> -} QEMU_PACKED VirtIOSCSICmdResp;
> -
> -/* Task Management Request */
> -typedef struct {
> -uint32_t type;
> -uint32_t subtype;
> -uint8_t lun[8];
> -uint64_t tag;
> -} QEMU_PACKED VirtIOSCSICtrlTMFReq;
> -
> -typedef struct {
> -uint8_t response;
> -} QEMU_PACKED VirtIOSCSICtrlTMFResp;
> -
> -/* Asynchronous notification query/subscription */
> -typedef struct {
> -uint32_t type;
> -uint8_t lun[8];
> -uint32_t event_requested;
> -} QEMU_PACKED VirtIOSCSICtrlANReq;
> -
> -typedef struct {
> -uint32_t event_actual;
> -uint8_t response;
> -} QEMU_PACKED VirtIOSCSICtrlANResp;
> -
> -typedef struct {
> -uint32_t 

Re: KVM call agenda for 2013-02-05

2013-02-05 Thread Anthony Liguori
Juan Quintela  writes:

> Hi
>
> Please send in any agenda topics you are interested in.

FYI, I have a conflict for today so I won't be able to attend.

Regards,

Anthony Liguori

>
> Later, Juan.


Re: [PATCH V4 00/22] Multiqueue virtio-net

2013-02-04 Thread Anthony Liguori
Applied.  Thanks.

Regards,

Anthony Liguori



Re: [PATCH V4 RESEND 00/22] Multiqueue virtio-net

2013-02-04 Thread Anthony Liguori
Applied.  Thanks.

Regards,

Anthony Liguori



Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
Benjamin Herrenschmidt  writes:

> On Wed, 2013-01-30 at 17:54 +0100, Andreas Färber wrote:
>> 
>> That would require polymorphism since we already need to derive from
>> PCIDevice or ISADevice respectively for interfacing with the bus...
>> Modern object-oriented languages have tried to avoid multi-inheritence
>> due to arising complications, I thought. Wouldn't object if someone
>> wanted to do the dirty implementation work though. ;)
>> 
>> Another such example is EHCI, with PCIDevice and SysBusDevice
>> frontends,
>> sharing an EHCIState struct and having helper functions operating on
>> that core state only. Quite a few device share such a pattern today
>> actually (serial, m48t59, ...).
>
> This is a design bug of your model :-) You shouldn't derive from your
> bus interface IMHO but from your functional interface, and have an
> ownership relation to the PCIDevice (a bit like IOKit does if my memory
> serves me well).

Ack.  Hence:

SerialPCIDevice is-a PCIDevice
   has-a SerialChipset

The board that exports a bus interface is one object.  The chipset that
implements the functionality is another object.

The former's job in life is to map the bus interface to whatever
interface the functional object expects.  In most cases, this is just a
straight forward proxy of a MemoryRegion.  Sometimes this involves
address shifting, etc.

Regards,

Anthony Liguori

>
> Cheers,
> Ben.


Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
Benjamin Herrenschmidt  writes:

> On Wed, 2013-01-30 at 07:59 -0600, Anthony Liguori wrote:
>> An x86 CPU has a MMIO capability that's essentially 65 bits.  Whether
>> the top bit is set determines whether it's a "PIO" transaction or an
>> "MMIO" transaction.  A large chunk of that address space is invalid of
>> course.
>> 
>> PCI has a 65 bit address space too.  The 65th bit determines whether
>> it's an IO transaction or an MMIO transaction.
>
> This is somewhat an over simplification since IO and MMIO differs in
> other ways, such as ordering rules :-) But for the sake of memory
> regions decoding I suppose it will do.
>
>> For architectures that only have a 64-bit address space, what the PCI
>> controller typically does is pick a 16-bit window within that address
>> space to map to a PCI address with the 65th bit set.
>
> Sort-of yes. The window doesn't have to be 16-bit (we commonly have
> larger IO space windows on powerpc) and there's a window per host
> bridge, so there's effectively more than one IO space (as there is more
> than one PCI MMIO space, with only a window off the CPU space routed to
> each brigde).

Ack.

> Making a hard wired assumption that the PCI (MMIO and IO) space relates
> directly to the CPU bus space is wrong on pretty much all !x86
> architectures.

Ack.

>
>  .../...
>
> You make it sound like substractive decode is a chipset hack. It's not,
> it's specified in the PCI spec.

It's a hack :-)  It's a well specified hack, but it's still a hack.

>> 1) A chipset will route any non-positively decoded IO transaction (65th
>>bit set) to a single end point (usually the ISA-bridge).  Which one it
>>chooses is up to the chipset.  This is called subtractive decoding
>>because the PCI bus will wait multiple cycles for that device to
>>claim the transaction before bouncing it.
>
> This is not a chipset matter. It's the ISA bridge itself that does
> substractive decoding.

The PCI bus can have one end point that can be the target for
subtractive decoding (not hard decoding, subtractive decoding).  IOW,
you can only have a single ISA Bridge within a single PCI domain.

You are right--chipset is the wrong word.  I'm used to thinking in terms
of only a single domain :-)

> There also exists P2P bridges doing such substractive
> decoding, this used to be fairly common with transparent bridges used for
> laptop docking.

I'm not sure I understand how this would work.  How can two devices on
the same PCI domain both do subtractive decoding?  Indeed, the PCI spec
even says:

"Subtractive decoding can be implemented by only one device on the bus
 since it accepts all accesses not positively decoded by some other
 agent."

>> 2) There are special hacks in most PCI chipsets to route very specific
>>addresses ranges to certain devices.  Namely, legacy VGA IO transactions
>>go to the first VGA device.  Legacy IDE IO transactions go to the first
>>IDE device.  This doesn't need to be programmed in the BARs.  It will
>>just happen.
>
> This is also mostly not a hack in the chipset. It's a well-defined behaviour
> for legacy devices, sometimes called hard decoding. Of course often those 
> devices
> are built into the chipset but they don't have to. Plug-in VGA devices will
> hard decode legacy VGA regions for both IO and MMIO by default (this can be
> disabled on most of them nowadays) for example. This has nothing to do with
> the chipset.

So I understand what you're saying re: PCI because the devices actually
assert DEVSEL to indicate that they handle the transaction.

But for PCI-E, doesn't the controller have to expressly identify what
the target is?  Is this done with the device class?

> There's a specific bit in P2P bridge to control the forwarding of legacy
> transaction downstream (and VGA palette snoops), this is also fully specified
> in the PCI spec.

Ack.

>
>> 3) As it turns out, all legacy PIIX3 devices are positively decoded and
>>sent to the ISA-bridge (because it's faster this way).
>
> Chipsets don't "send to a bridge". It's the bridge itself that
> decodes.

With PCI...

>> Notice the lack of the word "ISA" in all of this other than describing
>> the PCI class of an end point.
>
> ISA is only relevant to the extent that the "legacy" regions of IO space
> originate from the original ISA addresses of devices (VGA, IDE, etc...)
> and to the extent that an ISA bus might still be present which will get
> the transactions that nothing else have decoded in that space.

Ack.

>  
>> So h

Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
Andreas Färber  writes:

> Am 30.01.2013 17:33, schrieb Anthony Liguori:
>> Gerd Hoffmann  writes:
>> 
>>>> hw/qxl.c:portio_list_add(qxl_vga_port_list,
>>>> pci_address_space_io(dev), 0x3b0);
>>>> hw/vga.c:portio_list_add(vga_port_list, address_space_io, 0x3b0);
>>>
>>> That reminds me I should solve this in a more elegant way.
>>>
>>> qxl takes over the vga io ports.  The reason it does this is because qxl
>>> switches into vga mode in case the vga ports are accessed while not in
>>> vga mode.  After doing the check (and possibly switching mode) the vga
>>> handler is called to actually handle it.
>> 
>> The best way to handle this would be to remodel how we do VGA.
>> 
>> Make VGACommonState a proper QOM object and use it as the base class for
>> QXL, CirrusVGA, QEMUVGA (std-vga), and VMwareVGA.
>
> That would require polymorphism since we already need to derive from
> PCIDevice or ISADevice respectively for interfacing with the bus...

Nope.  You can use composition:

QXLDevice is-a VGACommonState

QXLPCI is-a PCIDevice
   has-a QXLDevice

> Modern object-oriented languages have tried to avoid multi-inheritence
> due to arising complications, I thought. Wouldn't object if someone
> wanted to do the dirty implementation work though. ;)

There is no need for MI.

> Another such example is EHCI, with PCIDevice and SysBusDevice frontends,
> sharing an EHCIState struct and having helper functions operating on
> that core state only. Quite a few device share such a pattern today
> actually (serial, m48t59, ...).

Yes, this is all about chipset modelling.  Chipsets should derive from
device and then be embedded in the appropriate bus device.

For instance.

SerialState is-a DeviceState

ISASerialState is-a ISADevice, has-a SerialState
MMIOSerialState is-a SysbusDevice, has-a SerialState

This is what we're doing in practice, we just aren't modeling the
chipsets and we're open coding the relationships (often in subtly
different ways).

Regards,

Anthony Liguori

>> The VGA accessors should be exposed as a memory region but the sub class
>> ought to be responsible for actually adding it to a subregion.
>> 
>>>
>>> That twist makes it a bit hard to convert vga ...
>>>
>>> Anyone knows how one would do that with the memory api instead? I think
>>> taking over the ports is easy as the memory regions have priorities so I
>>> can simply register a region with higher priority. I have no clue how to
>>> forward the access to the vga code though.
>>>
>> 
>> That should be possible with priorities, but I think it's wrong.  There
>> aren't two VGA devices.  QXL is-a VGA device and the best way to
>> override behavior of base VGA device is through polymorphism.
>
> In this particular case QXL is-a PCI VGA device though, so we can
> decouple it from core VGA modeling. Placing the MemoryRegionOps inside
> the Class (rather than static const) might be a short-term solution for
> overriding read/write handlers of a particular VGA MemoryRegion. :)
>
> Cheers,
> Andreas
>
>> This isn't really a memory API issue, it's a modeling issue.
>> 
>> Regards,
>> 
>> Anthony Liguori
>> 
>>> Anyone has clues / suggestions?
>>>
>>> thanks,
>>>   Gerd
>
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
Gerd Hoffmann  writes:

>   Hi,
>
>> hw/qxl.c:portio_list_add(qxl_vga_port_list,
>> pci_address_space_io(dev), 0x3b0);
>> hw/vga.c:portio_list_add(vga_port_list, address_space_io, 0x3b0);
>
> That reminds me I should solve this in a more elegant way.
>
> qxl takes over the vga io ports.  The reason it does this is because qxl
> switches into vga mode in case the vga ports are accessed while not in
> vga mode.  After doing the check (and possibly switching mode) the vga
> handler is called to actually handle it.

The best way to handle this would be to remodel how we do VGA.

Make VGACommonState a proper QOM object and use it as the base class for
QXL, CirrusVGA, QEMUVGA (std-vga), and VMwareVGA.

The VGA accessors should be exposed as a memory region but the sub class
ought to be responsible for actually adding it to a subregion.

>
> That twist makes it a bit hard to convert vga ...
>
> Anyone knows how one would do that with the memory api instead? I think
> taking over the ports is easy as the memory regions have priorities so I
> can simply register a region with higher priority. I have no clue how to
> forward the access to the vga code though.
>

That should be possible with priorities, but I think it's wrong.  There
aren't two VGA devices.  QXL is-a VGA device and the best way to
override behavior of base VGA device is through polymorphism.

This isn't really a memory API issue, it's a modeling issue.

Regards,

Anthony Liguori

> Anyone has clues / suggestions?
>
> thanks,
>   Gerd


Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
Markus Armbruster  writes:

> Peter Maydell  writes:
>
>> On 30 January 2013 11:39, Andreas Färber  wrote:
>>> Proposal by hpoussin was to move _list_add() code to ISADevice:
>>> http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
>>>
>>> Concerns:
>>> * PCI devices (VGA, QXL) register I/O ports as well
>>>   => above patches add dependency on ISABus to machines
>>>  -> " no mac ever had one"
>>>   => PCIDevice shouldn't use ISA API with NULL ISADevice
>>> * Lack of avi: Who decides about memory API these days?
>>>
>>> armbru and agraf concluded that moving this into ISA is wrong.
>>>
>>> => I will drop the remaining ioport patches from above series.
>>>
>>> Suggestions on how to proceed with tackling the issue are welcome.
>>
>> How does this stuff work on real hardware? I would have
>> expected that a PCI device registering the fact it has
>> IO ports would have to do so via the PCI controller it
>> is plugged into...
>>
>> My naive don't-know-much-about-portio suggestion is that this
>> should work the same way as memory regions: each device
>> provides portio regions, and the controller for the bus
>> (ISA or PCI) exposes those to the next layer up, and
>> something at board level maps it all into the right places.
>
> Makes sense me, but I'm naive, too :)
>
> For me, "I/O ports" are just an alternate address space some devices
> have.  For instance, x86 CPUs have an extra pin for selecting I/O
> vs. memory address space.  The ISA bus has separate read/write pins for
> memory and I/O.
>
> This isn't terribly special.  Mapping address spaces around is what
> devices bridging buses do.
>
> I'd expect a system bus for an x86 CPU to have both a memory and an I/O
> address space.

There is no such thing as a "system bus".

There is a bus that links the CPUs to each other and to the North
Bridge.  This is QPI on modern systems.

Sometimes there's a bus to link the North Bridge to the South Bridge.
On modern systems, this is QPI.  On the i440fx, the i440fx is both the
South Bridge and North Bridge and the link between the two is internal
to the chip.  The South Bridge may then export one or more downstream
interfaces.  In the i440fx, it only exports PCI.

Behind the PCI bus, there may be bridges.  On the i440fx, there is a ISA
Bridge which also acts as a Super I/O chip.  It exposes a downstream ISA
bus.

sysbus is a relic of poor modeling.  A major milestone in QEMU's
evolution will be when sysbus is completely removed.

Regards,

Anthony Liguori

>
> I'd expect an ISA PC's sysbus - ISA bridge to map both directly.
>
> I'd expect an ISA bridge for a sysbus without a separate I/O address
> space to map the ISA I/O address space into the sysbus's normal address
> space somehow.
>
> PCI ISA bridges have their own rules, but I've gotten away with ignoring
> the details so far :)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] QEMU buildbot maintenance state

2013-01-30 Thread Anthony Liguori
Gerd Hoffmann  writes:

>   Hi,
>
>> Gerd: Are you willing to co-maintain the QEMU buildmaster with Daniel
>> and Christian?  It would be awesome if you could do this given your
>> experience running and customizing buildbot.
>
> I'll try to set aside some time for that.  Christian's idea to host the
> config at github is good; that certainly makes it easier to spread the
> work across more people.
>
> Another thing which would be helpful:  Any chance we can setup a
> maintainer tree mirror @ git.qemu.org?  A single repository where each
> maintainer tree shows up as a branch?

I will set up a tree based on the 'T:' fields in MAINTAINERS.  So if you
want your tree to be part of buildbot, please make sure that you have a
correct entry in MAINTAINERS.

Regards,

Anthony Liguori

>
> This would make the buildbot setup *a lot* easier.  We can go for an
> AnyBranchScheduler then with BuildFactory and BuildConfig shared,
> instead of needing one BuildFactory and BuildConfig per branch.  It also
> makes the buildbot web interface less cluttered as we don't have an
> insane amount of BuildConfigs any more.  And it saves some resources
> (bandwidth + diskspace) for the buildslaves.
>
> For people who want to see what is coming or who want to test stuff
> that's still cooking, it would be a nice service too if they had a
> one-stop shop where they can get everything.
>
> cheers,
>   Gerd


Re: [Qemu-devel] What to do about non-qdevified devices?

2013-01-30 Thread Anthony Liguori
Markus Armbruster  writes:

> Peter Maydell  writes:
>
>> On 30 January 2013 07:02, Markus Armbruster  wrote:
>>> Anthony Liguori  writes:
>>>
>>> [...]
>>>> The problems I ran into were (1) this is a lot of work (2) it basically
>>>> requires that all bus children have been qdev/QOM-ified.  Even with
>>>> something like the ISA bus which is where I started, quite a few devices
>>>> were not qdevified still.
>>>
>>> So what's the plan to complete the qdevification job?  Lay really low
>>> and quietly hope the problem goes away?  We've tried that for about
>>> three years, doesn't seem to work.
>>
>> Do we have a list of not-yet-qdevified devices? Maybe we need to
>> start saying "fix X Y and Z or platform P is dropped from the next
>> release". (This would of course be easier if we had a way to let users
>> know that platform P was in danger...)
>
> I think that's a good idea.  Only problem is identifying pre-qdev
> devices in the code requires code inspection (grep won't do, I'm
> afraid).
>
> If we agree on a "qdevify or else" plan, I'd be prepared to help with
> the digging up of devices.

That's a nice thought, but we're not going to rip out dma.c and break
every PC target.

But I will help put together a list of devices that need converting.  I
have patches actually for most of the PC devices.

Regards,

Anthony Liguori



Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
Andreas Färber  writes:

> Am 29.01.2013 16:41, schrieb Juan Quintela:
>> * Portio port to new memory regions?
>>   Andreas, could you fill?
>
> MemoryRegion's .old_portio mechanism requires workarounds for VGA on
> ppc, affecting among others the sPAPR PCI host bridge:
> http://git.qemu.org/?p=qemu.git;a=commit;h=a3cfa18eb075c7ef78358ca1956fe7b01caa1724
>
> Patches were posted and merged removing all .old_portio users but one:
> hw/ioport.c:portio_list_add_1(), used by portio_list_add()
>
> hw/isa-bus.c:portio_list_add(piolist, isabus->address_space_io, start);
> hw/qxl.c:portio_list_add(qxl_vga_port_list,
> pci_address_space_io(dev), 0x3b0);
> hw/vga.c:portio_list_add(vga_port_list, address_space_io, 0x3b0);
> hw/vga.c:portio_list_add(vbe_port_list, address_space_io, 0x1ce);
>
> Proposal by hpoussin was to move _list_add() code to ISADevice:
> http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html

Okay, a couple things here:

There is no such thing as "PIO" as a general concept.  What leaves the
CPU and what a bus interprets are totally different things.

An x86 CPU has a MMIO capability that's essentially 65 bits.  Whether
the top bit is set determines whether it's a "PIO" transaction or an
"MMIO" transaction.  A large chunk of that address space is invalid of
course.

PCI has a 65 bit address space too.  The 65th bit determines whether
it's an IO transaction or an MMIO transaction.

For architectures that only have a 64-bit address space, what the PCI
controller typically does is pick a 16-bit window within that address
space to map to a PCI address with the 65th bit set.
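For architectures without a separate IO space, the window mapping described above amounts to a simple address translation in the PCI host controller. A minimal sketch (the window base and size are made-up example values, not any real chipset's):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: on a memory-only architecture, the PCI host
 * controller claims a window of the CPU address space and forwards
 * accesses within it as PCI IO transactions ("65th bit set"). */
#define PIO_WINDOW_BASE 0xfe200000ull   /* illustrative, not a real chipset */
#define PIO_WINDOW_SIZE 0x10000ull      /* a 16-bit IO space */

static bool cpu_addr_to_pci_io(uint64_t cpu_addr, uint32_t *pio_addr)
{
    if (cpu_addr < PIO_WINDOW_BASE ||
        cpu_addr >= PIO_WINDOW_BASE + PIO_WINDOW_SIZE) {
        return false;               /* ordinary MMIO, not an IO transaction */
    }
    *pio_addr = (uint32_t)(cpu_addr - PIO_WINDOW_BASE);
    return true;
}
```

Everything outside the window stays a plain memory transaction; only accesses inside it are rewritten into the 16-bit PIO address space.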

Within the PCI bus, transactions are usually routed to devices via
positive decoding.  The device lists the address regions it wants to
handle (via BARs) and the PCI bus uses those to determine which device
to send transactions to.

There are some exceptions though.  Specifically:

1) A chipset will route any non-positively decoded IO transaction (65th
   bit set) to a single end point (usually the ISA-bridge).  Which one it
   chooses is up to the chipset.  This is called subtractive decoding
   because the PCI bus will wait multiple cycles for that device to
   claim the transaction before bouncing it.

2) There are special hacks in most PCI chipsets to route very specific
   addresses ranges to certain devices.  Namely, legacy VGA IO transactions
   go to the first VGA device.  Legacy IDE IO transactions go to the first
   IDE device.  This doesn't need to be programmed in the BARs.  It will
   just happen.

3) As it turns out, all legacy PIIX3 devices are positively decoded and
   sent to the ISA-bridge (because it's faster this way).
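The decode rules above can be sketched as a tiny router. This is purely illustrative: the device names, port ranges, and legacy VGA window are example values, not any real board's configuration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of IO-transaction routing on a PCI bus:
 * positive decoding against per-device ranges (as programmed in BARs),
 * a hardwired legacy range, and one subtractive endpoint (typically
 * the ISA bridge) catching everything left unclaimed. */
typedef struct {
    const char *name;
    uint32_t io_base, io_size;      /* positively decoded IO BAR */
} PciEndpoint;

static const PciEndpoint endpoints[] = {
    { "e1000",  0xc000, 0x40 },     /* illustrative devices and ranges */
    { "virtio", 0xc040, 0x20 },
};

static const char *route_io(uint32_t port)
{
    for (size_t i = 0; i < sizeof(endpoints) / sizeof(endpoints[0]); i++) {
        const PciEndpoint *ep = &endpoints[i];
        if (port >= ep->io_base && port < ep->io_base + ep->io_size) {
            return ep->name;        /* claimed by positive decode */
        }
    }
    if (port >= 0x3b0 && port <= 0x3df) {
        return "vga";               /* legacy hack: first DISPLAY-class device */
    }
    return "isa-bridge";            /* subtractive decode: nobody claimed it */
}
```

Note the ordering: positive decode wins, the legacy special cases come next, and the subtractive endpoint only sees what falls through.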

Notice the lack of the word "ISA" in all of this other than describing
the PCI class of an end point.

So how should this be modeled?

On x86, the CPU has a pio address space.  That can propagate down
through the PCI bus which is what we do today.

On !x86, the PCI controller ought to set up a MemoryRegion for downstream
PIO that devices can use to register on.

We probably need to do something like change the PCI VGA devices to
export a MemoryRegion and allow the PCI controller to decide how to
register that as a subregion.

Regards,

Anthony Liguori

>
> Concerns:
> * PCI devices (VGA, QXL) register I/O ports as well
>   => above patches add dependency on ISABus to machines
>  -> " no mac ever had one"
>   => PCIDevice shouldn't use ISA API with NULL ISADevice
> * Lack of avi: Who decides about memory API these days?
>
> armbru and agraf concluded that moving this into ISA is wrong.
>
> => I will drop the remaining ioport patches from above series.
>
> Suggestions on how to proceed with tackling the issue are welcome.
>
> Regards,
> Andreas
>
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

2013-01-30 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Wed, Jan 30, 2013 at 11:48:14AM +, Peter Maydell wrote:
>> On 30 January 2013 11:39, Andreas Färber  wrote:
>> > Proposal by hpoussin was to move _list_add() code to ISADevice:
>> > http://lists.gnu.org/archive/html/qemu-devel/2013-01/msg00508.html
>> >
>> > Concerns:
>> > * PCI devices (VGA, QXL) register I/O ports as well
>> >   => above patches add dependency on ISABus to machines
>> >  -> " no mac ever had one"
>> >   => PCIDevice shouldn't use ISA API with NULL ISADevice
>> > * Lack of avi: Who decides about memory API these days?
>> >
>> > armbru and agraf concluded that moving this into ISA is wrong.
>> >
>> > => I will drop the remaining ioport patches from above series.
>> >
>> > Suggestions on how to proceed with tackling the issue are welcome.
>> 
>> How does this stuff work on real hardware? I would have
>> expected that a PCI device registering the fact it has
>> IO ports would have to do so via the PCI controller it
>> is plugged into...
>
> All programming is done by the OS; devices do not register
> with the controller.
>
> Each bridge has two ways to claim an IO transaction:
> - transaction is within the window programmed in the bridge
> - subtractive decoding enabled and no one else claims the transaction

And there can only be one endpoint that accepts subtractive decoding and
this is usually the ISA bridge.

Also note that there are some really special cases with PCI.  The legacy
VGA ports are always routed to the first device with a DISPLAY class
type.

Likewise, legacy IDE ports are routed to the first device with an
IDE class.  That's the only reason you can have these legacy devices not
behind the ISA bridge.

Regards,

Anthony Liguori

>
> At the bus level, transaction happens on a bus and an appropriate device
> will claim it.
>
>> My naive don't-know-much-about-portio suggestion is that this
>> should work the same way as memory regions: each device
>> provides portio regions, and the controller for the bus
>> (ISA or PCI) exposes those to the next layer up, and
>> something at board level maps it all into the right places.
>> 
>> -- PMM


Re: [Qemu-devel] [PATCH V2 11/20] tap: support enabling or disabling a queue

2013-01-29 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Tue, Jan 29, 2013 at 08:10:26PM +, Blue Swirl wrote:
>> On Tue, Jan 29, 2013 at 1:50 PM, Jason Wang  wrote:
>> > On 01/26/2013 03:13 AM, Blue Swirl wrote:
>> >> On Fri, Jan 25, 2013 at 10:35 AM, Jason Wang  wrote:
>> >>> This patch introduces a new bit - enabled - in TAPState which tracks
>> >>> whether a specific queue/fd is enabled. The tap/fd is enabled during
>> >>> initialization and can be enabled/disabled by tap_enable() and
>> >>> tap_disable(), which call platform-specific helpers to do the real
>> >>> work. Polling of a tap fd can only be done when the tap is enabled.
>> >>>
>> >>> Signed-off-by: Jason Wang 
>> >>> ---
>> >>>  include/net/tap.h |2 ++
>> >>>  net/tap-win32.c   |   10 ++
>> >>>  net/tap.c |   43 ---
>> >>>  3 files changed, 52 insertions(+), 3 deletions(-)
>> >>>
>> >>> diff --git a/include/net/tap.h b/include/net/tap.h
>> >>> index bb7efb5..0caf8c4 100644
>> >>> --- a/include/net/tap.h
>> >>> +++ b/include/net/tap.h
>> >>> @@ -35,6 +35,8 @@ int tap_has_vnet_hdr_len(NetClientState *nc, int len);
>> >>>  void tap_using_vnet_hdr(NetClientState *nc, int using_vnet_hdr);
>> >>>  void tap_set_offload(NetClientState *nc, int csum, int tso4, int tso6, 
>> >>> int ecn, int ufo);
>> >>>  void tap_set_vnet_hdr_len(NetClientState *nc, int len);
>> >>> +int tap_enable(NetClientState *nc);
>> >>> +int tap_disable(NetClientState *nc);
>> >>>
>> >>>  int tap_get_fd(NetClientState *nc);
>> >>>
>> >>> diff --git a/net/tap-win32.c b/net/tap-win32.c
>> >>> index 265369c..a2cd94b 100644
>> >>> --- a/net/tap-win32.c
>> >>> +++ b/net/tap-win32.c
>> >>> @@ -764,3 +764,13 @@ void tap_set_vnet_hdr_len(NetClientState *nc, int 
>> >>> len)
>> >>>  {
>> >>>  assert(0);
>> >>>  }
>> >>> +
>> >>> +int tap_enable(NetClientState *nc)
>> >>> +{
>> >>> +assert(0);
>> >> abort()
>> >
>> > This is just to be consistent with the rest of the helpers in this file.
>> >>
>> >>> +}
>> >>> +
>> >>> +int tap_disable(NetClientState *nc)
>> >>> +{
>> >>> +assert(0);
>> >>> +}
>> >>> diff --git a/net/tap.c b/net/tap.c
>> >>> index 67080f1..95e557b 100644
>> >>> --- a/net/tap.c
>> >>> +++ b/net/tap.c
>> >>> @@ -59,6 +59,7 @@ typedef struct TAPState {
>> >>>  unsigned int write_poll : 1;
>> >>>  unsigned int using_vnet_hdr : 1;
>> >>>  unsigned int has_ufo: 1;
>> >>> +unsigned int enabled : 1;
>> >> bool without bit field?
>> >
>> > Also to be consistent with the other fields. If you wish I can send
>> > patches to convert all those bit fields to bool on top of this series.
>> 
>> That would be nice, likewise for the assert(0).
>
> OK so let's go ahead with this patchset as is,
> and a cleanup patch will be send after 1.4 then.

Why?  I'd prefer that we didn't rush things into 1.4 just because.
There's still ample time to respin a corrected series.

Regards,

Anthony Liguori

>
>
>> >
>> > Thanks
>> >>>  VHostNetState *vhost_net;
>> >>>  unsigned host_vnet_hdr_len;
>> >>>  } TAPState;
>> >>> @@ -72,9 +73,9 @@ static void tap_writable(void *opaque);
>> >>>  static void tap_update_fd_handler(TAPState *s)
>> >>>  {
>> >>>  qemu_set_fd_handler2(s->fd,
>> >>> - s->read_poll  ? tap_can_send : NULL,
>> >>> - s->read_poll  ? tap_send : NULL,
>> >>> - s->write_poll ? tap_writable : NULL,
>> >>> + s->read_poll && s->enabled ? tap_can_send : NULL,
>> >>> + s->read_poll && s->enabled ? tap_send : NULL,
>> >>> +   

Re: KVM call minutes 2013-01-29

2013-01-29 Thread Anthony Liguori
Alexander Graf  writes:

> On 01/29/2013 04:41 PM, Juan Quintela wrote:
>>Alex will fill this
>
> When using -device, we can not specify an IRQ line to attach to the 
> device. This works for some special buses like PCI, but not in the 
> generic case. We need it generically for virtio-mmio and for potential 
> platform assigned vfio devices though.
>
> The conclusion we came up with was that in order to model IRQ lines 
> between arbitrary devices, we should use QOM and the QOM name space. 
> Details are up for Anthony to fill in :).

Oh good :-)  This is how far I got since I last touched this problem.

https://github.com/aliguori/qemu/commits/qom-pin.4

qemu_irq is basically foreign to QOM/qdev.  There are two things I did
to solve this.  The first is to have a stateful Pin object.  Stateful is
important because qemu_irq is totally broken wrt reset and live
migration as it stands today.

It's pretty easy to have a Pin object that can "connect" to a qemu_irq
source or sink which means we can incrementally refactor by first
converting each device under a bus to using Pins (using the qemu_irq
connect interface to maintain compat) until the bus controller can be
converted to export Pins allowing a full switch to using Pins only for
that bus.
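A minimal sketch of what such a stateful Pin could look like — this is illustrative, not the actual code from the branch above. The pin remembers its level, so reset and migration can save/restore it, and a late-connected sink immediately observes the current state:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stateful Pin: unlike a bare qemu_irq callback, the
 * level is recorded and can therefore be migrated or re-delivered. */
typedef struct Pin Pin;
typedef void (*PinHandler)(Pin *pin, void *opaque);

struct Pin {
    bool level;              /* current state -- migratable */
    PinHandler on_change;    /* sink connected to this pin */
    void *opaque;
};

static void pin_connect(Pin *pin, PinHandler fn, void *opaque)
{
    pin->on_change = fn;
    pin->opaque = opaque;
    if (fn) {
        fn(pin, opaque);     /* new sink sees the current level at once */
    }
}

static void pin_set_level(Pin *pin, bool level)
{
    if (pin->level != level) {
        pin->level = level;  /* edge: record and notify */
        if (pin->on_change) {
            pin->on_change(pin, pin->opaque);
        }
    }
}
```

The incremental path described above falls out of this shape: a device raises pin_set_level(), and the bus controller either connects a Pin-aware sink or a compatibility shim that forwards edges into a legacy qemu_irq.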

The problems I ran into were (1) this is a lot of work (2) it basically
requires that all bus children have been qdev/QOM-ified.  Even with
something like the ISA bus which is where I started, quite a few devices
were not qdevified still.

I'm not going to be able to work on this in the foreseeable future, but
if someone wants to take it over, I'd be happy to provide advice.

I'm also open to other approaches that require less refactoring but I
honestly don't know that there is a way to avoid the heavy lifting.

Regards,

Anthony Liguori

>
>
> Alex
>



Re: KVM call minutes 2013-01-29

2013-01-29 Thread Anthony Liguori
Paolo Bonzini  writes:

> Il 29/01/2013 16:41, Juan Quintela ha scritto:
>> * Replacing select(2) so that we will not hit the 1024 fd_set limit in the
>>   future. (stefan)
>> 
>>   Add checks for fds bigger than 1024? Multifunction devices use lots
>>   of fds per device.
>> 
>>   Portability?
>>   Use glib?  and let it use poll underneath.
>>   slirp is a problem.
>>   In the end, for the main loop: moving to a glib event loop; how we
>>   arrive there is the discussion.
>
> We can use g_poll while keeping the main-loop.c wrappers around the glib
> event loop.  Both slirp and iohandler.c access the fd_sets randomly, so
> we need to remember some state between the fill and poll functions.  We
> can use two main-loop.c functions:
>
> int qemu_add_poll_fd(int fd, int events);
>
>   select: writes the events into three fd_sets, returns the file
>   descriptor itself
>
>   poll: writes a GPollFD into a dynamically-sized array (of GPollFDs)
>   and returns the index in the array.
>
> int qemu_get_poll_fd_revents(int index);
>
>   select: takes the file descriptor (returned by qemu_add_poll_fd),
>   makes up revents based on the three fd_sets
>
>   poll: takes the index into the array and returns the corresponding
>   revents
>
> iohandler.c can simply store the index into struct IOHandlerRecord, and
> use it later.  slirp can do the same for struct socket.
>
> The select code can be kept for Windows after POSIX switches to poll.

Doesn't g_poll already do this under the covers for Windows?
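For reference, the pair of wrappers proposed above can be sketched directly on top of poll(2). The array-growth strategy and the qemu_poll_run() helper are assumptions for illustration, not main-loop.c code:

```c
#include <poll.h>
#include <stdlib.h>

/* Hypothetical sketch of the proposed main-loop wrappers:
 * qemu_add_poll_fd() appends a pollfd during the "fill" phase and
 * returns its index; after poll() runs, callers fetch revents by
 * that index, exactly the state iohandler.c/slirp would remember. */
static struct pollfd *poll_fds;
static int n_poll_fds;

static int qemu_add_poll_fd(int fd, short events)
{
    poll_fds = realloc(poll_fds, (n_poll_fds + 1) * sizeof(*poll_fds));
    poll_fds[n_poll_fds] = (struct pollfd){ .fd = fd, .events = events };
    return n_poll_fds++;
}

static short qemu_get_poll_fd_revents(int index)
{
    return poll_fds[index].revents;
}

static int qemu_poll_run(int timeout_ms)
{
    return poll(poll_fds, n_poll_fds, timeout_ms);
}
```

A select(2) build of the same two functions would instead write into three fd_sets and synthesize revents afterwards, which is what lets both backends share the callers.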

Regards,

Anthony Liguori

>
> Paolo


Re: KVM call agenda for 2013-01-29

2013-01-28 Thread Anthony Liguori
Juan Quintela  writes:

> Hi
>
> Please send in any agenda topics you are interested in.

 - Outstanding virtio work for 1.4
   - Multiqueue virtio-net (Amos/Michael)
   - Refactorings (Fred/Peter)
   - virtio-ccw (Cornelia/Alex)

We need to work out the ordering here and what's reasonable to merge
over the next week.

Regards,

Anthony Liguori

>
> Later, Juan.


Re: [RFC 16/19] target-ppc: Refactor debug output macros

2013-01-27 Thread Anthony Liguori
Andreas Färber  writes:

> Make debug output compile-testable even if disabled.
>
> Inline DEBUG_OP check in excp_helper.c.
> Inline LOG_MMU_STATE() in mmu_helper.c.
> Inline PPC_DEBUG_SPR check in translate_init.c.
>
> Signed-off-by: Andreas Färber 
> ---
>  target-ppc/excp_helper.c|   22 +++
>  target-ppc/kvm.c|9 ++-
>  target-ppc/mem_helper.c |2 --
>  target-ppc/mmu_helper.c |   63 
> +--
>  target-ppc/translate.c  |   12 -
>  target-ppc/translate_init.c |   10 +++
>  6 Dateien geändert, 55 Zeilen hinzugefügt(+), 63 Zeilen entfernt(-)
>
> diff --git a/target-ppc/excp_helper.c b/target-ppc/excp_helper.c
> index 0a1ac86..54722c4 100644
> --- a/target-ppc/excp_helper.c
> +++ b/target-ppc/excp_helper.c
> @@ -21,14 +21,14 @@
>  
>  #include "helper_regs.h"
>  
> -//#define DEBUG_OP
> -//#define DEBUG_EXCEPTIONS
> +#define DEBUG_OP 0
> +#define DEBUG_EXCEPTIONS 0
>  
> -#ifdef DEBUG_EXCEPTIONS
> -#  define LOG_EXCP(...) qemu_log(__VA_ARGS__)
> -#else
> -#  define LOG_EXCP(...) do { } while (0)
> -#endif
> +#define LOG_EXCP(...) G_STMT_START \
> +if (DEBUG_EXCEPTIONS) { \
> +qemu_log(__VA_ARGS__); \
> +} \
> +G_STMT_END

Just thinking out loud a bit...  This form becomes pretty common, and it's
a shame to use a macro here if we don't have to.

I think:

static inline void LOG_EXCP(const char *fmt, ...)
{
if (debug_exceptions) {
   va_list ap;
   va_start(ap, fmt);
   qemu_logv(fmt, ap);
   va_end(ap);
}
}

Probably would have equivalent performance.  debug_exceptions would be
read-mostly and ought to be very predictable as a result.  I strongly
expect that the compiler would actually inline LOG_EXCP too.

I see LOG_EXCP and LOG_DIS in this series.  Perhaps we could just
introduce these functions and then make these flags run-time
controllable?

BTW, one advantage of this over your original proposal, back to your
point, is that you still won't catch linker errors with your proposal.
Dead code elimination will kill off those branches before the linker ever
sees them.

Regards,

Anthony Liguori

>  
>  
> /*/
>  /* PowerPC Hypercall emulation */
> @@ -777,7 +777,7 @@ void ppc_hw_interrupt(CPUPPCState *env)
>  }
>  #endif /* !CONFIG_USER_ONLY */
>  
> -#if defined(DEBUG_OP)
> +#ifndef CONFIG_USER_ONLY
>  static void cpu_dump_rfi(target_ulong RA, target_ulong msr)
>  {
>  qemu_log("Return from exception at " TARGET_FMT_lx " with flags "
> @@ -835,9 +835,9 @@ static inline void do_rfi(CPUPPCState *env, target_ulong 
> nip, target_ulong msr,
>  /* XXX: beware: this is false if VLE is supported */
>  env->nip = nip & ~((target_ulong)0x0003);
>  hreg_store_msr(env, msr, 1);
> -#if defined(DEBUG_OP)
> -cpu_dump_rfi(env->nip, env->msr);
> -#endif
> +if (DEBUG_OP) {
> +cpu_dump_rfi(env->nip, env->msr);
> +}
>  /* No need to raise an exception here,
>   * as rfi is always the last insn of a TB
>   */
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index 2f4f068..0dc6657 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -37,15 +37,10 @@
>  #include "hw/spapr.h"
>  #include "hw/spapr_vio.h"
>  
> -//#define DEBUG_KVM
> +#define DEBUG_KVM 0
>  
> -#ifdef DEBUG_KVM
>  #define dprintf(fmt, ...) \
> -do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> -#else
> -#define dprintf(fmt, ...) \
> -do { } while (0)
> -#endif
> +do { if (DEBUG_KVM) { fprintf(stderr, fmt, ## __VA_ARGS__); } } while (0)
>  
>  #define PROC_DEVTREE_CPU  "/proc/device-tree/cpus/"
>  
> diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c
> index 902b1cd..5c7a5ce 100644
> --- a/target-ppc/mem_helper.c
> +++ b/target-ppc/mem_helper.c
> @@ -26,8 +26,6 @@
>  #include "exec/softmmu_exec.h"
>  #endif /* !defined(CONFIG_USER_ONLY) */
>  
> -//#define DEBUG_OP
> -
>  
> /*/
>  /* Memory load and stores */
>  
> diff --git a/target-ppc/mmu_helper.c b/target-ppc/mmu_helper.c
> index ee168f1..9340fbb 100644
> --- a/target-ppc/mmu_helper.c
> +++ b/target-ppc/mmu_helper.c
> @@ -21,39 +21,36 @@
>  #include "sysemu/kvm.h"
>  #include "kvm_ppc.h"
>  
> -//#define DEBUG_MMU
> -//#define DEBUG_BATS
> -//#define DEBUG_SLB
> -//#define DEBUG_SOFTWARE_TLB
> +#define DEBUG_MMU 0
> +#define DEBUG_BATS 0
> +#define DEBUG_SLB

Re: [PATCH v6 00/11] s390: channel I/O support in qemu.

2013-01-25 Thread Anthony Liguori
Hi,

Thank you for submitting your patch series.  checkpatch.pl has
detected that one or more of the patches in this series violate
the QEMU coding style.

If you believe this message was sent in error, please ignore it
or respond here with an explanation.

Otherwise, please correct the coding style issues and resubmit a
new version of the patch.

For more information about QEMU coding style, see:

http://git.qemu.org/?p=qemu.git;a=blob_plain;f=CODING_STYLE;hb=HEAD

Here is the output from checkpatch.pl:

Subject: s390: Add s390-ccw-virtio machine.
Subject: s390: Add default support for SCLP console
ERROR: do not initialise statics to 0 or NULL
#72: FILE: vl.c:2468:
+static int index = 0;

WARNING: braces {} are necessary for all arms of this statement
#126: FILE: vl.c:3923:
+if (default_sclp)
[...]

WARNING: braces {} are necessary for all arms of this statement
#135: FILE: vl.c:3937:
+if (default_sclp)
[...]

WARNING: braces {} are necessary for all arms of this statement
#144: FILE: vl.c:4109:
+if (foreach_device_config(DEV_SCLP, sclp_parse) < 0)
[...]

total: 1 errors, 3 warnings, 114 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Subject: s390-virtio: Factor out some initialization code.
Subject: s390: Add new channel I/O based virtio transport.
Subject: s390: Wire up channel I/O in kvm.
Subject: s390: Virtual channel subsystem support.
ERROR: need consistent spacing around '*' (ctx:WxV)
#56: FILE: hw/s390x/css.c:31:
+SubchDev *sch[MAX_SCHID + 1];
  ^

ERROR: need consistent spacing around '*' (ctx:WxV)
#62: FILE: hw/s390x/css.c:37:
+SubchSet *sch_set[MAX_SSID + 1];
  ^

ERROR: need consistent spacing around '*' (ctx:WxV)
#74: FILE: hw/s390x/css.c:49:
+CssImage *css[MAX_CSSID + 1];
  ^

total: 3 errors, 0 warnings, 1469 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Subject: s390: Add channel I/O instructions.
Subject: s390: I/O interrupt and machine check injection.
Subject: s390: Channel I/O basic definitions.
Subject: s390: Add mapping helper functions.
Subject: s390: Lowcore mapping helper.


Regards,

Anthony Liguori



Re: [PATCH 0/5] vhost-scsi: Add support for host virtualized target

2013-01-21 Thread Anthony Liguori
"Nicholas A. Bellinger"  writes:

> Hi MST & Co,
>
> On Thu, 2013-01-17 at 18:43 +0200, Michael S. Tsirkin wrote:
>> On Fri, Sep 07, 2012 at 06:48:14AM +, Nicholas A. Bellinger wrote:
>> > From: Nicholas Bellinger 
>> > 
>> > Hello Anthony & Co,
>> > 
>> > This is the fourth installment to add host virtualized target support for
>> > the mainline tcm_vhost fabric driver using Linux v3.6-rc into QEMU 
>> > 1.3.0-rc.
>> > 
>> > The series is available directly from the following git branch:
>> > 
>> >git://git.kernel.org/pub/scm/virt/kvm/nab/qemu-kvm.git 
>> > vhost-scsi-for-1.3
>> > 
>> > Note the code is cut against yesterday's QEMU head, and despite the name
>> > of the tree is based upon mainline qemu.org git code + has thus far been
>> > running overnight with > 100K IOPs small block 4k workloads using v3.6-rc2+
>> > based target code with RAMDISK_DR backstores.
>> > 
>> > Other than some minor fuzz between jumping from QEMU 1.2.0 -> 1.2.50, this
>> > series is functionally identical to what's been posted for vhost-scsi 
>> > RFC-v3
>> > to qemu-devel.
>> > 
>> > Please consider applying these patches for an initial vhost-scsi merge into
>> > QEMU 1.3.0-rc code, or let us know what else you'd like to see addressed 
>> > for
>> > this series to in order to merge.
>> > 
>> > Thank you!
>> > 
>> > --nab
>> 
>> OK what's the status here?
>> We missed 1.3 but let's try not to miss 1.4?
>> 
>
> Unfortunately, I've not been able to get back to the conversion
> requested by Paolo for a standalone vhost-scsi PCI device.

Is your git repo above up to date?  Perhaps I can find someone to help
out..

> At this point my hands are still full with iSER-target for-3.9 kernel
> code over the next weeks.  
>
> What's the v1.4 feature cut-off looking like at this point..?

Hard freeze is on February 1st but 1.5 opens up again on the 15th.  So
the release windows shouldn't have a major impact on merging...

Regards,

Anthony Liguori

>
> --nab
>


Re: [Qemu-devel] [PATCH 00/12] Multiqueue virtio-net

2013-01-16 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Wed, Jan 16, 2013 at 09:09:49AM -0600, Anthony Liguori wrote:
>> Jason Wang  writes:
>> 
>> > On 01/15/2013 03:44 AM, Anthony Liguori wrote:
>> >> Jason Wang  writes:
>> >>
>> >>> Hello all:
>> >>>
>> >>> This series is an update of the last version of multiqueue virtio-net
>> >>> support.
>> >>>
>> >>> Recently, linux tap gets multiqueue support. This series implements basic
>> >>> support for multiqueue tap, nic and vhost. Then use it as an 
>> >>> infrastructure to
>> >>> enable the multiqueue support for virtio-net.
>> >>>
>> >>> Both vhost and userspace multiqueue were implemented for virtio-net, but
>> >>> userspace could get much more benefit since a dataplane-like
>> >>> parallelized mechanism was not implemented.
>> >>>
>> >>> User could start a multiqueue virtio-net card through adding a "queues"
>> >>> parameter to tap.
>> >>>
>> >>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
>> >>> virtio-net-pci,netdev=hn0
>> >>>
>> >>> Management tools such as libvirt can pass multiple pre-created fds 
>> >>> through
>> >>>
>> >>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
>> >>> virtio-net-pci,netdev=hn0
>> >> I'm confused/frightened that this syntax works.  You shouldn't be
>> >> allowed to have two values for the same property.  Better to have a
>> >> syntax like fd[0]=X,fd[1]=Y or something along those lines.
>> >
>> > Yes, but this is how the current StringList type works for the command
>> > line.  Some other parameters such as dnssearch, hostfwd and guestfwd
>> > already work this way.  It looks like your suggestion needs some
>> > extension to the QemuOpts visitor; maybe we can do this on top.
>> 
>> It's a silly syntax and breaks compatibility.  This is valid syntax:
>> 
>> -net tap,fd=3,fd=4
>> 
>> In this case, it means 'fd=4' because the last fd overwrites the first
>> one.
>> 
>> Now you've changed it to mean something else.  Having one thing mean
>> something in one context, but something else in another context is
>> terrible interface design.
>> 
>> Regards,
>> 
>> Anthony Liguori
>
> Aha so just renaming the field 'fds' would address this issue?

No, you still have the problem of different meanings.

-netdev tap,fd=X,fd=Y

-netdev tap,fds=X,fds=Y

Would have wildly different behavior.

Just do:

-netdev tap,fds=X:Y

And then we're staying consistent wrt the interpretation of multiple
properties of the same name.
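Parsing such a colon-separated value is straightforward. A hypothetical sketch (the function name and limits are made up for illustration):

```c
#include <stdlib.h>

/* Hypothetical parser for an fds=X:Y value: walk the string and
 * collect each colon-separated integer into an fd array.  This keeps
 * "last one wins" semantics for repeated keys intact, since the
 * whole list travels as a single property value. */
static int parse_fds(const char *value, int *fds, int max_fds)
{
    int n = 0;
    const char *p = value;
    while (*p && n < max_fds) {
        char *end;
        fds[n++] = (int)strtol(p, &end, 10);
        if (*end != ':') {
            break;              /* end of list (or malformed separator) */
        }
        p = end + 1;
    }
    return n;                   /* number of fds parsed */
}
```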

Regards,

Anthony Liguori



Re: [PATCH 00/12] Multiqueue virtio-net

2013-01-16 Thread Anthony Liguori
Jason Wang  writes:

> On 01/15/2013 03:44 AM, Anthony Liguori wrote:
>> Jason Wang  writes:
>>
>>> Hello all:
>>>
>>> This series is an update of the last version of multiqueue virtio-net support.
>>>
>>> Recently, linux tap gets multiqueue support. This series implements basic
>>> support for multiqueue tap, nic and vhost. Then use it as an infrastructure 
>>> to
>>> enable the multiqueue support for virtio-net.
>>>
>>> Both vhost and userspace multiqueue were implemented for virtio-net, but
>>> userspace could get much more benefit since a dataplane-like
>>> parallelized mechanism was not implemented.
>>>
>>> User could start a multiqueue virtio-net card through adding a "queues"
>>> parameter to tap.
>>>
>>> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device 
>>> virtio-net-pci,netdev=hn0
>>>
>>> Management tools such as libvirt can pass multiple pre-created fds through
>>>
>>> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
>>> virtio-net-pci,netdev=hn0
>> I'm confused/frightened that this syntax works.  You shouldn't be
>> allowed to have two values for the same property.  Better to have a
>> syntax like fd[0]=X,fd[1]=Y or something along those lines.
>
> Yes, but this is how the current StringList type works for the command line.
> Some other parameters such as dnssearch, hostfwd and guestfwd have
> already worked in this way. Looks like your suggestion needs some
> extension of the QemuOpts visitor; maybe we can do this on top.

It's a silly syntax and breaks compatibility.  This is valid syntax:

-net tap,fd=3,fd=4

In this case, it means 'fd=4' because the last fd overwrites the first
one.

Now you've changed it to mean something else.  Having one thing mean
something in one context, but something else in another context is
terrible interface design.

Regards,

Anthony Liguori

>
> Thanks
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>> You can fetch and try the code from:
>>> git://github.com/jasowang/qemu.git
>>>
>>> Patch 1 adds a generic method of creating multiqueue taps and implements the
>>> linux part.
>>> Patch 2 - 4 introduce some helpers which could be used to refactor the nic
>>> emulation codes to support multiqueue.
>>> Patch 5 introduces multiqueue support for the qemu networking code: each peer
>>> of NetClientState is abstracted as a queue. Through this, most of the code
>>> can be reused without change.
>>> Patch 6 adds basic multiqueue support for vhost which could let vhost just
>>> handle a subset of all virtqueues.
>>> Patch 7-8 introduce new helpers of virtio which is needed by multiqueue
>>> virtio-net.
>>> Patch 9-12 implement the multiqueue support of virtio-net
>>>
>>> Changes from RFC v2:
>>> - rebase the codes to latest qemu
>>> - align the multiqueue virtio-net implementation to virtio spec
>>> - split the patches into more smaller patches
>>> - set_link and hotplug support
>>>
>>> Changes from RFC V1:
>>> - rebase to the latest
>>> - fix memory leak in parse_netdev
>>> - fix guest notifiers assignment/de-assignment
>>> - changes the command lines to:
>>>qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2
>>>
>>> Reference:
>>> v2: http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04108.html
>>> v1: http://comments.gmane.org/gmane.comp.emulators.qemu/100481
>>>
>>> Perf Numbers:
>>>
>>> Two Intel Xeon 5620 with direct connected intel 82599EB
>>> Host/Guest kernel: David net tree
>>> vhost enabled
>>>
>>> - lots of improvements in both latency and cpu utilization in the
>>>   request-response test
>>> - a regression in guest small-packet sending, because TCP tends to batch
>>>   less when the latency is improved
>>>
>>> 1q/2q/4q
>>> TCP_RR
>>>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
>>> 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
>>> 1 20    72162.1   2214.24 129880.22 2456.13 196949.81 2298.13
>>> 1 50    107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
>>> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
>>> 64 1    9453.42   632.33  9371.37   616.13  9338.19   615.97
>>> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
>>> 64 50   106966    2448.29 146518.67 2514.47 242134.07 2720.91

Re: [PATCH 00/12] Multiqueue virtio-net

2013-01-14 Thread Anthony Liguori
Jason Wang  writes:

> Hello all:
>
> This series is an update of the last version of the multiqueue virtio-net support.
>
> Recently, linux tap gets multiqueue support. This series implements basic
> support for multiqueue tap, nic and vhost. Then use it as an infrastructure to
> enable the multiqueue support for virtio-net.
>
> Both vhost and userspace multiqueue were implemented for virtio-net, but
> userspace does not get much benefit since a dataplane-like parallelized
> mechanism was not implemented.
>
> User could start a multiqueue virtio-net card through adding a "queues"
> parameter to tap.
>
> ./qemu -netdev tap,id=hn0,queues=2,vhost=on -device virtio-net-pci,netdev=hn0
>
> Management tools such as libvirt can pass multiple pre-created fds through
>
> ./qemu -netdev tap,id=hn0,queues=2,fd=X,fd=Y -device
> virtio-net-pci,netdev=hn0

I'm confused/frightened that this syntax works.  You shouldn't be
allowed to have two values for the same property.  Better to have a
syntax like fd[0]=X,fd[1]=Y or something along those lines.
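The indexed syntax suggested here was never what QEMU implemented, but a hypothetical parser for it would accumulate indexed keys into a list instead of overwriting, roughly like this sketch:

```python
# Hypothetical sketch of the suggested fd[0]=X,fd[1]=Y syntax -- not QEMU
# code. Indexed keys collect into ordered lists; plain keys stay scalar.
import re

def parse_indexed(optstr):
    scalars, indexed = {}, {}
    for part in optstr.split(","):
        key, _, val = part.partition("=")
        m = re.fullmatch(r"(\w+)\[(\d+)\]", key)
        if m:
            # e.g. fd[1]=8 -> indexed["fd"][1] = "8"
            indexed.setdefault(m.group(1), {})[int(m.group(2))] = val
        else:
            scalars[key] = val  # repeated plain keys still overwrite
    lists = {k: [d[i] for i in sorted(d)] for k, d in indexed.items()}
    return scalars, lists

scalars, lists = parse_indexed("tap,id=hn0,fd[0]=7,fd[1]=8")
assert lists["fd"] == ["7", "8"] and scalars["id"] == "hn0"
```

The advantage is that `fd[0]=X,fd[1]=Y` is visibly a different construct from a repeated `fd=`, so the two cannot be confused.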

Regards,

Anthony Liguori

>
> You can fetch and try the code from:
> git://github.com/jasowang/qemu.git
>
> Patch 1 adds a generic method of creating multiqueue taps and implements the
> linux part.
> Patch 2 - 4 introduce some helpers which could be used to refactor the nic
> emulation codes to support multiqueue.
> Patch 5 introduces multiqueue support for the qemu networking code: each peer
> of NetClientState is abstracted as a queue. Through this, most of the code
> can be reused without change.
> Patch 6 adds basic multiqueue support for vhost which could let vhost just
> handle a subset of all virtqueues.
> Patch 7-8 introduce new helpers of virtio which is needed by multiqueue
> virtio-net.
> Patch 9-12 implement the multiqueue support of virtio-net
>
> Changes from RFC v2:
> - rebase the codes to latest qemu
> - align the multiqueue virtio-net implementation to virtio spec
> - split the patches into more smaller patches
> - set_link and hotplug support
>
> Changes from RFC V1:
> - rebase to the latest
> - fix memory leak in parse_netdev
> - fix guest notifiers assignment/de-assignment
> - changes the command lines to:
>qemu -netdev tap,queues=2 -device virtio-net-pci,queues=2
>
> Reference:
> v2: http://lists.gnu.org/archive/html/qemu-devel/2012-06/msg04108.html
> v1: http://comments.gmane.org/gmane.comp.emulators.qemu/100481
>
> Perf Numbers:
>
> Two Intel Xeon 5620 with direct connected intel 82599EB
> Host/Guest kernel: David net tree
> vhost enabled
>
> - lots of improvements in both latency and cpu utilization in the
>   request-response test
> - a regression in guest small-packet sending, because TCP tends to batch
>   less when the latency is improved
>
> 1q/2q/4q
> TCP_RR
>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> 1 1 9393.26   595.64  9408.18   597.34  9375.19   584.12
> 1 20    72162.1   2214.24 129880.22 2456.13 196949.81 2298.13
> 1 50    107513.38 2653.99 139721.93 2490.58 259713.82 2873.57
> 1 100   126734.63 2676.54 145553.5  2406.63 265252.68 2943
> 64 1    9453.42   632.33  9371.37   616.13  9338.19   615.97
> 64 20   70620.03  2093.68 125155.75 2409.15 191239.91 2253.32
> 64 50   106966    2448.29 146518.67 2514.47 242134.07 2720.91
> 64 100  117046.35 2394.56 190153.09 2696.82 238881.29 2704.41
> 256 1   8733.29   736.36  8701.07   680.83  8608.92   530.1
> 256 20  69279.89  2274.45 115103.07 2299.76 144555.16 1963.53
> 256 50  97676.02  2296.09 150719.57 2522.92 254510.5  3028.44
> 256 100 150221.55 2949.56 197569.3  2790.92 300695.78 3494.83
> TCP_CRR
>  size #sessions trans.rate  norm trans.rate  norm trans.rate  norm
> 1 1 2848.37  163.41 2230.39  130.89 2013.09  120.47
> 1 20    23434.5  562.11 31057.43 531.07 49488.28 564.41
> 1 50    28514.88 582.17 40494.23 605.92 60113.35 654.97
> 1 100   28827.22 584.73 48813.25 661.6  61783.62 676.56
> 64 1    2780.08  159.4  2201.07  127.96 2006.8   117.63
> 64 20   23318.51 564.47 30982.44 530.24 49734.95 566.13
> 64 50   28585.72 582.54 40576.7  610.08 60167.89 656.56
> 64 100  28747.37 584.17 49081.87 667.87 60612.94 662
> 256 1   2772.08  160.51 2231.84  131.05 2003.62  113.45
> 256 20  23086.35 559.8  30929.09 528.16 48454.9  555.22
> 256 50  28354.7  579.85 40578.31 607    60261.71 657.87
> 256 100 28844.55 585.67 48541.86 659.08 61941.07 676.72
> TCP_STREAM guest receiving
>  size #sessions throughput  norm throughput  norm throughput  norm
> 1 1 16.27   1.33   16.1    1.12   16.13   0.99
> 1 2 33.04   2.08   32.96   2.19   32.75   1.98
> 1 4 66.62   6.83   68.3    5.56   66.14   2.65
> 64 1    896.55  56.67  914.02  58.

Re: [PULL 0/2] vfio-pci: Fixes for 1.4 & stable

2013-01-10 Thread Anthony Liguori
Pulled, thanks.

Regards,

Anthony Liguori



Re: [PATCH 0/9] target-i386: make "enforce" flag work as it should

2013-01-04 Thread Anthony Liguori
Hi,

This is an automated message generated from the QEMU Patches.
Thank you for submitting this patch.  This patch no longer applies to qemu.git.

This may have occurred due to:
 
  1) Changes in mainline requiring your patch to be rebased and re-tested.

  2) Sending the mail using a tool other than git-send-email.  Please use
 git-send-email to send patches to QEMU.

  3) Basing this patch off of a branch that isn't tracking the QEMU
 master branch.  If that was done purposefully, please include the name
 of the tree in the subject line in the future to prevent this message.

 For instance: "[PATCH block-next 1/10] qcow3: add fancy new feature"

  4) You no longer wish for this patch to be applied to QEMU.  No additional
 action is required on your part.

Nacked-by: QEMU Patches 

Below is the output from git-am:

Applying: target-i386: kvm: -cpu host: Use GET_SUPPORTED_CPUID for SVM 
features
Applying: target-i386: kvm: Enable all supported KVM features for -cpu host
Applying: target-i386: check/enforce: Fix CPUID leaf numbers on error 
messages
fatal: sha1 information is lacking or useless (target-i386/cpu.c).
Repository lacks necessary blobs to fall back on 3-way merge.
Cannot fall back to three-way merge.
Patch failed at 0003 target-i386: check/enforce: Fix CPUID leaf numbers on 
error messages
The copy of the patch that failed is found in:
   /home/aliguori/.patches/git-working/.git/rebase-apply/patch
When you have resolved this problem run "git am --resolved".
If you would prefer to skip this patch, instead run "git am --skip".
To restore the original branch and stop patching run "git am --abort".



Re: [PATCH 0/2] [PULL] qemu-kvm.git uq/master queue

2013-01-02 Thread Anthony Liguori
Gleb Natapov  writes:

> The following changes since commit e376a788ae130454ad5e797f60cb70d0308babb6:
>
>   Merge remote-tracking branch 'kwolf/for-anthony' into staging (2012-12-13 
> 14:32:28 -0600)
>
> are available in the git repository at:
>
>
>   git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master
>
> for you to fetch changes up to 0a2a59d35cbabf63c91340a1c62038e3e60538c1:
>
>   qemu-kvm/pci-assign: 64 bits bar emulation (2012-12-25 14:37:52 +0200)
>

Pulled. Thanks.

Regards,

Anthony Liguori

> 
> Will Auld (1):
>   target-i386: Enabling IA32_TSC_ADJUST for QEMU KVM guest VMs
>
> Xudong Hao (1):
>   qemu-kvm/pci-assign: 64 bits bar emulation
>
>  hw/kvm/pci-assign.c   |   14 ++
>  target-i386/cpu.h |2 ++
>  target-i386/kvm.c |   14 ++
>  target-i386/machine.c |   21 +
>  4 files changed, 47 insertions(+), 4 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: KVM call agenda for 2012-12-18

2012-12-18 Thread Anthony Liguori
Juan Quintela  writes:

> Hi
>
> Please send in any agenda topics that you have.

I have a conflicting call today so I can't attend.

Regards,

Anthony Liguori

>
> Thanks, Juan.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call agenda for 2012-12-11

2012-12-11 Thread Anthony Liguori
Kevin Wolf  writes:

> Am 10.12.2012 14:59, schrieb Juan Quintela:
>> 
>> Hi
>> 
>> Please send in any agenda topics you are interested in.
>
> Can probably be answered on the list, but what is the status of
> libqos?

Still on my TODO list.

Regards,

Anthony Liguori

>
> Kevin
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 0/5] Alter steal time reporting in KVM

2012-11-28 Thread Anthony Liguori
Glauber Costa  writes:

> Hi,
>
> On 11/27/2012 12:36 AM, Michael Wolf wrote:
>> In the case of where you have a system that is running in a
>> capped or overcommitted environment the user may see steal time
>> being reported in accounting tools such as top or vmstat.  This can
>> cause confusion for the end user.  To ease the confusion this patch set
>> adds the idea of consigned (expected steal) time.  The host will separate
>> the consigned time from the steal time.  The consignment limit passed to the
>> host will be the amount of steal time expected within a fixed period of
>> time.  Any other steal time accruing during that period will show as the
>> traditional steal time.
>
> If you submit this again, please include a version number in your series.
>
> It would also be helpful to include a small changelog about what changed
> between last version and this version, so we could focus on that.
>
> As for the rest, I answered your previous two submissions saying I don't
> agree with the concept. If you hadn't changed anything, resending it
> won't change my mind.
>
> I could of course, be mistaken or misguided. But I had also not seen any
> wave of support in favor of this previously, so basically I have no new
> data to make me believe I should see it any differently.
>
> Let's try this again:
>
> * Rik asked you in your last submission how does ppc handle this. You
> said, and I quote: "In the case of lpar on POWER systems they simply
> report steal time and do not alter it in any way.
> They do however report how much processor is assigned to the partition
> and that information is in /proc/ppc64/lparcfg."

This only is helpful for static entitlements.

But if we allow dynamic entitlements--which is a very useful feature,
think buying an online "upgrade" in a cloud environment--then you need
to account for entitlement loss at the same place where you do the rest
of the accounting: in /proc/stat.

> Now, that is a *way* more sensible thing to do. Much more. "Confusing
> users" is something extremely subjective. This is especially true of
> concepts that have been known for quite some time, like steal time. If you all
> of a sudden change the meaning of this, it is sure to confuse a lot more
> users than it would clarify.

I'll bring you a nice bottle of scotch at the next KVM Forum if you can
find me one user that can accurately describe what steal time is.

The semantics are so incredibly subtle that I have a hard time believing
anyone actually understands what it means today.
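For reference, the quantity under debate is the eighth numeric field of the `cpu` line in `/proc/stat` (after user, nice, system, idle, iowait, irq, softirq), reported in USER_HZ ticks. A small sketch, using a made-up sample line rather than a live system:

```python
# Where steal time actually surfaces to the user: field 8 of the "cpu"
# line in /proc/stat. The sample values below are invented for illustration.

sample = "cpu  10132153 290696 3084719 46828483 16683 0 25195 175628 0 0"

def steal_ticks(stat_line):
    fields = stat_line.split()
    assert fields[0] == "cpu"
    # fields[1..7]: user nice system idle iowait irq softirq; fields[8]: steal
    return int(fields[8])

assert steal_ticks(sample) == 175628
```

Tools like top and vmstat derive their "st" percentage from deltas of this counter, which is why any change to its meaning is immediately user-visible.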

Regards,

Anthony Liguori
>
>
>
>
>
>> 
>> ---
>> 
>> Michael Wolf (5):
>>   Alter the amount of steal time reported by the guest.
>>   Expand the steal time msr to also contain the consigned time.
>>   Add the code to send the consigned time from the host to the guest
>>   Add a timer to allow the separation of consigned from steal time.
>>   Add an ioctl to communicate the consign limit to the host.
>> 
>> 
>>  arch/x86/include/asm/kvm_host.h   |   11 +++
>>  arch/x86/include/asm/kvm_para.h   |3 +-
>>  arch/x86/include/asm/paravirt.h   |4 +--
>>  arch/x86/include/asm/paravirt_types.h |2 +
>>  arch/x86/kernel/kvm.c |8 ++---
>>  arch/x86/kernel/paravirt.c|4 +--
>>  arch/x86/kvm/x86.c|   50 
>> -
>>  fs/proc/stat.c|9 +-
>>  include/linux/kernel_stat.h   |2 +
>>  include/linux/kvm_host.h  |2 +
>>  include/uapi/linux/kvm.h  |2 +
>>  kernel/sched/core.c   |   10 ++-
>>  kernel/sched/cputime.c|   21 +-
>>  kernel/sched/sched.h  |2 +
>>  virt/kvm/kvm_main.c   |7 +
>>  15 files changed, 120 insertions(+), 17 deletions(-)
>> 
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/1] [PULL] qemu-kvm.git uq/master queue

2012-11-26 Thread Anthony Liguori
Marcelo Tosatti  writes:

> The following changes since commit 1ccbc2851282564308f790753d7158487b6af8e2:
>
>   qemu-sockets: Fix parsing of the inet option 'to'. (2012-11-21 12:07:59 
> +0400)
>
> are available in the git repository at:
>   git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master
>
> Bruce Rogers (1):
>   Legacy qemu-kvm options have no argument

Pulled. Thanks.

Regards,

Anthony Liguori

>
>  qemu-options.hx |8 
>  1 files changed, 4 insertions(+), 4 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PULL 0/3] vfio-pci for 1.3-rc0

2012-11-14 Thread Anthony Liguori
Alex Williamson  writes:

> Hi Anthony,
>
> Please pull the tag below.  I posted the linux-headers update
> separately on Oct-15; since it hasn't been applied and should be
> non-controversial, I include it again here.  Thanks,
>
> Alex
>

Pulled. Thanks.

Regards,

Anthony Liguori

> The following changes since commit f5022a135e4309a54d433c69b2a056756b2d0d6b:
>
>   aio: fix aio_ctx_prepare with idle bottom halves (2012-11-12 20:02:09 +0400)
>
> are available in the git repository at:
>
>   git://github.com/awilliam/qemu-vfio.git tags/vfio-pci-for-qemu-1.3.0-rc0
>
> for you to fetch changes up to a771c51703cf9f91023c6570426258bdf5ec775b:
>
>   vfio-pci: Use common msi_get_message (2012-11-13 12:27:40 -0700)
>
> 
> vfio-pci: KVM INTx accel & common msi_get_message
>
> 
> Alex Williamson (3):
>   linux-headers: Update to 3.7-rc5
>   vfio-pci: Add KVM INTx acceleration
>   vfio-pci: Use common msi_get_message
>
>  hw/vfio_pci.c| 210 
> +++
>  linux-headers/asm-powerpc/kvm_para.h |   6 +-
>  linux-headers/asm-s390/kvm_para.h|   8 +-
>  linux-headers/asm-x86/kvm.h  |  17 +++
>  linux-headers/linux/kvm.h|  25 -
>  linux-headers/linux/kvm_para.h   |   6 +-
>  linux-headers/linux/vfio.h   |   6 +-
>  linux-headers/linux/virtio_config.h  |   6 +-
>  linux-headers/linux/virtio_ring.h|   6 +-
>  9 files changed, 241 insertions(+), 49 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: KVM call agenda for 2012-11-12

2012-11-13 Thread Anthony Liguori
Marcelo Tosatti  writes:

> On Mon, Nov 12, 2012 at 01:58:38PM +0100, Juan Quintela wrote:
>> 
>> Hi
>> 
>> Please send in any agenda topics you are interested in.
>> 
>> Later, Juan.
>
> It would be good to have a status report on qemu-kvm compatibility
> (the remaining TODO items are with Anthony). They are:
>
> - qemu-kvm 1.2 machine type.
> - default accelerator being KVM.
>
> Note migration will remain broken due to 
>
> https://patchwork.kernel.org/patch/1674521/
>
> BTW, this can be via email, if preferred (i cannot attend the call).

Let's cancel the call and I'll spend the hour writing up the patches and
sending them out.

Regards,

Anthony Liguori

>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 1.1.1 -> 1.1.2 migrate /managedsave issue

2012-11-04 Thread Anthony Liguori
Avi Kivity  writes:

> On 10/22/2012 09:04 AM, Philipp Hahn wrote:
>> Hello Doug,
>> 
>> On Saturday 20 October 2012 00:46:43 Doug Goldstein wrote:
>>> I'm using libvirt 0.10.2 and I had qemu-kvm 1.1.1 running all my VMs.
>> ...
>>> I had upgraded to qemu-kvm 1.1.2
>> ... 
>>> qemu: warning: error while loading state for instance 0x0 of device 'ram'
>>> load of migration failed
>> 
>> That error can be from many things. For me it was that the PXE-ROM images 
>> for 
>> the network cards were updated as well. Their size changed over the next 
>> power-of-two size, so kvm needed to allocate less/more memory and changed 
>> some PCI configuration registers, where the size of the ROM region is stored.
>> On loading the saved state those sizes were compared and failed to validate. 
>> KVM then aborts loading the saved state with that little helpful message.
>> 
>> So you might want to check, if your case is similar to mine.
>> 
>> I diagnosed that using gdb to single step kvm until I found 
>> hw/pci.c#get_pci_config_device() returning -EINVAL.
>> 
>
> Seems reasonable.  Doug, please verify to see if it's the same issue or
> another one.
>
> Juan, how can we fix this?  It's clear that the option ROM size has to
> be fixed and not change whenever the blob is updated.  This will fix it
> for future releases.  But what to do about the ones in the field?

This is not a problem upstream because we don't alter the ROMs.  If we
did, we would keep the old ROMs around and set the romfile property in
the compatible machine.

This is what distros that are shipping ROMs outside of QEMU ought to
do.  It's a bug to unconditionally change the ROMs (in a guest visible
way) without adding compatibility support.
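The failure mode Philipp diagnosed above comes from PCI expansion ROM BARs being power-of-two sized: the ROM blob is rounded up, so an update that crosses a power-of-two boundary changes the BAR size and the saved PCI config registers no longer validate. A sketch of the rounding (illustrative, not QEMU's exact code):

```python
def rom_bar_size(blob_len):
    """PCI expansion ROM BARs are power-of-two sized; round the ROM image
    length up to the next power of two (illustrative sketch)."""
    size = 1
    while size < blob_len:
        size <<= 1
    return size

# A ROM update growing from 60 KiB to 65 KiB crosses the 64 KiB boundary
# and doubles the BAR size -- exactly the guest-visible change that makes
# get_pci_config_device() reject the saved state with -EINVAL.
assert rom_bar_size(60 * 1024) == 64 * 1024
assert rom_bar_size(65 * 1024) == 128 * 1024
```

Pinning the old ROM via the `romfile` property of a compat machine type keeps the BAR size, and hence the saved config space, stable across upgrades.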

Regards,

Anthony Liguori

>
> -- 
> error compiling committee.c: too many arguments to function
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/28] [PULL] qemu-kvm.git uq/master queue

2012-11-01 Thread Anthony Liguori
Marcelo Tosatti  writes:

> The following changes since commit aee0bf7d8d7564f8f2c40e4501695c492b7dd8d1:
>
>   tap-win32: stubs to fix win32 build (2012-10-30 19:18:53 +)
>
> are available in the git repository at:
>   git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master
>
> Don Slutz (1):
>   target-i386: Add missing kvm cpuid feature name
>
> Eduardo Habkost (19):
>   i386: kvm: kvm_arch_get_supported_cpuid: move R_EDX hack outside of for 
> loop
>   i386: kvm: kvm_arch_get_supported_cpuid: clean up has_kvm_features check
>   i386: kvm: kvm_arch_get_supported_cpuid: use 'entry' variable
>   i386: kvm: extract register switch to cpuid_entry_get_reg() function
>   i386: kvm: extract CPUID entry lookup to cpuid_find_entry() function
>   i386: kvm: extract try_get_cpuid() loop to get_supported_cpuid() 
> function
>   i386: kvm: kvm_arch_get_supported_cpuid: replace if+switch with single 
> 'if'
>   i386: kvm: set CPUID_EXT_HYPERVISOR on kvm_arch_get_supported_cpuid()
>   i386: kvm: set CPUID_EXT_TSC_DEADLINE_TIMER on 
> kvm_arch_get_supported_cpuid()
>   i386: kvm: x2apic is not supported without in-kernel irqchip
>   i386: kvm: mask cpuid_kvm_features earlier
>   i386: kvm: mask cpuid_ext4_features bits earlier
>   i386: kvm: filter CPUID feature words earlier, on cpu.c
>   i386: kvm: reformat filter_features_for_kvm() code
>   i386: kvm: filter CPUID leaf 7 based on GET_SUPPORTED_CPUID, too
>   i386: cpu: add missing CPUID[EAX=7,ECX=0] flag names
>   target-i386: make cpu_x86_fill_host() void
>   target-i386: cpu: make -cpu host/check/enforce code KVM-specific
>   target-i386: kvm_cpu_fill_host: use GET_SUPPORTED_CPUID
>
> Jan Kiszka (6):
>   Use machine options to emulate -no-kvm-irqchip
>   Issue warning when deprecated -no-kvm-pit is used
>   Use global properties to emulate -no-kvm-pit-reinjection
>   Issue warning when deprecated drive parameter boot=on|off is used
>   Issue warning when deprecated -tdf option is used
>   Emulate qemu-kvms -no-kvm option
>
> Marcelo Tosatti (1):
>   cirrus_vga: allow configurable vram size
>
> Peter Maydell (1):
>   update-linux-headers.sh: Handle new kernel uapi/ directories
>

Pulled. Thanks.

Regards,

Anthony Liguori

>  blockdev.c  |6 ++
>  hw/cirrus_vga.c |   21 --
>  kvm.h   |1 +
>  qemu-config.c   |4 +
>  qemu-options.hx |   16 
>  scripts/update-linux-headers.sh |3 +-
>  target-i386/cpu.c   |   98 +++---
>  target-i386/kvm.c   |  153 
> +++
>  vl.c|   33 +
>  9 files changed, 242 insertions(+), 93 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



Re: [PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT

2012-10-16 Thread Anthony Liguori
 could then detect that it is running
>> on an old kernel and fall back to the old format.
>> 
>> The HPT entry format is very unlikely to change in size or basic
>> layout (though the architects do redefine some of the bits
>> occasionally).
>
> I meant the internal data structure that holds HPT entries.
>
> I guess I don't understand the index.  Do we expect changes to be in
> contiguous ranges?  And invalid entries to be contiguous as well?  That
> doesn't fit with how hash tables work.  Does the index represent the
> position of the entry within the table, or something else?
>
>
>> 
>>> > +
>>> > +Writes to the fd create HPT entries starting at the index given in the
>>> > +header; first `n_valid' valid entries with contents from the data
>>> > +written, then `n_invalid' invalid entries, invalidating any previously
>>> > +valid entries found.
>>> 
>>> This scheme is a clever, original, and very interesting approach to live
>>> migration.  That doesn't necessarily mean a NAK, we should see if it
>>> makes sense for other migration APIs as well (we currently have
>>> difficulties migrating very large/wide guests).
>>> 
>>> What is the typical number of entries in the HPT?  Do you have estimates
>>> of the change rate?
>> 
>> Typically the HPT would have about a million entries, i.e. it would be
>> 16MiB in size.  The usual guideline is to make it about 1/64 of the
>> maximum amount of RAM the guest could ever have, rounded up to a power
>> of two, although we often run with less, say 1/128 or even 1/256.
>
> 16MiB is transferred in ~0.15 sec on GbE, much faster with 10GbE.  Does
> it warrant a live migration protocol?

0.15 sec == 150ms.  The typical downtime window is 30ms.  So yeah, I
think it does.

>> Because it is a hash table, updates tend to be scattered throughout
>> the whole table, which is another reason why per-page dirty tracking
>> and updates would be pretty inefficient.
>
> This suggests a stream format that includes the index in every entry.
>
>> 
>> As for the change rate, it depends on the application of course, but
>> basically every time the guest changes a PTE in its Linux page tables
>> we do the corresponding change to the corresponding HPT entry, so the
>> rate can be quite high.  Workloads that do a lot of fork, exit, mmap,
>> exec, etc. have a high rate of HPT updates.
>
> If the rate is high enough, then there's no point in a live update.

Do we have practical data here?

Regards,

Anthony Liguori

>
>> 
>>> Suppose new hardware arrives that supports nesting HPTs, so that kvm is
>>> no longer synchronously aware of the guest HPT (similar to how NPT/EPT
>>> made kvm unaware of guest virtual->physical translations on x86).  How
>>> will we deal with that?  But I guess this will be a
>>> non-guest-transparent and non-userspace-transparent change, unlike
>>> NPT/EPT, so a userspace ABI addition will be needed anyway).
>> 
>> Nested HPTs or other changes to the MMU architecture would certainly
>> need new guest kernels and new support in KVM.  With a nested
>> approach, the guest-side MMU data structures (HPT or whatever) would
>> presumably be in guest memory and thus be handled along with all the
>> other guest memory, while the host-side MMU data structures would not
>> need to be saved, so from the migration point of view that would make
>> it all a lot simpler.
>
> Yeah.
>
>
> -- 
> error compiling committee.c: too many arguments to function


Re: [PATCH] qemu: Update Linux headers

2012-10-15 Thread Anthony Liguori
Alex Williamson  writes:

> Based on v3.7-rc1-3-g29bb4cc

Normally this would go through qemu-kvm/uq/master but since this is from
Linus' tree, it's less of a concern.

Nonetheless, I'd prefer we did it from v3.7-rc1 instead of a random git
snapshot.

Regards,

Anthony Liguori

>
> Signed-off-by: Alex Williamson 
> ---
>
>  Trying to get KVM_IRQFD_FLAG_RESAMPLE and friends for vfio-pci
>
>  linux-headers/asm-x86/kvm.h |   17 +
>  linux-headers/linux/kvm.h   |   25 +
>  linux-headers/linux/kvm_para.h  |6 +++---
>  linux-headers/linux/vfio.h  |6 +++---
>  linux-headers/linux/virtio_config.h |6 +++---
>  linux-headers/linux/virtio_ring.h   |6 +++---
>  6 files changed, 50 insertions(+), 16 deletions(-)
>
> diff --git a/linux-headers/asm-x86/kvm.h b/linux-headers/asm-x86/kvm.h
> index 246617e..a65ec29 100644
> --- a/linux-headers/asm-x86/kvm.h
> +++ b/linux-headers/asm-x86/kvm.h
> @@ -9,6 +9,22 @@
>  #include 
>  #include 
>  
> +#define DE_VECTOR 0
> +#define DB_VECTOR 1
> +#define BP_VECTOR 3
> +#define OF_VECTOR 4
> +#define BR_VECTOR 5
> +#define UD_VECTOR 6
> +#define NM_VECTOR 7
> +#define DF_VECTOR 8
> +#define TS_VECTOR 10
> +#define NP_VECTOR 11
> +#define SS_VECTOR 12
> +#define GP_VECTOR 13
> +#define PF_VECTOR 14
> +#define MF_VECTOR 16
> +#define MC_VECTOR 18
> +
>  /* Select x86 specific features in  */
>  #define __KVM_HAVE_PIT
>  #define __KVM_HAVE_IOAPIC
> @@ -25,6 +41,7 @@
>  #define __KVM_HAVE_DEBUGREGS
>  #define __KVM_HAVE_XSAVE
>  #define __KVM_HAVE_XCRS
> +#define __KVM_HAVE_READONLY_MEM
>  
>  /* Architectural interrupt line count. */
>  #define KVM_NR_INTERRUPTS 256
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index 4b9e575..81d2feb 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> @@ -101,9 +101,13 @@ struct kvm_userspace_memory_region {
>   __u64 userspace_addr; /* start of the userspace allocated memory */
>  };
>  
> -/* for kvm_memory_region::flags */
> -#define KVM_MEM_LOG_DIRTY_PAGES  1UL
> -#define KVM_MEMSLOT_INVALID  (1UL << 1)
> +/*
> + * Bits 0-15 of kvm_memory_region::flags are visible to userspace; the
> + * other bits are reserved for kvm-internal use and are defined in
> + * include/linux/kvm_host.h.
> + */
> +#define KVM_MEM_LOG_DIRTY_PAGES  (1UL << 0)
> +#define KVM_MEM_READONLY (1UL << 1)
>  
>  /* for KVM_IRQ_LINE */
>  struct kvm_irq_level {
> @@ -618,6 +622,10 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_PPC_GET_SMMU_INFO 78
>  #define KVM_CAP_S390_COW 79
>  #define KVM_CAP_PPC_ALLOC_HTAB 80
> +#ifdef __KVM_HAVE_READONLY_MEM
> +#define KVM_CAP_READONLY_MEM 81
> +#endif
> +#define KVM_CAP_IRQFD_RESAMPLE 82
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -683,12 +691,21 @@ struct kvm_xen_hvm_config {
>  #endif
>  
>  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> +/*
> + * Available with KVM_CAP_IRQFD_RESAMPLE
> + *
> + * KVM_IRQFD_FLAG_RESAMPLE indicates resamplefd is valid and specifies
> + * the irqfd to operate in resampling mode for level triggered interrupt
> + * emulation.  See Documentation/virtual/kvm/api.txt.
> + */
> +#define KVM_IRQFD_FLAG_RESAMPLE (1 << 1)
>  
>  struct kvm_irqfd {
>   __u32 fd;
>   __u32 gsi;
>   __u32 flags;
> - __u8  pad[20];
> + __u32 resamplefd;
> + __u8  pad[16];
>  };
>  
>  struct kvm_clock_data {
> diff --git a/linux-headers/linux/kvm_para.h b/linux-headers/linux/kvm_para.h
> index 7bdcf93..cea2c5c 100644
> --- a/linux-headers/linux/kvm_para.h
> +++ b/linux-headers/linux/kvm_para.h
> @@ -1,5 +1,5 @@
> -#ifndef __LINUX_KVM_PARA_H
> -#define __LINUX_KVM_PARA_H
> +#ifndef _UAPI__LINUX_KVM_PARA_H
> +#define _UAPI__LINUX_KVM_PARA_H
>  
>  /*
>   * This header file provides a method for making a hypercall to the host
> @@ -25,4 +25,4 @@
>   */
>  #include 
>  
> -#endif /* __LINUX_KVM_PARA_H */
> +#endif /* _UAPI__LINUX_KVM_PARA_H */
> diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
> index f787b72..4758d1b 100644
> --- a/linux-headers/linux/vfio.h
> +++ b/linux-headers/linux/vfio.h
> @@ -8,8 +8,8 @@
>   * it under the terms of the GNU General Public License version 2 as
>   * published by the Free Software Foundation.
>   */
> -#ifndef VFIO_H
> -#define VFIO_H
> +#ifndef _UAPIVFIO_H
> +#define _UAPIVFIO_H
>  
>  #include 
>  #include 
> @@ -365,4 +365,4 @@ struct vfio_iommu_type1_dma_unmap {
>  
>  #define VFIO_IOMMU_UNMAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 14)
>  
> -#endi

Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-10 Thread Anthony Liguori
Rusty Russell  writes:

> Gerd Hoffmann  writes:
>> So how about this:
>>
>> (1) Add a vendor specific pci capability for new-style virtio.
>> Specifies the pci bar used for new-style virtio registers.
>> Guests can use it to figure whenever new-style virtio is
>> supported and to map the correct bar (which will probably
>> be bar 1 in most cases).
>
> This was closer to the original proposal[1], which I really liked (you
> can layout bars however you want).  Anthony thought that vendor
> capabilities were a PCI-e feature, but it seems they're blessed in PCI
> 2.3.

2.3 was standardized in 2002.  Are we confident that vendor extensions
play nice with pre-2.3 OSes like Win2k, WinXP, etc?

I still think it's a bad idea to rely on something so "new" in something
as fundamental as virtio-pci unless we have to.

Regards,

Anthony Liguori

>
> So let's return to that proposal, giving something like this:
>
> /* IDs for different capabilities.  Must all exist. */
> /* FIXME: Do we win from separating ISR, NOTIFY and COMMON? */
> /* Common configuration */
> #define VIRTIO_PCI_CAP_COMMON_CFG 1
> /* Notifications */
> #define VIRTIO_PCI_CAP_NOTIFY_CFG 2
> /* ISR access */
> #define VIRTIO_PCI_CAP_ISR_CFG 3
> /* Device specific configuration */
> #define VIRTIO_PCI_CAP_DEVICE_CFG 4
>
> /* This is the PCI capability header: */
> struct virtio_pci_cap {
>   u8 cap_vndr;/* Generic PCI field: PCI_CAP_ID_VNDR */
>   u8 cap_next;/* Generic PCI field: next ptr. */
>   u8 cap_len; /* Generic PCI field: sizeof(struct virtio_pci_cap). */
>   u8 cfg_type;/* One of the VIRTIO_PCI_CAP_*_CFG. */
>   u8 bar; /* Where to find it. */
>   u8 unused;
>   __le16 offset;  /* Offset within bar. */
>   __le32 length;  /* Length. */
> };
>
> This means qemu can point the isr_cfg into the legacy area if it wants.
> In fact, it can put everything in BAR0 if it wants.
>
> Thoughts?
> Rusty.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-09 Thread Anthony Liguori
Gerd Hoffmann  writes:

>   Hi,
>
>>> Well, we also want to clean up the registers, so how about:
>>>
>>> BAR0: legacy, as is.  If you access this, don't use the others.
>
> Ok.
>
>>> BAR1: new format virtio-pci layout.  If you use this, don't use BAR0.
>>> BAR2: virtio-cfg.  If you use this, don't use BAR0.
>
> Why use two bars for this?  You can put them into one mmio bar, together
> with the msi-x vector table and PBA.  Of course a pci capability
> describing the location is helpful for that ;)

You don't need a capability.  You can also just add a "config offset"
field to the register set and then make the semantics that it occurs in
the same region.

>
>>> BAR3: ISR. If you use this, don't use BAR0.
>
> Again, I wouldn't hardcode that but use a capability.
>
>>> I prefer the cases exclusive (ie. use one or the other) as a clear path
>>> to remove the legacy layout; and leaving the ISR in BAR0 leaves us with
>>> an ugly corner case in future (ISR is BAR0 + 19?  WTF?).
>
> Ok, so we have four register sets:
>
>   (1) legacy layout
>   (2) new virtio-pci
>   (3) new virtio-config
>   (4) new virtio-isr
>
> We can have a vendor pci capability, with a dword for each register set:
>
>   bit  31-- present bit
>   bits 26-24 -- bar
>   bits 23-0  -- offset
>
> So current drivers which must support legacy can use this:
>
>   legacy layout -- present, bar 0, offset 0
>   new virtio-pci-- present, bar 1, offset 0
>   new virtio-config -- present, bar 1, offset 256
>   new virtio-isr-- present, bar 0, offset 19
>
> [ For completeness: msi-x capability could add this: ]
>
>   msi-x vector tablebar 1, offset 512
>   msi-x pba bar 1, offset 768
>
>> We'll never remove legacy so we shouldn't plan on it.  There are
>> literally hundreds of thousands of VMs out there with the current virtio
>> drivers installed in them.  We'll be supporting them for a very, very
>> long time :-)
>
> But new devices (virtio-qxl being a candidate) don't have old guests and
> don't need to worry.
>
> They could use this if they care about fast isr:
>
>   legacy layout -- not present
>   new virtio-pci-- present, bar 1, offset 0
>   new virtio-config -- present, bar 1, offset 256
>   new virtio-isr-- present, bar 0, offset 0
>
> Or this if they don't worry about isr performance:
>
>   legacy layout -- not present
>   new virtio-pci-- present, bar 0, offset 0
>   new virtio-config -- present, bar 0, offset 256
>   new virtio-isr-- not present
>
>> I don't think we gain a lot by moving the ISR into a separate BAR.
>> Splitting up registers like that seems weird to me too.
>
> Main advantage of defining a register set with just isr is that it
> reduces pio address space consumption for new virtio devices which don't
> have to worry about the legacy layout (8 bytes which is minimum size for
> io bars instead of 64 bytes).

Doing some rough math, we should have at least 16k of PIO space.  That
lets us have well over 500 virtio-pci devices with the current register
layout.

I don't think we're at risk of running out of space...

>> If we added an additional constraints that BAR1 was mirrored except for
>
> Why add constraints?  We want something future-proof, don't we?
>
>>> The detection is simple: if BAR1 has non-zero length, it's new-style,
>>> otherwise legacy.
>
> Doesn't fly.  BAR1 is in use today for MSI-X support.

But the location is specified via capabilities so we can change the
location to be within BAR1 at a non-conflicting offset.

>> I agree that this is the best way to extend, but I think we should still
>> use a transport feature bit.  We want to be able to detect within QEMU
>> whether a guest is using these new features because we need to adjust
>> migration state accordingly.
>
> Why does migration need adjustments?

Because there is additional state in the "new" layout.  We need to
understand whether a guest is relying on that state or not.

For instance, extended virtio features.  If a guest is in the process
of reading extended virtio features, it may not have changed any state
but we must ensure that we don't migrate to an older version of QEMU w/o
the extended virtio features.

This cannot be handled by subsections today because there is no guest
written state that's affected.

Regards,

Anthony Liguori

>
> [ Not that I want veto a feature bit, but I don't see the need yet ]
>
> cheers,
>   Gerd
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-09 Thread Anthony Liguori
Avi Kivity  writes:

> On 10/09/2012 05:16 AM, Rusty Russell wrote:
>> Anthony Liguori  writes:
>>> We'll never remove legacy so we shouldn't plan on it.  There are
>>> literally hundreds of thousands of VMs out there with the current virtio
>>> drivers installed in them.  We'll be supporting them for a very, very
>>> long time :-)
>> 
>> You will be supporting this for qemu on x86, sure.  As I think we're
>> still in the growth phase for virtio, I prioritize future spec
>> cleanliness pretty high.
>
> If a pure ppc hypervisor was on the table, this might have been
> worthwhile.  As it is the codebase is shared, and the Linux drivers are
> shared, so cleaning up the spec doesn't help the code.

Note that distros have been (perhaps unknowingly) shipping virtio-pci
for PPC for some time now.

So even though there wasn't a hypervisor that supported virtio-pci, the
guests already support it and are out there in the wild.

There's a lot of value in maintaining "legacy" support even for PPC.
 
>> But I think you'll be surprised how fast this is deprecated:
>> 1) Bigger queues for block devices (guest-specified ringsize)
>> 2) Smaller rings for openbios (guest-specified alignment)
>> 3) All-mmio mode (powerpc)
>> 4) Whatever network features get numbers > 31.
>> 
>>> I don't think we gain a lot by moving the ISR into a separate BAR.
>>> Splitting up registers like that seems weird to me too.
>> 
>> Confused.  I proposed the same split as you have, just ISR by itself.
>
> I believe Anthony objects to having the ISR by itself.  What is the
> motivation for that?

Right, BARs are a precious resource not to be spent lightly.  Having an
entire BAR dedicated to a 1-byte register seems like a waste to me.

Regards,

Anthony Liguori

>
>
> -- 
> error compiling committee.c: too many arguments to function
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-09 Thread Anthony Liguori
Rusty Russell  writes:

> Anthony Liguori  writes:
>> We'll never remove legacy so we shouldn't plan on it.  There are
>> literally hundreds of thousands of VMs out there with the current virtio
>> drivers installed in them.  We'll be supporting them for a very, very
>> long time :-)
>
> You will be supporting this for qemu on x86, sure.

And PPC.

> As I think we're
> still in the growth phase for virtio, I prioritize future spec
> cleanliness pretty high.
>
> But I think you'll be surprised how fast this is deprecated:
> 1) Bigger queues for block devices (guest-specified ringsize)
> 2) Smaller rings for openbios (guest-specified alignment)
> 3) All-mmio mode (powerpc)
> 4) Whatever network features get numbers > 31.

We can do all of these things with incremental change to the existing
layout.  That's the only way in which what I'm suggesting differs.

You want to reorder all of the fields and have a driver flag day.  But I
strongly suspect we'll decide we need to do the same exercise again in 4
years when we now need to figure out how to take advantage of
transactional memory or some other whiz-bang hardware feature.

There are a finite number of BARs but each BAR has an almost infinite
size.  So extending BARs instead of introducing new ones seems like the
conservative approach moving forward.

>> I don't think we gain a lot by moving the ISR into a separate BAR.
>> Splitting up registers like that seems weird to me too.
>
> Confused.  I proposed the same split as you have, just ISR by itself.

I disagree with moving the ISR into a separate BAR.  That's what seems
weird to me.

>> It's very normal to have a mirrored set of registers that are PIO in one
>> bar and MMIO in a different BAR.
>>
>> If we added an additional constraints that BAR1 was mirrored except for
>> the config space and the MSI section was always there, I think the end
>> result would be nice.  IOW:
>
> But it won't be the same, because we want all that extra stuff, like
> more feature bits and queue size alignment.  (Admittedly queues past
> 16TB aren't a killer feature).
>
> To make it concrete:
>
> Current:
> struct {
> __le32 host_features;   /* read-only */
> __le32 guest_features;  /* read/write */
> __le32 queue_pfn;   /* read/write */
> __le16 queue_size;  /* read-only */
> __le16 queue_sel;   /* read/write */
> __le16 queue_notify;/* read/write */
> u8 status;  /* read/write */
> u8 isr; /* read-only, clear on read */
> /* Optional */
> __le16 msi_config_vector;   /* read/write */
> __le16 msi_queue_vector;/* read/write */
> /* ... device features */
> };
>
> Proposed:
> struct virtio_pci_cfg {
>   /* About the whole device. */
>   __le32 device_feature_select;   /* read-write */
>   __le32 device_feature;  /* read-only */
>   __le32 guest_feature_select;/* read-write */
>   __le32 guest_feature;   /* read-only */
>   __le16 msix_config; /* read-write */
>   __u8 device_status; /* read-write */
>   __u8 unused;
>
>   /* About a specific virtqueue. */
>   __le16 queue_select;/* read-write */
>   __le16 queue_align; /* read-write, power of 2. */
>   __le16 queue_size;  /* read-write, power of 2. */
>   __le16 queue_msix_vector;/* read-write */
>   __le64 queue_address;   /* read-write: 0x == DNE. */
> };
>
> struct virtio_pci_isr {
> __u8 isr; /* read-only, clear on read */
> };

What I'm suggesting is:

> struct {
> __le32 host_features;   /* read-only */
> __le32 guest_features;  /* read/write */
> __le32 queue_pfn;   /* read/write */
> __le16 queue_size;  /* read-only */
> __le16 queue_sel;   /* read/write */
> __le16 queue_notify;/* read/write */
> u8 status;  /* read/write */
> u8 isr; /* read-only, clear on read */
> __le16 msi_config_vector;   /* read/write */
> __le16 msi_queue_vector;/* read/write */
> __le32 host_feature_select; /* read/write */
> __le32 guest_feature_select;/* read/write */
> __le32 queue_pfn_hi;/* read/write */
> };
>

With the additional semantic that the virtio-config space is overlayed
on top of the register set in BAR0 unless the
VIRTIO_PCI_F_SEPARATE_CONFIG feature is acknowledged.  This feature
acts as a latch and when set, removes the config space overlay.

Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Anthony Liguori
Rusty Russell  writes:

> Anthony Liguori  writes:
>> Gerd Hoffmann  writes:
>>
>>>   Hi,
>>>
>>>>> So we could have for virtio something like this:
>>>>>
>>>>> Capabilities: [??] virtio-regs:
>>>>> legacy: BAR=0 offset=0
>>>>> virtio-pci: BAR=1 offset=1000
>>>>> virtio-cfg: BAR=1 offset=1800
>>>> 
>>>> This would be a vendor specific PCI capability so lspci wouldn't
>>>> automatically know how to parse it.
>>>
>>> Sure, would need a patch to actually parse+print the cap,
>>> /me was just trying to make my point clear in a simple way.
>>>
>>>>>>> 2) ISTR an argument about mapping the ISR register separately, for
>>>>>>>performance, but I can't find a reference to it.
>>>>>>
>>>>>> I think the rationale is that ISR really needs to be PIO but everything
>>>>>> else doesn't.  PIO is much faster on x86 because it doesn't require
>>>>>> walking page tables or instruction emulation to handle the exit.
>>>>>
>>>>> Is this still a pressing issue?  With MSI-X enabled ISR isn't needed,
>>>>> correct?  Which would imply that pretty much only old guests without
>>>>> MSI-X support need this, and we don't need to worry that much when
>>>>> designing something new ...
>>>> 
>>>> It wasn't that long ago that MSI-X wasn't supported..  I think we should
>>>> continue to keep ISR as PIO as it is a fast path.
>>>
>>> No problem if we allow to have both legacy layout and new layout at the
>>> same time.  Guests can continue to use ISR @ BAR0 in PIO space for
>>> existing virtio devices, even in case they want use mmio for other
>>> registers -> all fine.
>>>
>>> New virtio devices can support MSI-X from day one and decide to not
>>> expose a legacy layout PIO bar.
>>
>> I think having BAR1 be an MMIO mirror of the registers + a BAR2 for
>> virtio configuration space is probably not that bad of a solution.
>
> Well, we also want to clean up the registers, so how about:
>
> BAR0: legacy, as is.  If you access this, don't use the others.
> BAR1: new format virtio-pci layout.  If you use this, don't use BAR0.
> BAR2: virtio-cfg.  If you use this, don't use BAR0.
> BAR3: ISR. If you use this, don't use BAR0.
>
> I prefer the cases exclusive (ie. use one or the other) as a clear path
> to remove the legacy layout; and leaving the ISR in BAR0 leaves us with
> an ugly corner case in future (ISR is BAR0 + 19?  WTF?).

We'll never remove legacy so we shouldn't plan on it.  There are
literally hundreds of thousands of VMs out there with the current virtio
drivers installed in them.  We'll be supporting them for a very, very
long time :-)

I don't think we gain a lot by moving the ISR into a separate BAR.
Splitting up registers like that seems weird to me too.

It's very normal to have a mirrored set of registers that are PIO in one
bar and MMIO in a different BAR.

If we added an additional constraints that BAR1 was mirrored except for
the config space and the MSI section was always there, I think the end
result would be nice.  IOW:

BAR0[pio]: virtio-pci registers + optional MSI section + virtio-config
BAR1[mmio]: virtio-pci registers + MSI section + future extensions
BAR2[mmio]: virtio-config

We can continue to do ISR access via BAR0 for performance reasons.

> As to MMIO vs PIO, the BARs are self-describing, so we should explicitly
> endorse that and leave it to the devices.
>
> The detection is simple: if BAR1 has non-zero length, it's new-style,
> otherwise legacy.

I agree that this is the best way to extend, but I think we should still
use a transport feature bit.  We want to be able to detect within QEMU
whether a guest is using these new features because we need to adjust
migration state accordingly.

Otherwise we would have to detect reads/writes to the new BARs to
maintain whether the extended register state needs to be saved.  This
gets nasty dealing with things like reset.

A feature bit simplifies this all pretty well.

Regards,

Anthony Liguori

>
> Thoughts?
> Rusty.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Anthony Liguori
Gerd Hoffmann  writes:

>   Hi,
>
>>> So we could have for virtio something like this:
>>>
>>> Capabilities: [??] virtio-regs:
>>> legacy: BAR=0 offset=0
>>> virtio-pci: BAR=1 offset=1000
>>> virtio-cfg: BAR=1 offset=1800
>> 
>> This would be a vendor specific PCI capability so lspci wouldn't
>> automatically know how to parse it.
>
> Sure, would need a patch to actually parse+print the cap,
> /me was just trying to make my point clear in a simple way.
>
>>>>> 2) ISTR an argument about mapping the ISR register separately, for
>>>>>performance, but I can't find a reference to it.
>>>>
>>>> I think the rationale is that ISR really needs to be PIO but everything
>>>> else doesn't.  PIO is much faster on x86 because it doesn't require
>>>> walking page tables or instruction emulation to handle the exit.
>>>
>>> Is this still a pressing issue?  With MSI-X enabled ISR isn't needed,
>>> correct?  Which would imply that pretty much only old guests without
>>> MSI-X support need this, and we don't need to worry that much when
>>> designing something new ...
>> 
>> It wasn't that long ago that MSI-X wasn't supported..  I think we should
>> continue to keep ISR as PIO as it is a fast path.
>
> No problem if we allow to have both legacy layout and new layout at the
> same time.  Guests can continue to use ISR @ BAR0 in PIO space for
> existing virtio devices, even in case they want use mmio for other
> registers -> all fine.
>
> New virtio devices can support MSI-X from day one and decide to not
> expose a legacy layout PIO bar.

I think having BAR1 be an MMIO mirror of the registers + a BAR2 for
virtio configuration space is probably not that bad of a solution.

Regards,

Anthony Liguori

>
> cheers,
>   Gerd
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Anthony Liguori
Gerd Hoffmann  writes:

>   Hi,
>
>> But I think we could solve this in a different way.  I think we could
>> just move the virtio configuration space to BAR1 by using a transport
>> feature bit.
>
> Why hard-code stuff?
>
> I think it makes alot of sense to have a capability simliar to msi-x
> which simply specifies bar and offset of the register sets:
>
> [root@fedora ~]# lspci -vvs4
> 00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
> [ ... ]
>   Region 0: I/O ports at c000 [size=64]
>   Region 1: Memory at fc029000 (32-bit) [size=4K]
>   Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
>   Vector table: BAR=1 offset=
>   PBA: BAR=1 offset=0800

MSI-X capability is a standard PCI capability which is why lspci can
parse it.

>
> So we could have for virtio something like this:
>
> Capabilities: [??] virtio-regs:
> legacy: BAR=0 offset=0
> virtio-pci: BAR=1 offset=1000
> virtio-cfg: BAR=1 offset=1800

This would be a vendor specific PCI capability so lspci wouldn't
automatically know how to parse it.

You could just as well teach lspci to parse BAR0 to figure out what
features are supported.

>> That then frees up the entire BAR0 for use as virtio-pci registers.  We
>> can then always include the virtio-pci MSI-X register space and
>> introduce all new virtio-pci registers as simply being appended.
>
> BAR0 needs to stay as-is for compatibility reasons.  New devices which
> don't have to care about old guests don't need to provide a 'legacy'
> register region.

A latch feature bit would allow the format to change without impacting
compatibility at all.

>>> 2) ISTR an argument about mapping the ISR register separately, for
>>>performance, but I can't find a reference to it.
>> 
>> I think the rationale is that ISR really needs to be PIO but everything
>> else doesn't.  PIO is much faster on x86 because it doesn't require
>> walking page tables or instruction emulation to handle the exit.
>
> Is this still a pressing issue?  With MSI-X enabled ISR isn't needed,
> correct?  Which would imply that pretty much only old guests without
> MSI-X support need this, and we don't need to worry that much when
> designing something new ...

It wasn't that long ago that MSI-X wasn't supported..  I think we should
continue to keep ISR as PIO as it is a fast path.

Regards,

Anthony Liguori

>
> cheers,
>   Gerd
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-08 Thread Anthony Liguori
Rusty Russell  writes:

> (Topic updated, cc's trimmed).
>
> Anthony Liguori  writes:
>> Rusty Russell  writes:
>>> 4) The only significant change to the spec is that we use PCI
>>>capabilities, so we can have infinite feature bits.
>>>(see 
>>> http://lists.linuxfoundation.org/pipermail/virtualization/2011-December/019198.html)
>>
>> We discussed this on IRC last night.  I don't think PCI capabilites are
>> a good mechanism to use...
>>
>> PCI capabilities are there to organize how the PCI config space is
>> allocated to allow vendor extensions to co-exist with future PCI
>> extensions.
>>
>> But we've never used the PCI config space within virtio-pci.  We do
>> everything in BAR0.  I don't think there's any real advantage of using
>> the config space vs. a BAR for virtio-pci.
>
> Note before anyone gets confused; we were talking about using the PCI
> config space to indicate what BAR(s) the virtio stuff is in.  An
> alternative would be to simply specify a new layout format in BAR1.
>
> The arguments for a more flexible format that I know of:
>
> 1) virtio-pci has already extended the pci-specific part of the
>configuration once (for MSI-X), so I don't want to assume it won't
>happen again.

"configuration" is the wrong word here.

The virtio-pci BAR0 layout is:

   0..19   virtio-pci registers
   20+ virtio configuration space

MSI-X needed to add additional virtio-pci registers, so now we have:

   0..19   virtio-pci registers

if MSI-X:
   20..23  virtio-pci MSI-X registers
   24+ virtio configuration space
else:
   20+ virtio configuration space

I agree, this stinks.

But I think we could solve this in a different way.  I think we could
just move the virtio configuration space to BAR1 by using a transport
feature bit.

That then frees up the entire BAR0 for use as virtio-pci registers.  We
can then always include the virtio-pci MSI-X register space and
introduce all new virtio-pci registers as simply being appended.

This new feature bit then becomes essentially a virtio configuration
latch.  When unacked, virtio configuration hides new registers, when
acked, those new registers are exposed.

Another option is to simply put new registers after the virtio
configuration blob.

> 2) ISTR an argument about mapping the ISR register separately, for
>performance, but I can't find a reference to it.

I think the rationale is that ISR really needs to be PIO but everything
else doesn't.  PIO is much faster on x86 because it doesn't require
walking page tables or instruction emulation to handle the exit.

The argument to move the remaining registers to MMIO is to allow 64-bit
accesses to registers which isn't possible with PIO.

>> This maps really nicely to non-PCI transports too.
>
> This isn't right.  Noone else can use the PCI layout.  While parts are
> common, other parts are pci-specific (MSI-X and ISR for example), and
> yet other parts are specified by PCI elsewhere (eg interrupt numbers).
>
>> But extending the
>> PCI config space (especially dealing with capability allocation) is
>> pretty gnarly and there isn't an obvious equivalent outside of PCI.
>
> That's OK, because general changes should be done with feature bits, and
> the others all have an infinite number.  Being the first, virtio-pci has
> some unique limitations we'd like to fix.
>
>> There are very devices that we emulate today that make use of extended
>> PCI device registers outside the platform devices (that have no BARs).
>
> This sentence confused me?

There is a missing "few".  "There are very few devices..."

Extending the PCI configuration space is unusual for PCI devices.  That
was the point.

Regards,

Anthony Liguori

>
> Thanks,
> Rusty.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

