Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Paolo Bonzini
On 25/10/2012 21:40, Benjamin Herrenschmidt wrote:
>> > Probably you do need a variant of KVM_CREATE_IRQCHIP to create the
>> > IOAPICs/source controllers (Paul's proposal at
>> > http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674
>> > for example), assign chip ids to them, set the number of input lines,
>> > etc. but the configuration should work well with the existing ioctls,
>> > with no limit on the number of sources. 
> But what do you mean by "configuration" really ? I don't see anything in
> common there.

Wiring which MSI-X interrupts go to which source controllers.  If you
have one source controller per PCI bridge, you need to tell the kernel
the mapping between MSI messages interrupts and PCI bridges, and update
it whenever the MSI masking changes.

The other problem is configuring the redirection table.  If you need >64
sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Peter Maydell
On 26 October 2012 10:58, Paolo Bonzini wrote:
> Wiring which MSI-X interrupts go to which source controllers.  If you
> have one source controller per PCI bridge, you need to tell the kernel
> the mapping between MSI messages interrupts and PCI bridges, and update
> it whenever the MSI masking changes.
>
> The other problem is configuring the redirection table.  If you need >64
> sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.

Why would you want an extra ONE_REG-like ioctl? The existing ONE_REG
ioctls have plenty of space in the ID range to allow you to devote
a subsection of it to your irqchip. (This is exactly how the ARM
VGIC save/load is going to work.)
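
As a rough illustration of devoting part of the ONE_REG ID range to an irqchip, here is a minimal sketch. The top-level layout (architecture in the top byte, access size in bits 52-55) mirrors the KVM ONE_REG encoding, with the constants redefined locally so the snippet builds without kernel headers. The group field, the `DEMO_REG_GROUP_IRQCHIP` value and all `demo_*` names are hypothetical; this is not the encoding the ARM VGIC patches use.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the ONE_REG ID layout (arch in the top byte,
 * log2 of the access size in bits 52-55). */
#define DEMO_REG_ARCH_MASK  0xff00000000000000ULL
#define DEMO_REG_ARM        0x4000000000000000ULL
#define DEMO_REG_SIZE_SHIFT 52
#define DEMO_REG_SIZE_MASK  0x00f0000000000000ULL
#define DEMO_REG_SIZE_U64   0x0030000000000000ULL

/* HYPOTHETICAL sub-layout: a 16-bit group at bits 16-31 and a 16-bit
 * register index at bits 0-15. Chosen only for illustration. */
#define DEMO_REG_GROUP_SHIFT   16
#define DEMO_REG_GROUP_MASK    0x00000000ffff0000ULL
#define DEMO_REG_GROUP_IRQCHIP 0x0042ULL
#define DEMO_REG_INDEX_MASK    0x000000000000ffffULL

/* Build the ID of the 64-bit irqchip register number `index`. */
static uint64_t demo_irqchip_reg_id(uint32_t index)
{
    assert(index <= DEMO_REG_INDEX_MASK);
    return DEMO_REG_ARM | DEMO_REG_SIZE_U64 |
           (DEMO_REG_GROUP_IRQCHIP << DEMO_REG_GROUP_SHIFT) | index;
}

/* Nonzero if `id` falls inside the (hypothetical) irqchip subsection. */
static int demo_is_irqchip_reg(uint64_t id)
{
    return (id & DEMO_REG_ARCH_MASK) == DEMO_REG_ARM &&
           ((id & DEMO_REG_GROUP_MASK) >> DEMO_REG_GROUP_SHIFT) ==
               DEMO_REG_GROUP_IRQCHIP;
}

/* Access size in bytes encoded in the ID. */
static unsigned demo_reg_size_bytes(uint64_t id)
{
    return 1u << ((id & DEMO_REG_SIZE_MASK) >> DEMO_REG_SIZE_SHIFT);
}
```

With this split, one group value leaves 65536 register indices for the irqchip, which is why no separate ONE_REG-like ioctl is needed just to get past 64 sources.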

Whether you want to do startup configuration and board wiring via
the same ioctl that handles runtime state save/load/migration is
a different question, of course.

-- PMM


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Paolo Bonzini
On 26/10/2012 12:09, Peter Maydell wrote:
>> >
>> > The other problem is configuring the redirection table.  If you need >64
>> > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.
> Why would you want an extra ONE_REG-like ioctl? The existing ONE_REG
> ioctls have plenty of space in the ID range to allow you to devote
> a subsection of it to your irqchip. (This is exactly how the ARM
> VGIC save/load is going to work.)

Ok, I stand corrected. :)

> Whether you want to do startup configuration and board wiring via
> the same ioctl that handles runtime state save/load/migration is
> a different question, of course.

QEMU's MSI-X routing is not x86-specific, so it should use the same
KVM_SET_GSI_ROUTING ioctl that x86 uses.

Paolo


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Jan Kiszka
On 2012-10-26 12:15, Paolo Bonzini wrote:
> On 26/10/2012 12:09, Peter Maydell wrote:

>>> The other problem is configuring the redirection table.  If you need >64
>>> sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.
>> Why would you want an extra ONE_REG-like ioctl? The existing ONE_REG
>> ioctls have plenty of space in the ID range to allow you to devote
>> a subsection of it to your irqchip. (This is exactly how the ARM
>> VGIC save/load is going to work.)
> 
> Ok, I stand corrected. :)
> 
>> Whether you want to do startup configuration and board wiring via
>> the same ioctl that handles runtime state save/load/migration is
>> a different question, of course.
> 
> QEMU's MSI-X routing is not x86-specific, so it should use the same
> KVM_SET_GSI_ROUTING ioctl that x86 uses.

And it's not only MSI[-X]. Most IRQ sources need to be routed, either
from userspace or from irqfd or from some other in-kernel source, to a
specific IRQ controller. That allows customizing things according to a
specific board / SoC emulation.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [PATCH v2 3/5] Qemu: do not mark bios readonly

2012-10-26 Thread Jan Kiszka
On 2012-10-25 11:22, Xiao Guangrong wrote:
> In isapc, no i440x device exists in guest that means seabios can not
> make 0xc0000 to 0x100000 writable
> 
> It works fine in current code since the guest can happily write readonly
> memory. In order to support readonly slot in Qemu, we do not make the bios
> readonly anymore
> 
> Signed-off-by: Xiao Guangrong 
> ---
>  hw/pc_sysfw.c |2 --
>  1 files changed, 0 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/pc_sysfw.c b/hw/pc_sysfw.c
> index b45f0ac..2d56fc7 100644
> --- a/hw/pc_sysfw.c
> +++ b/hw/pc_sysfw.c
> @@ -156,7 +156,6 @@ static void old_pc_system_rom_init(MemoryRegion *rom_memory)
>      bios = g_malloc(sizeof(*bios));
>      memory_region_init_ram(bios, "pc.bios", bios_size);
>      vmstate_register_ram_global(bios);
> -    memory_region_set_readonly(bios, true);
>      ret = rom_add_file_fixed(bios_name, (uint32_t)(-bios_size), -1);
>      if (ret != 0) {
>      bios_error:
> @@ -179,7 +178,6 @@ static void old_pc_system_rom_init(MemoryRegion *rom_memory)
>                                          0x100000 - isa_bios_size,
>                                          isa_bios,
>                                          1);
> -    memory_region_set_readonly(isa_bios, true);
>
>      /* map all the bios at the top of memory */
>      memory_region_add_subregion(rom_memory,
> 

This has two problems: We know it breaks at least Win 95 that overwrites
its F-segment during boot. And it applies changes to the shadowed area
(below 1 MB) also to the ROM area - I don't think that is the original
behaviour on real hardware.

What we need is paravirtual shadow write control for the ISA PC. It's on
my todo list, maybe I will be able to look into this during the next week.

BTW, your patch series should allow dropping the KVM special case from
pc_system_firmware_init. That version, btw, treats high and low BIOS
areas separately - but only reloads the upper area. Hmm...

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Benjamin Herrenschmidt
On Fri, 2012-10-26 at 11:58 +0200, Paolo Bonzini wrote:
> On 25/10/2012 21:40, Benjamin Herrenschmidt wrote:
> >> > Probably you do need a variant of KVM_CREATE_IRQCHIP to create the
> >> > IOAPICs/source controllers (Paul's proposal at
> >> > http://permalink.gmane.org/gmane.comp.emulators.kvm.powerpc.devel/5674
> >> > for example), assign chip ids to them, set the number of input lines,
> >> > etc. but the configuration should work well with the existing ioctls,
> >> > with no limit on the number of sources. 
> > But what do you mean by "configuration" really ? I don't see anything in
> > common there.
> 
> Wiring which MSI-X interrupts go to which source controllers.  If you
> have one source controller per PCI bridge, you need to tell the kernel
> the mapping between MSI messages interrupts and PCI bridges, and update
> it whenever the MSI masking changes.

Not sure I get it. Are you talking in the context of PCI pass-through ?
Each PCI bridge on POWER has its own set of MSIs though for emulated
bridges it's a non-issue, it's all dealt with by qemu, so I'm not sure
what you mean here.

> The other problem is configuring the redirection table.  If you need >64
> sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.

Well, all of that is totally specific to the IO-APIC design &
limitations as far as I can tell. What is the "redirection table" ?

Cheers,
Ben.




Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Paolo Bonzini
On 26/10/2012 12:37, Benjamin Herrenschmidt wrote:
>> > Wiring which MSI-X interrupts go to which source controllers.  If you
>> > have one source controller per PCI bridge, you need to tell the kernel
>> > the mapping between MSI messages interrupts and PCI bridges, and update
>> > it whenever the MSI masking changes.
> Not sure I get it. Are you talking in the context of PCI pass-through ?

Not just that, it's also for emulated devices that do MSI-X (virtio-pci
does).

> > The other problem is configuring the redirection table.  If you need >64
> > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.
> Well, all of that is totally specific to the IO-APIC design &
> limitations as far as I can tell. What is the "redirection table" ?

The wiring between source and presentation controllers, roughly.  I
suppose that's what Paul referred to when he said there's 64 bits of
config info per source in the source controllers.

Paolo


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Benjamin Herrenschmidt
On Fri, 2012-10-26 at 12:15 +0200, Paolo Bonzini wrote:
> > Whether you want to do startup configuration and board wiring via
> > the same ioctl that handles runtime state save/load/migration is
> > a different question, of course.
> 
> QEMU's MSI-X routing is not x86-specific, so it should use the same
> KVM_SET_GSI_ROUTING ioctl that x86 uses. 

Well, that's the thing, I haven't managed to figure that out so far, it
looks very x86-specific to me. To begin with there's no such thing as a
"GSI" in our world.

Basically we have a global interrupt number space. Interrupt numbers are
24-bit long quantities. On real HW, some bits are called the "BUID" and
identify a given source controller and some bits are the interrupt
within that source controller but that's fairly flexible and generally
the OS doesn't care about it. The firmware sets up the mappings and
tells us the final numbers via the device-tree.

Under a hypervisor, it's totally virtualized already so we show a flat
24 bit number space to the guest.

MSIs don't work exactly like x86 either. On real HW, we have a different
MSI port per "partitionable endpoint", which is used purely for
validation of access permission. The message itself contains the
interrupt source number within the BUID of the bridge. A given bridge
today can contain up to 256 of these on a P7IOC chip but upcoming stuff
can have thousands. The final interrupt number seen by the OS is thus
just that MSI message in the bottom bits and the BUID in the top bits.
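
The composition described above can be sketched as plain bit manipulation. The 12/12 split below is an assumption made only for illustration: the thread says the number is 24 bits wide with the BUID in the top bits and that the split is flexible, so the real field widths are platform-defined. All `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: 12 bits of BUID above 12 bits of per-controller source.
 * Only the 24-bit total and "BUID in the top bits" come from the
 * description; the widths here are illustrative. */
#define DEMO_IRQ_BITS    24
#define DEMO_SOURCE_BITS 12
#define DEMO_IRQ_MASK    ((1u << DEMO_IRQ_BITS) - 1)
#define DEMO_SOURCE_MASK ((1u << DEMO_SOURCE_BITS) - 1)

/* Combine a source controller's BUID with the source number (for an
 * MSI, the message value) into one global interrupt number. */
static uint32_t demo_make_irq(uint32_t buid, uint32_t source)
{
    assert(source <= DEMO_SOURCE_MASK);
    assert(buid <= (DEMO_IRQ_MASK >> DEMO_SOURCE_BITS));
    return (buid << DEMO_SOURCE_BITS) | source;
}

/* Recover the BUID (which source controller) from a global number. */
static uint32_t demo_irq_buid(uint32_t irq)
{
    return (irq & DEMO_IRQ_MASK) >> DEMO_SOURCE_BITS;
}

/* Recover the source number within that controller. */
static uint32_t demo_irq_source(uint32_t irq)
{
    return irq & DEMO_SOURCE_MASK;
}
```

For example, under this assumed split, BUID 0x3 and source 0x045 give the global number 0x003045, and the controller can be found again from the number alone.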

Here too, under a hypervisor, it's all virtualized so qemu just gives 24
bit numbers to the various emulated MSIs as part of the global interrupt
number space.

I'm not sure how any of that would need kernel communication. All we
need is to be able to associate a given global interrupt with an
eventfd.

I might just miss some subtleties here but so far I haven't been able to
figure out how to "shoehorn" our stuff in the very x86-centric existing
interfaces to the kernel APICs. In fact all that code is in a generic
location in kvm but is really x86/ia64 centric and the interfaces seem
to be as well.

Cheers,
Ben.




Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Jan Kiszka
On 2012-10-26 12:44, Benjamin Herrenschmidt wrote:
> On Fri, 2012-10-26 at 12:15 +0200, Paolo Bonzini wrote:
>>> Whether you want to do startup configuration and board wiring via
>>> the same ioctl that handles runtime state save/load/migration is
>>> a different question, of course.
>>
>> QEMU's MSI-X routing is not x86-specific, so it should use the same
>> KVM_SET_GSI_ROUTING ioctl that x86 uses. 
> 
> Well, that's the thing, I haven't managed to figure that out so far, it
> looks very x86-specific to me. To begin with there's no such thing as a
> "GSI" in our world.
> 
> Basically we have a global interrupt number space. Interrupt numbers are
> 24-bit long quantities. On real HW, some bits are called the "BUID" and
> identify a given source controller and some bits are the interrupt
> within that source controller but that's fairly flexible and generally
> the OS doesn't care about it. The firmware sets up the mappings and
> tells us the final numbers via the device-tree.
> 
> Under a hypervisor, it's totally virtualized already so we show a flat
> 24 bit number space to the guest.
> 
> MSIs don't work exactly like x86 either. On real HW, we have a different
> MSI port per "partitionable endpoint" which are use purely for
> validation of access permission. The message itself contain the
> interrupt source number within the BUID of the bridge. A given bridge
> today can contains up to 256 of these on a P7IOC chip but upcoming stuff
> can have thousands. The final interrupt number seen by the OS is thus
> just that MSI message in the bottom bits and the BUID in the top bits.
> 
> Here too, under a hypervisor, it's all virtualized so qemu just gives 24
> bit numbers to the various emulated MSIs as part of the global interrupt
> number space.
> 
> I'm not sure how any of that would need kernel communication. All we
> need is to be able to associate a given global interrupt with an
> eventfd.

At that point, at the latest, you will need the IRQ routing infrastructure
of KVM. It tells KVM which "virtual IRQ" (badly named "GSI") triggers which
event at which input, e.g. a physical IRQ line at some IRQ controller or
a specific message at some MSI receiver. You shouldn't try to invent a
Power wheel here, but rather tune the existing one to become more generic.
We could even try to get rid of that unfortunate GSI name (while leaving
aliases behind), though that is cosmetic.

> 
> I might just miss some subtleties here but so far I haven't been able to
> figure out how to "shoehorn" our stuff in the very x86-centric existing
> interfaces to the kernel APICs. In fact all that code is in a generic
> location in kvm but is really x86/ia64 centric and the interfaces seem
> to be as well.

That's not true in general, though you surely find a lot of traces and
still a few concrete x86 bits under virt/kvm.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Benjamin Herrenschmidt
On Fri, 2012-10-26 at 13:00 +0200, Jan Kiszka wrote:

> And at latest there you will need the IRQ routing infrastructure of KVM.
> It tells KVM which "virtual IRQ" (badly named "GSI") triggers which
> event at which input, e.g. a physical IRQ line at some IRQ controller or
> a specific message at some MSI receiver. You shouldn't try to invent a
> Power wheel here, rather tune the existing one to become more generic.
> We could even try to get rid of that unfortunate GSI name (when leaving
> aliases behind), though that is cosmetic.

Ok, there must be something wrong with me, I still don't understand what
you are talking about.

What "MSI receiver" ? What physical IRQ line are you talking about ? How
is the kernel involved ?

The only cases I can think of are the association between a virtual
interrupt (ie, an interrupt in the guest interrupt number space) and an
in-kernel source for that interrupt, ie, vhost and PCI pass-through
essentially.

Anything else is under qemu control. IE. MSIs or LSIs generated by
emulated devices are just normal interrupts that go through our ioctl to
trigger the in-kernel source with the same number.

I don't see any "routing" happening anywhere in that picture really. The
firmware calls done by the guest to change the target of interrupts
(which CPU/presentation controller to direct a given interrupt to) are
handled entirely in the kernel in platform specific code and update our
internal ICS state.

> > 
> > I might just miss some subtleties here but so far I haven't been able to
> > figure out how to "shoehorn" our stuff in the very x86-centric existing
> > interfaces to the kernel APICs. In fact all that code is in a generic
> > location in kvm but is really x86/ia64 centric and the interfaces seem
> > to be as well.
> 
> That's not true in general, though you surely find a lot of traces and
> still a few concrete x86 bits under virt/kvm.

Well, I haven't found anything in virt/kvm/irq_comm.c that was of any
use to us. Again, I might be suffering from a major misunderstanding
here but as far as I can tell, the model is totally different. Besides,
that file has a hard coded list of what looks like completely x86
specific mappings between "GSI" and "interrupt numbers" (again I don't
understand completely the distinction and I don't think we have anything
like it on power).

Ben.




Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Benjamin Herrenschmidt
On Fri, 2012-10-26 at 12:40 +0200, Paolo Bonzini wrote:
> On 26/10/2012 12:37, Benjamin Herrenschmidt wrote:
> >> > Wiring which MSI-X interrupts go to which source controllers.  If you
> >> > have one source controller per PCI bridge, you need to tell the kernel
> >> > the mapping between MSI messages interrupts and PCI bridges, and update
> >> > it whenever the MSI masking changes.
> > Not sure I get it. Are you talking in the context of PCI pass-through ?
> 
> Not just that, it's also for emulated devices that do MSI-X (virtio-pci
> does).

Then I really don't get it.

> > > The other problem is configuring the redirection table.  If you need >64
> > > sources you need ioctls like KVM_GET/SET_IRQCHIP_ONE_REG.
> > Well, all of that is totally specific to the IO-APIC design &
> > limitations as far as I can tell. What is the "redirection table" ?
> 
> The wiring between source and presentation controllers, roughly.  I
> suppose that's what Paul referred to when he said there's 64 bits of
> config info per source in the source controllers.

But that's the point. We don't have such "wiring". The interrupt number
space is global. In HW it's via special messages in the fabric. The
firmware configures the various source controllers at boot time by
assigning them a BUID which basically comprises the top bits of the
interrupt number.

Or do you mean the routing configured by the user ? IE. Affinity ? If
yes, then that's indeed what the 64-bit per source is. Each interrupt
source has some state including the configured target presentation
controller (plus associated link info for distributed interrupts), a
priority setting, and some internal state bits that need to be preserved
in the case of migration.

Cheers,
Ben.




Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Peter Maydell
On 26 October 2012 11:44, Benjamin Herrenschmidt wrote:
> On Fri, 2012-10-26 at 12:15 +0200, Paolo Bonzini wrote:
>> QEMU's MSI-X routing is not x86-specific, so it should use the same
>> KVM_SET_GSI_ROUTING ioctl that x86 uses.
>
> Well, that's the thing, I haven't managed to figure that out so far, it
> looks very x86-specific to me. To begin with there's no such thing as a
> "GSI" in our world.

This was roughly the feeling I had looking at these APIs. There
might be some underlying generic concept but there is a definite
tendency for the surface representation to use x86 specific
terminology to the extent that you can't tell whether an API
is x86 specific or merely apparently so...

-- PMM


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Benjamin Herrenschmidt
On Fri, 2012-10-26 at 12:17 +0100, Peter Maydell wrote:
> > Well, that's the thing, I haven't managed to figure that out so far, it
> > looks very x86-specific to me. To begin with there's no such thing as a
> > "GSI" in our world.
> 
> This was roughly the feeling I had looking at these APIs. There
> might be some underlying generic concept but there is a definite
> tendency for the surface representation to use x86 specific
> terminology to the extent that you can't tell whether an API
> is x86 specific or merely apparently so...

Right. Which is why I'm sure I'm actually missing something there :-)
And I'm hoping Paolo and Jan will help shed some light.

It might help if somebody could explain a bit more what a GSI is in x86
land and how it relates to the various APICs, along with what exactly
they mean by "routing" , ie. what are the different elements that get
associated. Basically, if somebody could describe how the x86 stuff
works, that might help.

From my view of things, we have various "sources" of interrupts. On my
list are emulated device LSIs, emulated device MSIs, both in qemu, then
vhost and finally pass-through, I suppose on some platforms IPIs come in
as well though. Those "sources" need, one way or another, to hit a
source controller which will then itself, in a platform specific way,
shoot the interrupt to a presentation controller.

The routing between source and presentation controllers is fairly
platform specific as far as I can tell even within a given CPU family.
Ie the way an OpenPIC (aka MPIC, used on macs) does it is different than
the way the XICS system does it on pseries, and is different from most
embedded stuff (which typically doesn't have that source/presentation
distinction but just cascaded dumber PICs). The amount of
configurability, the type of configuration information etc... that
governs such a layout is also very specific to the platform and the type
of interrupt controller system used on it.

Remains the "routing" between source of "events" and actual "inputs" to
a source controller.

This too doesn't seem totally obvious to generalize. For example an
embedded platform with a bunch of cascaded dumb interrupt controllers
doesn't have a concept of a flat number space in HW, an interrupt
"input" to be identified properly, needs to identify the controller and
the interrupt within that controller. However, within KVM/qemu, it's
pretty easy to assign to each controller a number and by collating the
two, get some kind of flat space, though it's not arbitrary and the
routing is thus fairly constrained if not totally fixed.

In the pseries case, the global number is split in two bit fields, the
BUID identifying the specific source controller and the source within
that controller. Here too it's fairly fixed though. So the ioctl we use
to create a source controller in the kernel takes the BUID as an
argument, and from there the kernel will "find" the right source
controller based solely on the interrupt number.

So basically on one side we have a global interrupt number that
identifies an "input", I assume that's what x86 calls a GSI ?

Remains how to associate the various sources of interrupts to that
'global number'... and that is fairly specific to each source type isn't
it ?

In our current powerpc code, the emulated devices toggle the qirq which
ends up shooting an ioctl to set/reset or "message" (for MSIs) the
corresponding global interrupt. The mapping is established entirely
within qemu, we just tell the kernel to trigger a given interrupt.

We haven't really sorted vhost out yet so I'm not sure how that will
work out but the idea would be to have an ioctl to associate an eventfd
or whatever vhost uses as interrupt "outputs" with a global interrupt
number.

For pass-through, currently our VFIO is dumb, interrupts get to qemu
which then shoots them back to the kernel using the standard qirq stuff
used by emulated devices. Here I suppose we would want something similar
to vhost to associate the VFIO irq fd with a "global number".

Is that what the existing ioctl's provide ? Their semantics aren't
totally obvious to me.

Note that for pass-through at least, and possibly for vhost, we'd like
to actually totally bypass the irqfd & eventfd stuff for performance
reasons. At least for VFIO, if we are going to get the max performance
out of it, we need to take all generic code out of the picture. IE. If
the interrupts are routed to the physical CPU where the guest is
running, we want to be able to catch and distribute the interrupts to
the guest entirely within guest context, ie, with KVM arch specific low
level code that runs in "real mode" (ie MMU off) without context
switching the MMU back to the host, which on POWER is fairly costly.

That means that at least the association between a guest global
interrupt number and a host global interrupt number for pass-through
will be something that goes entirely through arch specific code path. We
might still be able to use gene

Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Paolo Bonzini
[snipping some parts that Jan answered about already]

On 26/10/2012 12:47, Benjamin Herrenschmidt wrote:
> Or do you mean the routing configured by the user ? IE. Affinity ? If
> yes, then that's indeed what the 64-bit per source is. Each interrupt
> source has some state including the configured target presentation
> controller (plus associated link info for distributed interrupts), a
> priority setting, and some internal state bits that need to be preserved
> in the case of migration.

Yes, that's pretty much the contents of the IOAPIC redirection table.
x86 has more stuff such as the polarity (low/high), masking, triggering
mode (edge/level), etc., but the main thing is the destination and vector.
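
For readers unfamiliar with it, the fields named above can be sketched as a decode of one 64-bit redirection table entry. The bit positions follow the classic IOAPIC (82093AA) register layout; the struct and the `demo_*` function names are hypothetical and this is not the code KVM uses.

```c
#include <assert.h>
#include <stdint.h>

/* One IOAPIC redirection table entry, unpacked. Read-only status bits
 * (delivery status, remote IRR) are left out of this sketch. */
struct demo_ioapic_rte {
    uint8_t vector;            /* bits 0-7                    */
    uint8_t delivery_mode;     /* bits 8-10                   */
    uint8_t dest_mode_logical; /* bit 11: 0 = physical        */
    uint8_t active_low;        /* bit 13: polarity            */
    uint8_t level_triggered;   /* bit 15: 0 = edge            */
    uint8_t masked;            /* bit 16                      */
    uint8_t destination;       /* bits 56-63                  */
};

/* Unpack a raw 64-bit entry into its fields. */
static struct demo_ioapic_rte demo_decode_rte(uint64_t raw)
{
    struct demo_ioapic_rte e;

    e.vector            = (uint8_t)(raw & 0xff);
    e.delivery_mode     = (uint8_t)((raw >> 8) & 0x7);
    e.dest_mode_logical = (uint8_t)((raw >> 11) & 1);
    e.active_low        = (uint8_t)((raw >> 13) & 1);
    e.level_triggered   = (uint8_t)((raw >> 15) & 1);
    e.masked            = (uint8_t)((raw >> 16) & 1);
    e.destination       = (uint8_t)((raw >> 56) & 0xff);
    return e;
}

/* Pack the fields back into a raw 64-bit entry. */
static uint64_t demo_encode_rte(const struct demo_ioapic_rte *e)
{
    assert(e->delivery_mode <= 7);
    return (uint64_t)e->vector |
           ((uint64_t)e->delivery_mode << 8) |
           ((uint64_t)(e->dest_mode_logical & 1) << 11) |
           ((uint64_t)(e->active_low & 1) << 13) |
           ((uint64_t)(e->level_triggered & 1) << 15) |
           ((uint64_t)(e->masked & 1) << 16) |
           ((uint64_t)e->destination << 56);
}
```

The destination and vector fields play the role of the "configured target" in the per-source state described above; polarity, trigger mode and mask are the x86-specific extras.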

Paolo


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Paolo Bonzini
On 26/10/2012 13:09, Benjamin Herrenschmidt wrote:
> The only cases I can think of are the association between a virtual
> interrupt (ie, an interrupt in the guest interrupt number space) and an
> in-kernel source for that interrupt, ie, vhost and PCI pass-through
> essentially.

If you exclude old-style PCI pass-through and limit yourself to vhost
and VFIO, you can treat irqfd as "the" in-kernel source of the
interrupt.  Then you need a mapping between MSIs and numbers used in
KVM_IRQFD ("GSIs").

This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is
modified every time a vector is masked/unmasked in the MSI-X table.
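
A minimal userspace model of that behaviour, not the kernel or QEMU code: unmasking a vector installs a mapping from the irqfd's number to an MSI message, and masking removes it. The real KVM_SET_GSI_ROUTING ioctl replaces the whole routing table in one call rather than editing single entries; the table size, struct layouts and every `demo_*` name here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ROUTES 8

struct demo_msi_msg {
    uint64_t address;
    uint32_t data;
};

struct demo_msi_route {
    int in_use;
    uint32_t virq;             /* the number passed to irqfd ("GSI") */
    struct demo_msi_msg msg;   /* what a signal on that virq delivers */
};

struct demo_route_table {
    struct demo_msi_route routes[DEMO_MAX_ROUTES];
};

/* Unmask: install or update the virq -> MSI mapping.
 * Returns 0 on success, -1 if the table is full. */
static int demo_msix_unmask(struct demo_route_table *t, uint32_t virq,
                            struct demo_msi_msg msg)
{
    struct demo_msi_route *free_slot = NULL;

    assert(t != NULL);
    for (size_t i = 0; i < DEMO_MAX_ROUTES; i++) {
        struct demo_msi_route *r = &t->routes[i];

        if (r->in_use && r->virq == virq) {
            r->msg = msg;      /* vector reprogrammed while masked */
            return 0;
        }
        if (!r->in_use && free_slot == NULL)
            free_slot = r;
    }
    if (free_slot == NULL)
        return -1;
    free_slot->in_use = 1;
    free_slot->virq = virq;
    free_slot->msg = msg;
    return 0;
}

/* Mask: drop the mapping so a signal on virq delivers nothing. */
static void demo_msix_mask(struct demo_route_table *t, uint32_t virq)
{
    assert(t != NULL);
    for (size_t i = 0; i < DEMO_MAX_ROUTES; i++)
        if (t->routes[i].in_use && t->routes[i].virq == virq)
            t->routes[i].in_use = 0;
}

/* What a signal on virq resolves to; NULL while the vector is masked. */
static const struct demo_msi_msg *
demo_lookup(const struct demo_route_table *t, uint32_t virq)
{
    assert(t != NULL);
    for (size_t i = 0; i < DEMO_MAX_ROUTES; i++)
        if (t->routes[i].in_use && t->routes[i].virq == virq)
            return &t->routes[i].msg;
    return NULL;
}
```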

Paolo


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Peter Maydell
On 26 October 2012 12:57, Paolo Bonzini wrote:
> If you exclude old-style PCI pass-through and limit yourself to vhost
> and VFIO, you can treat irqfd as "the" in-kernel source of the
> interrupt.  Then you need a mapping between MSIs and numbers used in
> KVM_IRQFD ("GSIs").
>
> This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is
> modified every time a vector is masked/unmasked in the MSI-X table.

So SET_GSI_ROUTING sets the routing for MSIs? Very logical...

-- PMM


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Jan Kiszka
On 2012-10-26 13:39, Benjamin Herrenschmidt wrote:
> On Fri, 2012-10-26 at 12:17 +0100, Peter Maydell wrote:
>>> Well, that's the thing, I haven't managed to figure that out so far, it
>>> looks very x86-specific to me. To begin with there's no such thing as a
>>> "GSI" in our world.
>>
>> This was roughly the feeling I had looking at these APIs. There
>> might be some underlying generic concept but there is a definite
>> tendency for the surface representation to use x86 specific
>> terminology to the extent that you can't tell whether an API
>> is x86 specific or merely apparently so...
> 
> Right. Which is why I'm sure I'm actually missing something there :-)
> And I'm hoping Paolo and Jan will help shed some light.
> 
> It might help if somebody could explain a bit more what a GSI is in x86
> land and how it relates to the various APICs, along with what exactly
> they mean by "routing" , ie. what are the different elements that get
> associated. Basically, if somebody could describe how the x86 stuff
> works, that might help.
> 
> From my view of things, we have various "sources" of interrupts. On my
> list are emulated device LSIs, emulated device MSIs, both in qemu, then
> vhost and finally pass-through, I suppose on some platforms IPIs come in
> as well though. Those "sources" need, one way or another, to hit a
> source controller which will then itself, in a platform specific way,
> shoot the interrupt to a presentation controller.
> 
> The routing between source and presentation controllers is fairly
> platform specific as far as I can tell even within a given CPU family.
> Ie the way an OpenPIC (aka MPIC, used on macs) does it is different than
> the way the XICS system does it on pseries, and is different from most
> embedded stuff (which typically doesn't have that source/presentation
> distinction but just cascaded dumber PICs). The amount of
> configurability, the type of configuration information etc... that
> governs such a layout is also very specific to the platform and the type
> of interrupt controller system used on it.

But we are just talking about sending messages from A to B or soldering
an input to an output pin. That's pretty generic. Give each output event
a virtual IRQ number and define where its output "line" should be linked
to (input pin of target controller). All that will be specific are the
IDs of those controllers.

Of course, all that provided you do their emulation in kernel space. For
x86, that even makes sense when the IRQ sources are in user space as the
guest may still have to interact during IRQ delivery with IOAPIC, thus
we save some costly heavy-weight exits when putting it in the kernel.

> 
> Remains the "routing" between source of "events" and actual "inputs" to
> a source controller.
> 
> This too doesn't seem totally obvious to generalize. For example an
> embedded platform with a bunch of cascaded dumb interrupt controllers
> doesn't have a concept of a flat number space in HW, an interrupt
> "input" to be identified properly, needs to identify the controller and
> the interrupt within that controller. However, within KVM/qemu, it's
> pretty easy to assign to each controller a number and by collating the
> two, get some kind of flat space, though it's not arbitrary and the
> routing is thus fairly constrained if not totally fixed.

IRQ routing entry:
 - virq number ("gsi")
 - type (controller ID, MSI, whatever you like)
 - some flags (to extend it)
 - type-specific data (MSI message, controller input pin, etc.)

And there can be multiple entries with the same virq, thus you can
deliver to multiple targets. I bet you can model quite a lot of your
platform specific routing this way. I'm not saying our generic code will
work out of the box, but at least the interfaces and concepts are there.
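
As a rough illustration of the entry layout listed above, here is a minimal userspace sketch in C. The names (route_entry, route_lookup, example_routes) and the two target types are invented for this example and are not the KVM ABI; the point is only that several entries may share one virq, so a lookup can yield more than one target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical target types, invented for this sketch. */
enum route_type { ROUTE_IRQCHIP_PIN, ROUTE_MSI };

/* One routing entry, following the four fields listed above. */
struct route_entry {
    uint32_t virq;          /* virtual IRQ number ("gsi") */
    enum route_type type;   /* which kind of target this entry describes */
    uint32_t flags;         /* reserved for future extensions */
    union {
        struct { uint32_t controller, pin; } irqchip;
        struct { uint64_t address; uint32_t data; } msi;
    } u;                    /* type-specific data */
};

/* Example table: virq 5 fans out to two controllers, virq 24 is an MSI. */
const struct route_entry example_routes[] = {
    { .virq = 5,  .type = ROUTE_IRQCHIP_PIN, .u.irqchip = { 0, 5 } },
    { .virq = 5,  .type = ROUTE_IRQCHIP_PIN, .u.irqchip = { 1, 5 } },
    { .virq = 24, .type = ROUTE_MSI, .u.msi = { 0xfee00000u, 0x4041u } },
};

/*
 * Collect every entry registered for virq. Up to max pointers are stored
 * in out; the return value is the total number of matching entries.
 */
size_t route_lookup(const struct route_entry *table, size_t n, uint32_t virq,
                    const struct route_entry **out, size_t max)
{
    size_t found = 0;

    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (table[i].virq != virq)
            continue;
        if (found < max)
            out[found] = &table[i];
        found++;
    }
    return found;
}
```

A caller would walk the returned entries and dispatch on type, which is where the platform-specific part (controller IDs, pin numbers, message format) would live.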

> 
> In the pseries case, the global number is split in two bit fields, the
> BUID identifying the specific source controller and the source within
> that controller. Here too it's fairly fixed though. So the ioctl we use
> to create a source controller in the kernel takes the BUID as an
> argument, and from there the kernel will "find" the right source
> controller based solely on the interrupt number.
> 
> So basically on one side we have a global interrupt number that
> identifies an "input", I assume that's what x86 calls a GSI ?

Right. The virtual IRQ number space we call "GSI" is partially occupied by
the actual x86-GSIs (0..n, with n=23 so far), directed to the IOAPIC and
PIC there, and then followed by IRQs that are mapped on MSI messages.
But that's just how we _use_ it on x86, not how it has to work for other
archs.

> 
> Remains how to associate the various sources of interrupts to that
> 'global number'... and that is fairly specific to each source type isn't
> it ?
> 
> In our current powerpc code, the emulated devices toggle the qirq which
> ends up shooting an ioctl to set/reset or "message" (for MSIs) the
> corresponding global interrupt. The mapping is

Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Jan Kiszka
On 2012-10-26 14:08, Peter Maydell wrote:
> On 26 October 2012 12:57, Paolo Bonzini  wrote:
>> If you exclude old-style PCI pass-through and limit yourself to vhost
>> and VFIO, you can treat irqfd as "the" in-kernel source of the
>> interrupt.  Then you need a mapping between MSIs and numbers used in
>> KVM_IRQFD ("GSIs").
>>
>> This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is
>> modified every time a vector is masked/unmasked in the MSI-X table.
> 
> So SET_GSI_ROUTING sets the routing for MSIs? Very logical...

See my reply to Ben: It is used for MSIs as well, but not only. The
concept is absolutely generic, you just need to define specific target
types and provide ways to associate specific sources with a virtual IRQ
number ("GSI").

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


[PATCH] x86, add hypervisor name to dump_stack() [v2]

2012-10-26 Thread Prarit Bhargava
Debugging crashes, panics, stack trace WARN_ONs, etc., from both virtual and
bare-metal boots can get difficult very quickly.  While there are ways to
decipher the output and determine if the output is from a virtual guest,
the in-kernel hypervisors now have a single registration point
and set x86_hyper.  We can use this to output additional debug
information during a panic/oops/stack trace.

Signed-off-by: Prarit Bhargava 
Cc: Avi Kivity 
Cc: Gleb Natapov 
Cc: Alex Williamson 
Cc: Marcelo Tosatti 
Cc: Ingo Molnar 
Cc: kvm@vger.kernel.org
Cc: x...@kernel.org

[v2]: Modifications suggested by Ingo and added changes for similar output
  from process.c
---
 arch/x86/kernel/dumpstack.c |   11 ++-
 arch/x86/kernel/process.c   |   12 +++-
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index ae42418b..5dd680f 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 
+#include 
 #include 
 
 
@@ -186,9 +187,17 @@ void dump_stack(void)
 {
unsigned long bp;
unsigned long stack;
+   const char *machine_name = "x86";
+   const char *kernel_type = "native";
+
+   if (x86_hyper) {
+   machine_name = x86_hyper->name;
+   kernel_type = "guest";
+   }
 
bp = stack_frame(current, NULL);
-   printk("Pid: %d, comm: %.20s %s %s %.*s\n",
+   printk("[%s %s kernel] Pid: %d, comm: %.20s %s %s %.*s\n",
+   machine_name, kernel_type,
current->pid, current->comm, print_tainted(),
init_utsname()->release,
(int)strcspn(init_utsname()->version, " "),
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b644e1c..14bd064 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * per-CPU TSS segments. Threads are completely 'soft' on Linux,
@@ -124,6 +125,13 @@ void exit_thread(void)
 void show_regs_common(void)
 {
const char *vendor, *product, *board;
+   const char *machine_name = "x86";
+   const char *kernel_type = "native";
+
+   if (x86_hyper) {
+   machine_name = x86_hyper->name;
+   kernel_type = "guest";
+   }
 
vendor = dmi_get_system_info(DMI_SYS_VENDOR);
if (!vendor)
@@ -135,7 +143,9 @@ void show_regs_common(void)
/* Board Name is optional */
board = dmi_get_system_info(DMI_BOARD_NAME);
 
-   printk(KERN_DEFAULT "Pid: %d, comm: %.20s %s %s %.*s %s %s%s%s\n",
+   printk(KERN_DEFAULT
+  "[%s %s kernel] Pid: %d, comm: %.20s %s %s %.*s %s %s%s%s\n",
+  machine_name, kernel_type,
   current->pid, current->comm, print_tainted(),
   init_utsname()->release,
   (int)strcspn(init_utsname()->version, " "),
-- 
1.7.9.3



Re: [PATCH] x86, add hypervisor name to dump_stack() [v2]

2012-10-26 Thread Ingo Molnar

* Prarit Bhargava  wrote:

> Debugging crashes, panics, stack trace WARN_ONs, etc., from both virtual and
> bare-metal boots can get difficult very quickly.  While there are ways to
> decipher the output and determine if the output is from a virtual guest,
> the in-kernel hypervisors now have a single registration point
> and set x86_hyper.  We can use this to output additional debug
> information during a panic/oops/stack trace.
> 
> Signed-off-by: Prarit Bhargava 
> Cc: Avi Kivity 
> Cc: Gleb Natapov 
> Cc: Alex Williamson 
> Cc: Marcelo Tosatti 
> Cc: Ingo Molnar 
> Cc: kvm@vger.kernel.org
> Cc: x...@kernel.org
> 
> [v2]: Modifications suggested by Ingo and added changes for similar output
>   from process.c
> ---
>  arch/x86/kernel/dumpstack.c |   11 ++-
>  arch/x86/kernel/process.c   |   12 +++-
>  2 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index ae42418b..5dd680f 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  
>  
> @@ -186,9 +187,17 @@ void dump_stack(void)
>  {
>   unsigned long bp;
>   unsigned long stack;
> + const char *machine_name = "x86";
> + const char *kernel_type = "native";
> +
> + if (x86_hyper) {
> + machine_name = x86_hyper->name;
> + kernel_type = "guest";
> + }
>  
>   bp = stack_frame(current, NULL);
> - printk("Pid: %d, comm: %.20s %s %s %.*s\n",
> + printk("[%s %s kernel] Pid: %d, comm: %.20s %s %s %.*s\n",
> + machine_name, kernel_type,

I'd put the kernel info at the end of the line.

It's all very exciting I know, because we are working on this 
printout right now and all that - but to users and developers 
the PID/comm output plus the backtrace is far more important.

>   current->pid, current->comm, print_tainted(),
>   init_utsname()->release,
>   (int)strcspn(init_utsname()->version, " "),
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index b644e1c..14bd064 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * per-CPU TSS segments. Threads are completely 'soft' on Linux,
> @@ -124,6 +125,13 @@ void exit_thread(void)
>  void show_regs_common(void)
>  {
>   const char *vendor, *product, *board;
> + const char *machine_name = "x86";
> + const char *kernel_type = "native";
> +
> + if (x86_hyper) {
> + machine_name = x86_hyper->name;
> + kernel_type = "guest";
> + }
>  
>   vendor = dmi_get_system_info(DMI_SYS_VENDOR);
>   if (!vendor)
> @@ -135,7 +143,9 @@ void show_regs_common(void)
>   /* Board Name is optional */
>   board = dmi_get_system_info(DMI_BOARD_NAME);
>  
> - printk(KERN_DEFAULT "Pid: %d, comm: %.20s %s %s %.*s %s %s%s%s\n",
> + printk(KERN_DEFAULT
> +"[%s %s kernel] Pid: %d, comm: %.20s %s %s %.*s %s %s%s%s\n",
> +machine_name, kernel_type,
>  current->pid, current->comm, print_tainted(),
>  init_utsname()->release,
>  (int)strcspn(init_utsname()->version, " "),

Ha, duplicate code doing almost the same thing!

I suspect you know what my next suggestion would be? :-)
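
One way the two remarks above could be combined is a single shared helper that appends the hypervisor tag at the end of the line. The following is only a userspace sketch under stated assumptions: format_task_line() and struct hypervisor_stub are hypothetical names standing in for kernel code, and snprintf() stands in for printk().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the kernel's x86_hyper pointer; NULL means bare metal. */
struct hypervisor_stub {
    const char *name;
};

/*
 * Hypothetical shared helper: builds the "Pid: ..." line once, so that
 * dump_stack() and show_regs_common() would not each carry a copy.
 * The machine/kernel tag goes last, after the PID and comm fields.
 * Returns what snprintf() returns.
 */
int format_task_line(char *buf, size_t len,
                     const struct hypervisor_stub *hyper,
                     int pid, const char *comm, const char *release)
{
    const char *machine_name = hyper ? hyper->name : "x86";
    const char *kernel_type = hyper ? "guest" : "native";

    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "Pid: %d, comm: %.20s %s [%s %s kernel]",
                    pid, comm, release, machine_name, kernel_type);
}
```

With a stub named "KVM" this yields a line ending in "[KVM guest kernel]"; with a NULL stub it ends in "[x86 native kernel]".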

Thanks,

Ingo


[PATCH -next] kvm tools: remove duplicated include from builtin-setup.c

2012-10-26 Thread Wei Yongjun
From: Wei Yongjun 

Remove duplicated include.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun 
---
 tools/kvm/builtin-setup.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/tools/kvm/builtin-setup.c b/tools/kvm/builtin-setup.c
index 1b865b7..c5b0566 100644
--- a/tools/kvm/builtin-setup.c
+++ b/tools/kvm/builtin-setup.c
@@ -13,11 +13,7 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
-#include 
-#include 
 #include 
 
 extern char _binary_guest_init_start;



Re: [PATCH -next] kvm tools: remove duplicated include from builtin-setup.c

2012-10-26 Thread Pekka Enberg
On Fri, 26 Oct 2012, Wei Yongjun wrote:

> From: Wei Yongjun 
> 
> Remove duplicated include.
> 
> dpatch engine is used to auto generate this patch.
> (https://github.com/weiyj/dpatch)
> 
> Signed-off-by: Wei Yongjun 
> ---
>  tools/kvm/builtin-setup.c | 4 
>  1 file changed, 4 deletions(-)
> 
> diff --git a/tools/kvm/builtin-setup.c b/tools/kvm/builtin-setup.c
> index 1b865b7..c5b0566 100644
> --- a/tools/kvm/builtin-setup.c
> +++ b/tools/kvm/builtin-setup.c
> @@ -13,11 +13,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
> -#include 
> -#include 
> -#include 
>  #include 
>  
>  extern char _binary_guest_init_start;

Applied, thanks!


Re: [PATCH 2/5] kvm tools: use the correct config vector interrupt

2012-10-26 Thread Pekka Enberg
On Thu, 25 Oct 2012, Sasha Levin wrote:
> On Thu, Oct 25, 2012 at 3:03 AM, Pekka Enberg  wrote:
> > On Wed, 24 Oct 2012, William Dauchy wrote:
> >> when registering the config interrupt, the latter is registered in
> >> vpci->config_vector and not in vpci->vq_vector
> >>
> >> introduced in:
> >> a841f15 kvm tools: Use the new KVM_SIGNAL_MSI ioctl to inject
> >> interrupts directly.
> >>
> >> Signed-off-by: William Dauchy 
> >> ---
> >>  tools/kvm/virtio/pci.c |2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/tools/kvm/virtio/pci.c b/tools/kvm/virtio/pci.c
> >> index ab1119a..f4ea3c9 100644
> >> --- a/tools/kvm/virtio/pci.c
> >> +++ b/tools/kvm/virtio/pci.c
> >> @@ -288,7 +288,7 @@ int virtio_pci__signal_config(struct kvm *kvm, struct 
> >> virtio_device *vdev)
> >>   }
> >>
> >>   if (vpci->features & VIRTIO_PCI_F_SIGNAL_MSI)
> >> - virtio_pci__signal_msi(kvm, vpci, 
> >> vpci->vq_vector[vpci->config_vector]);
> >> + virtio_pci__signal_msi(kvm, vpci, 
> >> vpci->config_vector);
> >>   else
> >>   kvm__irq_trigger(kvm, vpci->config_gsi);
> >>   } else {
> >
> > Sasha?
> 
> Indeed, we tried signaling the config vector by signaling vq0, woops.
> 
> Acked-by: Sasha Levin 

Applied, thanks guys!
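
For readers skimming the thread, the bug fixed above can be reduced to a few lines. This is a simplified sketch, not the kvm tools code: struct vpci_stub and the two functions are made up for illustration and only keep the two fields named in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NR_QUEUES 4

/* Simplified stand-in for the virtio-pci device state. */
struct vpci_stub {
    uint32_t config_vector;         /* MSI vector for config changes */
    uint32_t vq_vector[NR_QUEUES];  /* MSI vector per virtqueue */
};

/* Old behaviour: the config vector was used as an index into vq_vector. */
uint32_t config_msi_buggy(const struct vpci_stub *vpci)
{
    assert(vpci->config_vector < NR_QUEUES);
    return vpci->vq_vector[vpci->config_vector];
}

/* Fixed behaviour: the config vector is signalled directly. */
uint32_t config_msi_fixed(const struct vpci_stub *vpci)
{
    return vpci->config_vector;
}
```

With config_vector set to 0 and vq_vector[0] set to 1, the old lookup returns 1, i.e. the vector belonging to vq0, which matches the symptom described in the ack.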


Re: [BUG] lkvm crash on crashkernel boot

2012-10-26 Thread Pekka Enberg
On Thu, 25 Oct 2012, Sasha Levin wrote:
> I think we're seeing that because we don't handle VIRTIO_MSI_NO_VECTOR 
> properly.
> 
> We need to deal with the ability to remove GSI & friends as well. I've
> added it to my workqueue (unless someone deals with it first).

Any reason I shouldn't apply Kirill's patch before someone finds the time
to do that?

Pekka


Re: [BUG] lkvm crash on crashkernel boot

2012-10-26 Thread Cyrill Gorcunov
On Fri, Oct 26, 2012 at 06:31:00PM +0300, Pekka Enberg wrote:
> On Thu, 25 Oct 2012, Sasha Levin wrote:
> > I think we're seeing that because we don't handle VIRTIO_MSI_NO_VECTOR 
> > properly.
> > 
> > We need to deal with the ability to remove GSI & friends as well. I've
> > added it to my workqueue (unless someone deals with it first).
> 
> Any reason I shouldn't apply Kirill's patch before someone finds the time
> to do that?

I think it's worth applying until a proper fix appears.


Re: [PULL stable-0.15] Stable-0.15 queue for qemu-kvm

2012-10-26 Thread Marcelo Tosatti
On Tue, Oct 09, 2012 at 08:08:47PM +0200, Andreas Färber wrote:
> Hello Marcelo,
> 
> Here's a couple of backports for your stable-0.15 branch.
> Except for one (marked as "backported") these were all clean cherry-picks.
> 
> My proposal is to merge these KVM-only patches before qemu-stable-0.15.git,
> where I will be tagging v0.15.2 shortly.
> 
> Cc: Marcelo Tosatti 
> Cc: Avi Kivity 
> 
> Cc: Anthony Liguori 
> Cc: Bruce Rogers 
> 
> The following changes since commit 725ba81ec6812d7eeb69be92639eb81151b5306e:
> 
>   Merge remote branch 'upstream/stable-0.15' into stable-0.15 (2011-10-19 
> 11:54:48 -0200)
> 
> are available in the git repository at:
> 
> 
>   git://repo.or.cz/qemu/afaerber.git qemu-kvm-stable-0.15
> 
> for you to fetch changes up to b2be0429795b18c018610d48142e797cbc31be0d:
> 
>   pci-assign: Remove bogus PCIe lnkcap wmask setting (2012-10-09 19:03:59 
> +0200)
> 
> 
> Alex Williamson (4):
>   pci-assign: Fix PCI_EXP_FLAGS_TYPE shift
>   pci-assign: Fix PCIe lnkcap
>   pci-assign: Harden I/O port test
>   pci-assign: Remove bogus PCIe lnkcap wmask setting
> 
> Jan Kiszka (1):
>   pci-assign: Update legacy interrupts only if used
> 
> Lai Jiangshan (1):
>   qemu-kvm: fix improper nmi emulation
> 
>  hw/apic.c  |   33 +
>  hw/apic.h  |1 +
>  hw/device-assignment.c |   30 +++---
>  monitor.c  |6 +-
>  4 files changed, 54 insertions(+), 16 deletions(-)

Pulled, thanks.



Re: [PATCH 00/15] QEMU KVM_GET_SUPPORTED_CPUID cleanups and fixes

2012-10-26 Thread Marcelo Tosatti
On Thu, Oct 04, 2012 at 05:48:52PM -0300, Eduardo Habkost wrote:
> Most of this series are just cleanups that will help when making -cpu
> check/enforce work properly, with some fixes.
> 
> In addition to code movements, the main changes are:
>  - x2apic won't be enabled if in-kernel irqchip is disabled
>(patch 10)
>  - CPUID feature bit filtering is done much earlier, and inside 
> target-i386/cpu.c
>(patch 13)
>  - CPUID leaf 7 feature bits are now filtered based on GET_SUPPORTED_CPUID too
>(patch 15)
> 
> Eduardo Habkost (15):
>   i386: kvm: kvm_arch_get_supported_cpuid: move R_EDX hack outside of
> for loop
>   i386: kvm: kvm_arch_get_supported_cpuid: clean up has_kvm_features
> check
>   i386: kvm: kvm_arch_get_supported_cpuid: use 'entry' variable
>   i386: kvm: extract register switch to cpuid_entry_get_reg() function
>   i386: kvm: extract CPUID entry lookup to cpuid_find_entry() function
>   i386: kvm: extract try_get_cpuid() loop to get_supported_cpuid()
> function
>   i386: kvm: kvm_arch_get_supported_cpuid: replace if+switch with
> single 'if'
>   i386: kvm: set CPUID_EXT_HYPERVISOR on kvm_arch_get_supported_cpuid()
>   i386: kvm: set CPUID_EXT_TSC_DEADLINE_TIMER on
> kvm_arch_get_supported_cpuid()
>   i386: kvm: x2apic is not supported without in-kernel irqchip
>   i386: kvm: mask cpuid_kvm_features earlier
>   i386: kvm: mask cpuid_ext4_features bits earlier
>   i386: kvm: filter CPUID feature words earlier, on cpu.c
>   i386: kvm: reformat filter_features_for_kvm() code
>   i386: kvm: filter CPUID leaf 7 based on GET_SUPPORTED_CPUID, too
> 
>  kvm.h |   1 +
>  target-i386/cpu.c |  30 +++
>  target-i386/kvm.c | 153 
> --
>  3 files changed, 122 insertions(+), 62 deletions(-)

Applied to uq/master, thanks.



Re: [Qemu-devel] [QEMU PATCH] i386: cpu: add missing CPUID[EAX=7, ECX=0] flag names

2012-10-26 Thread Marcelo Tosatti
On Tue, Oct 09, 2012 at 12:43:52PM -0400, Don Slutz wrote:
> On 10/09/12 10:03, Eduardo Habkost wrote:
> >This makes QEMU recognize the following CPU flag names:
> >
> >  Flags            | Corresponding KVM kernel commit
> >  -----------------+-----------------------------------------
> >  FSGSBASE         | 176f61da82435eae09cc96f70b530d1ba0746b8b
> >  AVX2, BMI1, BMI2 | fb215366b3c7320ac25dca766a0152df16534932
> >  HLE, RTM         | 83c529151ab0d4a813e3f6a3e293fff75d468519
> >  INVPCID          | ad756a1603c5fac207758faaac7f01c34c9d0b7b
> >  ERMS             | a01c8f9b4e266df1d7166d23216f2060648f862d
> >
> >Signed-off-by: Eduardo Habkost 
> >---
> >  target-i386/cpu.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> >diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> >index f3708e6..b012372 100644
> >--- a/target-i386/cpu.c
> >+++ b/target-i386/cpu.c
> >@@ -105,8 +105,8 @@ static const char *svm_feature_name[] = {
> >  };
> >  static const char *cpuid_7_0_ebx_feature_name[] = {
> >-NULL, NULL, NULL, NULL, NULL, NULL, NULL, "smep",
> >-NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
> >+"fsgsbase", NULL, NULL, "bmi1", "hle", "avx2", NULL, "smep",
> >+"bmi2", "erms", "invpcid", "rtm", NULL, NULL, NULL, NULL,
> >  NULL, NULL, NULL, NULL, "smap", NULL, NULL, NULL,
> >  NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
> >  };
> Reviewed-by: Don Slutz 

Applied to uq/master, thanks.
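
To make the effect of the name table concrete, here is a small standalone sketch that maps a CPUID[EAX=7,ECX=0].EBX value to flag names. The table contents are copied from the patched array quoted above; the function cpuid7_ebx_flag_names() is a hypothetical helper written for this example and is not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit-indexed names, as in the patched cpuid_7_0_ebx_feature_name[]. */
static const char *const cpuid_7_0_ebx_names[32] = {
    "fsgsbase", NULL, NULL, "bmi1", "hle", "avx2", NULL, "smep",
    "bmi2", "erms", "invpcid", "rtm", NULL, NULL, NULL, NULL,
    NULL, NULL, NULL, NULL, "smap", NULL, NULL, NULL,
    NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
};

/*
 * Store the names of all set, named bits of ebx in out (at most max
 * pointers) and return how many named bits were set in total.
 */
size_t cpuid7_ebx_flag_names(uint32_t ebx, const char **out, size_t max)
{
    size_t n = 0;

    assert(out != NULL || max == 0);
    for (unsigned bit = 0; bit < 32; bit++) {
        const char *name = cpuid_7_0_ebx_names[bit];

        if (!(ebx & (UINT32_C(1) << bit)) || name == NULL)
            continue;
        if (n < max)
            out[n] = name;
        n++;
    }
    return n;
}
```

Bits without a name in the table (bit 1, for example) are silently skipped, which is also why the flags were not recognized before the names were added.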



Re: kvm-clock clocksource efficiency versus tsc...

2012-10-26 Thread Marcelo Tosatti
On Tue, Oct 16, 2012 at 10:54:59AM +0200, Erik Brakkee wrote:
> OS: Centos 6.2
> KVM version: qemu-kvm-tools-0.12.1.2-2.209.el6_2.4.x86_64
>  qemu-kvm-0.12.1.2-2.209.el6_2.4.x86_64
> uname -a:
> Linux myhost 2.6.32-220.7.1.el6.x86_64 #1 SMP Wed Mar 7 00:52:02 GMT 2012
> x86_64 x86_64 x86_64 GNU/Linux
> 
> Hi,
> 
> 
> I have been performance testing a time tracing utility for a Java
> enterprise application at work. The idea is that we measure time for
> different parts of our application and build time trees for this
> information. The time tree can be viewed and analysed in case of problems.
> It is also automatically output in case a certain operation is taking
> longer than a threshold.
> 
> As part of these tests we found that it matters a lot what type of
> clocksource is used. In particular, with hpet and acpi_pm the execution is
> very slow (700ns per call; similar results using clock_gettime() in a C
> program). In addition, hpet and acpi_pm synchronize the application. This
> is of course a disaster for server applications that tend to query the
> current time quite a lot.
> 
> What works quite well is the tsc clocksource. In that case, the time drops
> to about 38ns on one of our systems and we can prove that there is no
> synchronization anymore which is good.
> 
> When running inside a KVM VM the default clock source is kvm-clock. This
> clock takes about 160ns per call and also does not synchronise the
> application. However, using the tsc clock source delivers similar
> performance on the virtual machine as on the host.
> 
> I now have the following questions:
> * can the performance of kvm-clock be optimized further to be (almost)
> identical to that of the host's clock source?

There is a patchset on the list, "pvclock vsyscall support + KVM hypervisor
support (v2)". Going by the percentages measured on my test box, it would
reduce your case from 160ns to 64ns. Testers welcome.

> * what are the consequences of using the tsc clock in combination with
> NTPD? Will this result in system instability or larger than usual clock
> skew?

- Can't migrate guest to host with different TSC frequency.
- No stability of system clock with guest vmsave/vmrestore, suspend/resume 
of the host.
- Requires host with stable TSC. 

Excluding the cases above, as long as NTP manages to synchronize the
system clock you're fine (ntpq -c rv "frequency" field below 100).
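
For anyone who wants to reproduce per-call numbers like the ones quoted in this thread, a minimal measurement sketch in C follows. It is not the program used by the original poster; ns_per_call() is a made-up helper that simply times a tight loop of clock_gettime(CLOCK_MONOTONIC) calls, and the result depends on the active clocksource, the CPU, and the compiler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average cost of one clock_gettime() call over the given number of calls. */
double ns_per_call(unsigned long calls)
{
    struct timespec ts;
    uint64_t start, end;

    assert(calls > 0);
    start = now_ns();
    for (unsigned long i = 0; i < calls; i++)
        clock_gettime(CLOCK_MONOTONIC, &ts);
    end = now_ns();

    return (double)(end - start) / (double)calls;
}
```

Calling ns_per_call(1000000) once per clocksource gives comparable figures; on Linux the active clocksource can be read from /sys/devices/system/clocksource/clocksource0/current_clocksource.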



Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Benjamin Herrenschmidt
On Fri, 2012-10-26 at 13:57 +0200, Paolo Bonzini wrote:
> Il 26/10/2012 13:09, Benjamin Herrenschmidt ha scritto:
> > The only cases I can think of are the association between a virtual
> > interrupt (ie, an interrupt in the guest interrupt number space) and an
> > in-kernel source for that interrupt, ie, vhost and PCI pass-through
> > essentially.
> 
> If you exclude old-style PCI pass-through and limit yourself to vhost
> and VFIO, you can treat irqfd as "the" in-kernel source of the
> interrupt.  Then you need a mapping between MSIs and numbers used in
> KVM_IRQFD ("GSIs").

Argh. Ok, I get that we need a mapping between an irqfd and a global
number, I don't see where MSIs come into the picture at all here. At
least for us they don't.

> This is what KVM_SET_GSI_ROUTING modifies, and basically the mapping is
> modified every time a vector is masked/unmasked in the MSI-X table.

Right and that doesn't make sense for us. In fact I don't understand how
it makes sense for x86 either, but that's because I don't understand how
the APIC works I suppose.

Ben.




Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Benjamin Herrenschmidt
On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote:

> But we are just talking about sending messages from A to B or soldering
> an input to an output pin. That's pretty generic. Give each output event
> a virtual IRQ number and define where its output "line" should be linked
> to (input pin of target controller). All that will be specific are the
> IDs of those controllers.

Hrm you seem to be saying something very different from Paolo here.
Unless it's just a very very confused terminology.

So let's see the powerpc "pseries" case. Things like embedded etc...
might be quite different.

We have essentially two "outputs" here. One is qemu itself shooting
interrupts (emulated devices, virtio, etc...). This is an ioctl and that
gives you a global interrupt number. So this goes directly to the source
controller which then uses its internal logic to send that to the
presentation controller in ways that are entirely implementation
specific.

The specific source controller is located using the top bits of the
global interrupt number (the BUID). When we create source controllers,
we pass as argument to the ioctl the BUID for that source controller and
the number of interrupts it handles.
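
The lookup described here can be sketched in a few lines of C. Everything below is illustrative: the 12-bit source field is an assumption chosen for the example (the actual pseries bit split is not given in this message), and struct ics, irq_buid(), irq_source() and ics_find() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed split for illustration only: low 12 bits = source, rest = BUID. */
#define SRC_BITS 12u
#define SRC_MASK ((UINT32_C(1) << SRC_BITS) - 1u)

/* A source controller as created by the ioctl: its BUID and its size. */
struct ics {
    uint32_t buid;
    uint32_t nr_irqs;
};

uint32_t irq_buid(uint32_t girq)   { return girq >> SRC_BITS; }
uint32_t irq_source(uint32_t girq) { return girq & SRC_MASK; }

/* Find the controller that owns a global interrupt number, or NULL. */
const struct ics *ics_find(const struct ics *tbl, size_t n, uint32_t girq)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].buid == irq_buid(girq) &&
            irq_source(girq) < tbl[i].nr_irqs)
            return &tbl[i];
    }
    return NULL;
}
```

The point of the sketch is that no separate routing table is needed for this step: the global number alone identifies both the controller and the input.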

The other "output" is irqfd for kernel originated events. Here I assume
there's an in-kernel way to directly call a function rather than queue
something for qemu to consume later, anything else would be horribly
wasteful. Here too, what we need here is a global interrupt number, so
we can find the source controller by BUID and shoot it the interrupt.

So that's the only case I see where we need an association of some kind,
which is irqfd -> global number. I don't see where the "MSIs" that Paolo
keeps talking about come into play. User space (emulated) MSIs are dealt
within qemu entirely and MSIs from VFIO end up as irqfd.

Finally there is the "routing" between a given interrupt source (an
entry in the source controller state table) and the target processor
(the corresponding presentation controller).

That routing is purely a field in the source controller state, which is
there along with the interrupt priority and a few state bits. (We don't
need to deal with level/edge because of the way the ICS work, we just
say at the time of the triggering of an interrupt whether it's a level
set, level reset, or message, and it will do the right thing).

This field is accessed (programmed) by the guest using a firmware
interface that is implemented in the kernel part of KVM. It's a platform
specific API and it accesses the source controller (it's implemented
there really). I don't see where any generic API here would make sense
other than maybe adding useless bloat.

The only place where qemu might "see" that stuff is for migration where
it needs to save all the state of all the sources and restore it on the
target.

The actual communication between source controllers and presentation
controllers is also entirely platform specific. It follows a somewhat
specified protocol (we mimic what the HW actually does) and here too, I
see no room for anything generic. 

> Of course, all that provided you do their emulation in kernel space. For
> x86, that even makes sense when the IRQ sources are in user space as the
> guest may still have to interact during IRQ delivery with IOAPIC, thus
> we save some costly heavy-weight exits when putting it in the kernel.

We have a way to lower that cost. Since the interaction with the
presentation controller is done by hypervisor calls, we handle them
directly in real mode within the guest MMU context unless some
exceptional condition is hit (such as the need to trigger a resend from
one of the source controllers or an interrupt rejection).

> > 
> > Remains the "routing" between source of "events" and actual "inputs" to
> > a source controller.
> > 
> > This too doesn't seem totally obvious to generalize. For example an
> > embedded platform with a bunch of cascaded dumb interrupt controllers
> > doesn't have a concept of a flat number space in HW, an interrupt
> > "input" to be identified properly, needs to identify the controller and
> > the interrupt within that controller. However, within KVM/qemu, it's
> > pretty easy to assign to each controller a number and by collating the
> > two, get some kind of flat space, though it's not arbitrary and the
> > routing is thus fairly constrained if not totally fixed.
> 
> IRQ routing entry:
>  - virq number ("gsi")
>  - type (controller ID, MSI, whatever you like)

What is "controller ID" ? That doesn't mean anything to me. In our case,
the specific source controller is known from the virq number (the top
bits of it basically).

>  - some flags (to extend it)
>  - type-specific data (MSI message, controller input pin, etc.)

I don't understand that business about MSIs really. I suppose it has to
do with the way you do old-style device assignment ? Either MSIs come
from virtual/emulated devices in which case they are a qemu fiction and
qemu just sends us an ioctl 

Re: [PATCH] Added call parameter to track whether invocation originated with guest or elsewhere

2012-10-26 Thread Marcelo Tosatti
On Tue, Oct 23, 2012 at 07:56:54PM +, Auld, Will wrote:
> Having looked closer at the task of changing out the index and data fields
> in some function calls for a struct parameter with these and an originator
> field (host or guest), it is less attractive than I thought it would be. The
> only place where we need to know the initiator is in kvm_write_tsc(), which
> has an implicit index.

At the moment yes, but it might have other uses in the future.

> I have been trying to determine whether there is a possibility for taking a 
> context switch while a guest initiated set_msr() is in progress whereby the 
> new thread might invoke the set_msr()/kvm_write_tsc() routines. It looks to 
> me like this is not possible but I can't be sure. 

It is not possible.

> If it is not possible we can set a variable for the vcpu when a guest call is 
> in progress and this would be sufficient. 
>
> What do you think?
> Thanks,

The struct parameter seems the preferred choice as there might be other
uses to this information in the future.


Re: [PATCH] KVM: ia64: remove unused variable in kvm_release_vm_pages()

2012-10-26 Thread Marcelo Tosatti
On Wed, Oct 17, 2012 at 11:03:42PM +0800, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> The variable base_gfn is initialized but never used
> otherwise, so remove the unused variable.
> 
> dpatch engine is used to auto generate this patch.
> (https://github.com/weiyj/dpatch)
> 
> Signed-off-by: Wei Yongjun 
> ---
>  arch/ia64/kvm/kvm-ia64.c | 2 --
>  1 file changed, 2 deletions(-)

Applied, thanks.



Re: [PATCH v7 1/1] target-i386: Add missing kvm bits.

2012-10-26 Thread Marcelo Tosatti
On Fri, Oct 12, 2012 at 03:43:23PM -0400, Don Slutz wrote:
> Currently "-cpu host,-kvmclock,-kvm_nopiodelay,-kvm_mmu" does not
> turn off all bits in CPUID 0x40000001 EAX.
> 
> The missing ones are KVM_FEATURE_STEAL_TIME and
> KVM_FEATURE_CLOCKSOURCE_STABLE_BIT.
> 
> This adds the names kvm_steal_time and kvm_clock_stable for these
> bits.
> 
> Signed-off-by: Don Slutz 

It does not make sense to expose KVM_FEATURE_CLOCKSOURCE_STABLE_BIT
to the user (as it's going to be controlled by the kernel), applied
with kvm_steal_time addition only. Thanks.



Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-26 Thread Benjamin Herrenschmidt
On Sat, 2012-10-27 at 07:45 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2012-10-26 at 14:39 +0200, Jan Kiszka wrote:
> 
> > But we are just talking about sending messages from A to B or soldering
> > an input to an output pin. That's pretty generic. Give each output event
> > a virtual IRQ number and define where its output "line" should be linked
> > to (input pin of target controller). All that will be specific are the
> > IDs of those controllers.
> 
> Hrm you seem to be saying something very different from Paolo here.
> Unless it's just a very very confused terminology.
> 
> So let's see the powerpc "pseries" case. Things like embedded etc...
> might be quite different.

So I had a chat with Anthony who explained to me a bit more about what
the x86 stuff is about. It's pretty horrible I must say :-)

So correct me if I'm wrong but you essentially have to differentiate
between MSI "outputs" and other (GSI) "outputs" due to the fact that
MSIs in x86 land don't act as normal interrupts going through a source
controller but instead get shot directly to the target CPU.

Then you have to establish some kind of "routing" from those GSIs to
some IO APIC, and from MSIs to local APICs.

That's where I think there is a fairly fundamental difference with us.

So let's cut that problem in two. The GSI bit and the MSI bit. The
reason is that the way x86 does MSIs seems to be fairly x86 specific, I
wouldn't be surprised if everybody else did MSIs like we do them, that
is, turn them into normal interrupts (ie, GSIs). But let's discuss that
below.

So the GSI bit. We can assume that GSIs in that context are basically
our "global interrupt number". This would apply to pretty much every
platform indeed.

The routing here, if I understand things correctly, consists of
associating such a global interrupt number with a specific input pin (or
virtual pin) of a specific source controller (ie, IO APIC).
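
To make that association concrete, here is a minimal sketch of what such a
routing table amounts to. The struct layout and the route_lookup() helper
are hypothetical and only illustrate the idea; they are not the actual KVM
structures or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical routing entry: one global interrupt number (GSI) is wired
 * to one input pin of one source controller. Not the real KVM layout. */
struct route {
    uint32_t gsi;   /* global interrupt number */
    uint32_t chip;  /* id of the source controller (e.g. an IO APIC) */
    uint32_t pin;   /* input pin (or virtual pin) on that controller */
};

/* Linear lookup; returns NULL when no route exists for this GSI. */
static const struct route *route_lookup(const struct route *tbl, size_t n,
                                        uint32_t gsi)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].gsi == gsi)
            return &tbl[i];
    }
    return NULL;
}
```

Every GSI that can fire needs an entry in such a table before it can be
delivered, which is the setup step being discussed here.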

This would generally make sense in embedded space as well I suppose,
where you can have multiple or even cascaded interrupt controllers of
different breeds etc...

However, in the pseries system, this routing is essentially encoded in
the interrupt number itself. As I think I explained earlier, the number
is arbitrarily split in two parts, the top bits indicating the source
controller and the bottom bits the source within that controller. In
qemu/kvm we have made an arbitrary split (whose size I don't remember
precisely) and we currently create only one fairly big source controller
but we might change that in the future.
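
A minimal sketch of that encoding, assuming a 12-bit source field. The
width is an assumption picked purely for illustration (the real split is
not given here), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of the "source within controller" field. Illustrative
 * only; the split actually used by qemu/kvm is not stated in this mail. */
#define SRC_BITS 12u
#define SRC_MASK ((1u << SRC_BITS) - 1u)

/* Top bits: which source controller the interrupt belongs to. */
static inline uint32_t irq_controller(uint32_t irq)
{
    return irq >> SRC_BITS;
}

/* Bottom bits: which source within that controller. */
static inline uint32_t irq_source(uint32_t irq)
{
    return irq & SRC_MASK;
}

/* Build a global interrupt number from its two parts. */
static inline uint32_t irq_make(uint32_t ctrl, uint32_t src)
{
    assert(src <= SRC_MASK);
    return (ctrl << SRC_BITS) | src;
}
```

With the number decoded this way, the target controller falls out of a
shift and a mask, with no table lookup involved.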

Thus there is no such thing as needing to "associate" or create routing
entries here. qemu will directly shoot "GSIs" using an ioctl and our
code can directly map that to a source controller without any routing
table of any sort. In fact, adding one would complicate things, since
we'd have a requirement that it's populated 1:1 or things would get very
confused indeed. So overall, there's no point for us to implement or use
that API or the "generic" code behind it; it would be pure bloat,
complication and problems.

However, making that code more generic might make sense for other
platforms (including other powerpc platforms such as embedded) where
multiple interrupt controllers may exist, though here too, it's probably
going to be fairly common that the GSI numbers are essentially a bit
field split with entire ranges assigned to a given PIC. We don't have to
emulate the x86+ACPI ability to individually remap interrupts.

The case of MSIs now. My understanding from what Anthony says is that
your MSIs essentially bypass the IO APIC and route directly to the local
APIC, which is equivalent to our presentation controller. You thus need
specific APIs to associate an MSI (which isn't a GSI) to a specific
local APIC.

We have no such need at all. Our MSIs are decoded by the PCI host bridge
and directly turned into "normal" interrupts. In fact, in HW, our bridges
contain a special source controller that *is* essentially the thing that
gets hit by MSIs. 

So our MSIs are just normal interrupts in the global space. Their
numbers are assigned by qemu, the kernel never knows about them. When an
emulated device triggers an MSI that turns into a normal "trigger global
interrupt X" ioctl to the kernel. The only "knowledge" the kernel
emulation gets along the way is an argument to the ioctl that indicates
whether this is a level set, level reset, or edge type action (MSIs are
edge obviously) which dictates how the delivery state machine will work
(one shot vs. continuous until cleared).
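
As a rough sketch of how that argument could drive delivery: the enum,
struct and function names below are hypothetical, and the state machine is
deliberately simplified (one source, no priorities or masking). It only
illustrates the one-shot vs. continuous distinction, not the real kernel
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three actions the ioctl argument can carry. */
enum irq_action { IRQ_LEVEL_SET, IRQ_LEVEL_RESET, IRQ_EDGE };

struct irq_state {
    bool asserted;  /* level line is currently held high */
    bool pending;   /* an interrupt is waiting to be delivered */
};

/* Apply one "trigger global interrupt X" request to a source. */
static void irq_trigger(struct irq_state *s, enum irq_action a)
{
    assert(s != NULL);
    switch (a) {
    case IRQ_LEVEL_SET:
        s->asserted = true;
        s->pending = true;
        break;
    case IRQ_LEVEL_RESET:
        /* Simplification: clearing the line also drops the pending state. */
        s->asserted = false;
        s->pending = false;
        break;
    case IRQ_EDGE:
        /* Edge (e.g. an MSI): pending once, the line itself is not held. */
        s->pending = true;
        break;
    }
}

/* Called once the interrupt has been delivered. An edge is one shot;
 * a level source becomes pending again for as long as it stays asserted. */
static void irq_delivered(struct irq_state *s)
{
    assert(s != NULL);
    s->pending = s->asserted;
}
```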

So qemu assigns interrupt numbers to MSIs and there's never any routing
to establish at the kernel level. That also means that the current API
that has tendrils all the way into devices in qemu for "getting the virq
for a given MSI" is totally unsuitable for us. In fact we don't need a
different API for KVM vs. full emulation. Everything in qemu side is the
same, until the qirq gets actually delivered in which case with KVM
we'll shoot an ioct