On Mon, Jul 24, 2017 at 03:04:05PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2017-07-24 at 13:28 +1000, David Gibson wrote:
> > > So yes, in PAPR there's an "allocator" because the hypervisor will
> > > create a guest "virtual" (or logical to use PAPR terminology) interrupt
> > > number space, in order to represent the various interrupts into the
> > > guest.
> > 
> > Ok, but are each of those logical irqs bound to a specific device/PHB
> > line/whatever, or can they be configured by the guest?
> 
> So for clarity, let's first establish the terminology :-)
> 
>  - HW number is a HW interrupt number on a "bare metal" system or
> powernv guest. For now we will ignore those, they are effectively a
> side effect of how skiboot configures the XIVE and qemu per-se doesn't
> allocate them.
> 
>  - A logical number is a "guest physical" interrupt number for a PAPR
> guest. These fall into roughly 2 categories at the moment:
> 
>     * "interrupts" (or related) properties in the DT, typically
> interrupts for a PCI device, ranges of MSIs etc... that correspond to
> HW sources from a PHB.

Ok, I think this is the one I've mostly been thinking of.

>     * "generic IPIs". Those are ranges of "generic" interrupts that the
> hypervisor gives the guest. On a real system, they correspond to chunks
> allocated off a HW facility for generic interrupts. Generic interrupts
> are the same as normal interrupts from the perspective of
> managing/receiving them, but are "triggered" by an MMIO to a certain HW
> page. There's a DT property telling the guest the interrupt number
> ranges for these guys.
> 
> So that logical number above is what a PAPR guest obtains from the DT
> and uses for the various H-calls used to manage and configure interrupt
> sources.

Ok.

> In addition, the XIVE supports renumbering the interrupt number that
> you obtain in the queues. Bare metal Linux, KVM and guests all make
> use of this. This only changes the number you observe in a queue when
> you receive an interrupt, it has no effect on the HW number or logical
> number used for the various management calls.

Ok.

> This is used by Linux so that:
> 
>   - On bare metal systems or PAPR guest with "exploitation mode" (ie,
> PAPR guest directly using the XIVE), we put the linux interrupt number
> in there so as to avoid the reverse-mapping done by linux otherwise when
> receiving an interrupt.
> 
>   - On PAPR guests using the legacy hcalls, KVM configures the logical
> number there.

Ok.
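
To check my understanding, here is a toy model of that renumbering.
The struct and function names are hypothetical and do not come from
any existing code; it only shows that the value seen in the queue is
independent of the logical number used for management calls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-source state, for illustration only. */
struct sketch_source {
    uint32_t logical;     /* number used for management calls; fixed */
    uint32_t queue_data;  /* number observed in the event queue */
};

/* Bare metal or exploitation mode: the OS stores its own interrupt
 * number so it can skip the reverse mapping on delivery. */
static inline void sketch_set_queue_data(struct sketch_source *s,
                                         uint32_t os_irq)
{
    s->queue_data = os_irq;   /* the logical number is left untouched */
}

/* Legacy hcalls: the logical number is what comes out of the queue. */
static inline void sketch_use_legacy(struct sketch_source *s)
{
    s->queue_data = s->logical;
}
```

In both cases only queue_data changes; the logical number stays what
the DT advertised.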

> > > Those numbers however are just tokens, they don't have to represent any
> > > real HW concept. So they can be "allocated" in a rather fixed way, for
> > > example, you could have something like a fixed map where you put all
> > > the PCI interrupts at a certain number (a factor of the PHB# with room
> > > or a fixed number per PHB, maybe 16K or so, the HW does 4K max). Another
> > > base would have a chunk of "general purpose" IPIs (for use for actual
> > > IPIs and for other things to come). And a range for the virtual device
> > > interrupts for example. Or you can just use an allocator.
> > 
> > Hm.  So what I'm meaning by an "allocator" is something at least
> > partially dynamic.  Something you say "give me an irq" and it gives
> > you the next available or similar.  As opposed to any mapping from
> > devices to (logical) irqs, which the machine will need to supply one
> > way or another.
> 
> For the sake of repeatability/migration etc... I think a mapping is
> better than an allocator.  IE, a fixed number scheme so that the range
> of interrupts for PHB#x is always a fixed function of x.

Yes, I agree.  In fact that's pretty much exactly the point I'm trying
to make.
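
To make that concrete, a rough sketch of such a fixed mapping.  The
base value and the names are made up for illustration; only the 16K
per-PHB figure comes from the discussion above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants, chosen only for illustration. */
#define SKETCH_PHB_IRQ_BASE  0x10000u  /* first logical number for PHBs */
#define SKETCH_IRQS_PER_PHB  0x4000u   /* 16K logical numbers per PHB */

/* First logical interrupt number of PHB #phb_index.  It is a pure
 * function of the index, so it is identical on both sides of a
 * migration. */
static inline uint32_t sketch_phb_irq_base(uint32_t phb_index)
{
    return SKETCH_PHB_IRQ_BASE + phb_index * SKETCH_IRQS_PER_PHB;
}

/* Logical number for source 'src' of a PHB, or 0 if 'src' does not
 * fit in that PHB's window. */
static inline uint32_t sketch_phb_irq(uint32_t phb_index, uint32_t src)
{
    if (src >= SKETCH_IRQS_PER_PHB) {
        return 0;
    }
    return sketch_phb_irq_base(phb_index) + src;
}
```

Nothing here depends on the order in which devices are created,
which is the property we want for migration.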

Can we assign our logical numbers sparsely, or will that cause other
problems?

Note that for PAPR we also have the question of finding logical
interrupts for legacy PAPR VIO devices.

> We can fix the number of "generic" interrupts given to a guest. The
> only requirement from a PAPR perspective is that there should be at
> least as many as there are possible threads in the guest so they can be
> used as IPIs.

Ok.  If we can do things sparsely, allocating these well away from the
hw interrupts would make things easier.

> But we may need more for other things. We can make this a machine
> parameter with a default value of something like 4096. If we call N
> that number of extra generic interrupts, then the number of generic
> interrupts would be #possible-vcpu's + N, or something like that.

That seems reasonable.
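
As a sketch of that sizing rule (the names are hypothetical; 4096 is
the default you suggest for N):

```c
#include <assert.h>
#include <stdint.h>

/* Default for the proposed machine parameter N. */
#define SKETCH_DEFAULT_EXTRA_GENERIC 4096u

/* Number of generic interrupts given to the guest: one per possible
 * thread, so each can be used as an IPI, plus N extra for other
 * uses. */
static inline uint32_t sketch_nr_generic_irqs(uint32_t possible_threads,
                                              uint32_t extra)
{
    return possible_threads + extra;
}
```

So a guest with 8 possible threads and the default N would get 4104
generic interrupts.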

> > > But it's fundamentally an allocator that sits in the hypervisor, so in
> > > our case, I would say in the spapr "component" of XIVE, rather than the
> > > XIVE HW model itself.
> > 
> > Maybe..
> 
> You are right in that a mapping is a better term than an allocator
> here.
> 
> > > Now what Cedric did, because XIVE is very complex and we need something
> > > for PAPR quickly, is not a complete HW model, but a somewhat simplified
> > > one that only handles what PAPR exposes. So in that case where the
> > > allocator sits is a bit of a TBD...
> > 
> > Hm, ok.  My concern here is that "dynamic" allocation of irqs at the
> > machine type level needs extreme caution, or the irqs may not be
> > stable which will generally break migration.
> 
> Yes you are right. We should probably create a more "static" scheme.

Sounds like we're in violent agreement.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
