Re: in-kernel interrupt controller steering

Alexander Graf Wed, 06 Mar 2013 08:35:06 -0800


On 06.03.2013, at 16:30, Gleb Natapov <g...@redhat.com> wrote:


> On Wed, Mar 06, 2013 at 03:48:54PM +0100, Alexander Graf wrote:
>> 
>> On 06.03.2013, at 15:41, Gleb Natapov wrote:
>> 
>>> On Wed, Mar 06, 2013 at 03:03:53PM +0100, Alexander Graf wrote:
>>>> 
>>>> On 06.03.2013, at 14:56, Gleb Natapov wrote:
>>>> 
>>>>> On Wed, Mar 06, 2013 at 02:22:15PM +0100, Alexander Graf wrote:
>>>>>> 
>>>>>> On 06.03.2013, at 14:14, Gleb Natapov wrote:
>>>>>> 
>>>>>>> On Wed, Mar 06, 2013 at 01:20:39PM +0100, Alexander Graf wrote:
>>>>>>>>> The problem would only start if KVM_SET_IRQCHIP_TYPE (new name of
>>>>>>>>> KVM_CREATE_IRQCHIP_ARGS) forced you to later call KVM_CREATE_DEVICE.
>>>>>>>> 
>>>>>>>> Ah, I see. I don't see why it would. The fact that there is a "LAPIC" 
>>>>>>>> doesn't mean that the per-vcpu SET_INTERRUPT ioctl stops working. So 
>>>>>>>> if SET_IRQCHIP_TYPE(!none) breaks user-space interrupt controller 
>>>>>>>> emulation I would consider that a bug.
>>>>>>> For x86 this is the case though. I do not see how it can't be. If
>>>>>>> LAPIC is emulated in userspace SET_INTERRUPT is used to pass IRQ
>>>>>>> vector that should be handled as a result of LAPIC emulation.
>>>>>> 
>>>>>> So SET_INTERRUPT on a vcpu triggers a line on the LAPIC emulation in 
>>>>>> that vcpu? For us it directly controls the CPU interrupt pin.
>>>>> No, SET_INTERRUPT on a vcpu tells the vcpu which vector in the IDT it
>>>>> needs to jump to immediately. The LAPIC is really part of the cpu; we cut
>>>>> it out and put it into userspace, so the interface to the userspace LAPIC
>>>>> emulation is really low level and has to be synchronous. X86 has two
>>>>> interrupt lines, NMI and INTR, and we do not have an interface to trigger
>>>>> the latter. KVM_IRQ_LINE works on
>>>>> GSI lines which do not go into CPU directly. They go either via PIC (which
>>>>> triggers INTR or APIC LINT0) or via IOAPIC which on real HW communicates
>>>>> with APICs via bus, but in our emulation just calls APICs directly.
>>>> 
>>>> Great :). It's similar for us. SET_INTERRUPT directly asserts the INTR 
>>>> line of the vcpu. There is nothing like an IDT on PPC, so external 
>>>> interrupts simply arrive at a specific vector. That vector can differ for 
>>>> critical or NMI interrupts IIRC, but I'm not sure we implement that right 
>>>> now. If so, it'd be a different line for SET_INTERRUPT.
>>>> 
>>>> So in a way, it's the same. And SET_INTERRUPT should work regardless of 
>>>> whether a LAPIC is used or not really. At least it would for us :).
>>> Is it possible for some devices to inject interrupts directly and others
>>> to go through the interrupt controller?
>> 
>> It would be racy if both assert + deassert the same line, but I don't see 
>> why we should keep anyone from doing it. If user space wants to run such a 
>> configuration, it needs to ensure that only one of the 2 is actively used at 
>> any given time.
>> 
>>>> KVM_IRQ_LINE is basically an IOAPIC interrupt line assert. That's fine. 
>>>> That ioctl should get an ioapic device handle to work on. Whether we call 
>>>> the IOAPIC PINs GSIs or something different is really just a naming 
>>>> question. I'd probably call it IRQ number :).
>>> Yes and no. On sane archs we can call it IRQ number (lucky you!), but on
>>> X86 there is a GSI that can be IRQ2 if it goes through IOAPIC and IRQ0
>>> if it goes through PIC, so an additional entity was invented: irq routing.
>>> It maps between a GSI and irqchip pins. The same GSI may go to more than one
>>> irqchip. This is why for x86 having an irqchip device handle as a parameter
>>> to KVM_IRQ_LINE does not make sense. It makes sense to provide it to the irq
>>> router, and this is how it works now, except that the "device handlers" are
>>> hard coded.
>> 
>> Then you would create a new "irq router" device that does the multiplexing 
>> and can also receive IRQs. You could then directly assert an IOAPIC/PIC line 
>> or a multiplexer line. Or am I misunderstanding something?
> The usefulness of such flexibility is questionable, but you are right, it can 
> be implemented this way.
> 
>>> 
>>>> But it's the same idea. The "IOAPIC" would then talk to the in-kernel 
>>>> "LAPIC" style bits (or in case of the MPIC just integrate them inside of 
>>>> itself). That's why by the time we create an "IOAPIC", the "LAPIC"s in the 
>>>> system have to be populated.
>>> The restriction that the LAPIC has to be created before the IOAPIC would
>>> be a bug that needs to be fixed on X86. The reason is cpu hotplug. If you
>>> have to support cpu hotplug, you have to be able to create LAPICs after the
>>> IOAPIC, and at that point you can create the IOAPIC before any LAPICs as
>>> well. I understand this may not be the case for all architectures right
>>> now, but it is something to keep in mind.
>> 
>> Paul, Scott, do you think we can move the "this CPU can receive interrupts 
>> from MPIC / XICS" part into an ENABLE_CAP that gets set dynamically? That 
>> ENABLE_CAP would allocate the structures in the vcpu and register the vcpu 
>> with the interrupt controller pool.
>> 
>> The interrupt controller device would still iterate through all vcpus to 
>> find the ones that match so that we support the ENABLE_CAP at any point in 
>> time.
>> 
>>> 
>>>> 
>>>> So again, I'm failing to see where we think differently :).
>>> The difference is very minor really. I am still trying to justify to myself
>>> why we need a separate ioctl() to announce what irqchip we are going to
>>> create before creating one (except to save QEMU some trouble). The question
>>> is: can this ioctl be useful by itself? The seemingly unlikely scenario
>>> where we allow IOAPIC/PIC emulation in userspace while the LAPIC is in the
>>> kernel may be such a case. QEMU would call it before creating vcpus to
>>> tell KVM that LAPICs need to be created along with VCPUs, but no
>>> irqchip would be created.
>> 
>> I don't have a real answer for you yet, but so far the general design mantra 
>> of "small, individual pieces that plug together" worked out way better for 
>> us than the "have one call that does it all" one. Being explicit simply 
>> makes sure that we support more scenarios we don't think of today.
> Suppose we are going with the IRQ_CHIP_ARCH ioctl. What happens if
> userspace calls ioctl(IRQ_CHIP_ARCH, MPIC) and tries to call KVM_RUN
> before creating MPIC device?

User space can't access the MPIC :). So it has to do SET_INTERRUPT on vcpus 
like it does when it doesn't set the irq arch.
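
To make the intended behaviour concrete, here is a rough toy model in plain C.
It is not KVM code and makes no real ioctl calls: every name in it (toy_vm,
toy_set_irqchip_type, toy_create_irqchip, toy_set_interrupt, toy_run) is
invented for illustration. It only sketches the assumption that announcing an
irqchip type without creating the device leaves the per-vcpu SET_INTERRUPT
path working exactly as before.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical and nothing here is KVM API. */
#define TOY_MAX_VCPUS 4

enum toy_irqchip_type { TOY_IRQCHIP_NONE, TOY_IRQCHIP_MPIC };

struct toy_vcpu {
    bool ext_irq_asserted;  /* models the CPU's external interrupt pin */
};

struct toy_vm {
    enum toy_irqchip_type declared_type;  /* what user space announced */
    bool irqchip_created;                 /* whether the device exists yet */
    struct toy_vcpu vcpus[TOY_MAX_VCPUS];
};

/* Models the "announce the irqchip type" call: records intent only. */
static void toy_set_irqchip_type(struct toy_vm *vm, enum toy_irqchip_type t)
{
    vm->declared_type = t;
}

/* Models device creation; only succeeds after a type was announced. */
static bool toy_create_irqchip(struct toy_vm *vm)
{
    if (vm->declared_type == TOY_IRQCHIP_NONE)
        return false;
    vm->irqchip_created = true;
    return true;
}

/*
 * Models the per-vcpu SET_INTERRUPT: it drives the pin directly and
 * deliberately ignores declared_type and irqchip_created.
 */
static void toy_set_interrupt(struct toy_vm *vm, size_t cpu, bool level)
{
    assert(cpu < TOY_MAX_VCPUS);
    vm->vcpus[cpu].ext_irq_asserted = level;
}

/* Models guest entry: reports whether an external interrupt would be taken. */
static bool toy_run(const struct toy_vm *vm, size_t cpu)
{
    assert(cpu < TOY_MAX_VCPUS);
    return vm->vcpus[cpu].ext_irq_asserted;
}
```

In this model, calling toy_set_irqchip_type(&vm, TOY_IRQCHIP_MPIC) and then
toy_run() without ever calling toy_create_irqchip() is not an error: there is
simply no in-kernel controller to deliver anything, so the only way to raise
an interrupt is toy_set_interrupt() on the vcpu.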

Alex

> 
> --
>            Gleb.