________________________________

From: Alexander Graf [mailto:[EMAIL PROTECTED] 
Sent: 2008年6月13日 22:31
To: Xu, Anthony
Cc: Avi Kivity; Marcelo Tosatti; Jes Sorensen; kvm@vger.kernel.org; [EMAIL 
PROTECTED]
Subject: Re: [RFC] kvm irq assignment



On Jun 12, 2008, at 11:38 PM, Xu, Anthony wrote:


        Hi Avi and all
        
        This is the revised one,
        
        All PCI devices send interrupt to both PIC and IOAPIC,  
        a). When PIC is enabled and IOAPIC is disabled,  all redirect entries in
        IOAPIC are masked.
        B) When PIC is disabled and IPAPIC is enabled, link entry bit7 is set,
        means this link entry is disable.
        Guest OS need to guarantee PIC and IOAPIC are not enabled in the same
        time. Otherwise cause many suspicious interrupt to guest.


At boottime the IOAPIC is completely masked, while the PIC is completely 
unmasked. Somewhere during bootup ACPI compliant OSs can call _PIC to switch to 
IOAPIC mode and somehow deactivate the PIC lines and activate IOAPIC lines for 
the devices they are interested in.

I sent a tool called "apicdump" to this list some time ago that allows you to 
read the PIC and IOAPIC from within Linux.


[Anthony] nice tool, I'll try that.



        Test by running guest linux in kvm/ia32 and kvm/ia64.
        
        
        Thanks,
        Anthony
        
        
        
        diff --git a/bios/acpi-dsdt.dsl b/bios/acpi-dsdt.dsl
        index 21fc76a..e12fd66 100755
        --- a/bios/acpi-dsdt.dsl
        +++ b/bios/acpi-dsdt.dsl


ACPI changes look fine to me.


        
        



[snip]


        diff --git a/qemu/hw/apic.c b/qemu/hw/apic.c
        index a14cab2..c3014fa 100644
        --- a/qemu/hw/apic.c
        +++ b/qemu/hw/apic.c
        @@ -1053,9 +1053,25 @@ static void ioapic_service(IOAPICState *s)
            }
        }
        
        +int ioapic_map_irq(int devfn, int irq_num)
        +{
        +    int irq;
        +    irq = ((devfn >> 3) & 7) + 16;
        +    return irq;
        +}
        +#ifdef KVM_CAP_IRQCHIP
        +static int ioapic_irq_count[IOAPIC_NUM_PINS];
        +#endif
        +
        void ioapic_set_irq(void *opaque, int vector, int level)
        {
            IOAPICState *s = opaque;
        +#ifdef KVM_CAP_IRQCHIP
        +    ioapic_irq_count[vector] += level;
        +    if (kvm_enabled())
        + if (kvm_set_irq(vector, ioapic_irq_count[vector] == 0))
        +    return;
        +#endif
        
            if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
                uint32_t mask = 1 << vector;
        diff --git a/qemu/hw/ipf.c b/qemu/hw/ipf.c
        index b11e328..4761463 100644
        --- a/qemu/hw/ipf.c
        +++ b/qemu/hw/ipf.c
        @@ -672,3 +672,23 @@ QEMUMachine ipf_machine = {
            ipf_init_pci,
            VGA_RAM_SIZE + VGA_RAM_SIZE,
        };
        +
        +#define IOAPIC_NUM_PINS 48
        +static int ioapic_irq_count[IOAPIC_NUM_PINS];
        +
        +int ioapic_map_irq(int devfn, int irq_num)
        +{
        +    int irq, dev;
        +    dev = devfn >> 3;
        +    irq = ((((dev << 2) + (dev >> 3) + irq_num) & 31) + 16);
        +    return irq;
        +}


Why does this look different from the one in apic.c?
[Anthony] it is used for ia64,  I would like to use same guest Firmware of 
XEN/ia64 in KVM/IA64,
I "copy" the mapping from XEN/IA64.




        +
        +void ioapic_set_irq(void *opaque, int vector, int level)
        +{
        +    ioapic_irq_count[vector] += level;
        +    if (kvm_enabled())
        + if (kvm_set_irq(vector, ioapic_irq_count[vector] == 0))
        +    return;
        +}
        +


This does look like common code to me, so it might be a good idea to share it 
with ia32. AFAIK apics are the same on ia32 and ia64.
[Anthony] I wanted to do the same thing as what you said, while currently 
KVM/ia64 doesn't include apic.c file, I tried to add apic.c, but failed to 
complie
I'll try to make it work.




        diff --git a/qemu/hw/pc.h b/qemu/hw/pc.h
        index c284bf1..ef09a78 100644
        --- a/qemu/hw/pc.h
        +++ b/qemu/hw/pc.h
        @@ -47,6 +47,7 @@ int apic_accept_pic_intr(CPUState *env);
        void apic_local_deliver(CPUState *env, int vector);
        int apic_get_interrupt(CPUState *env);
        IOAPICState *ioapic_init(void);
        +int ioapic_map_irq(int devfn, int irq_num);
        void ioapic_set_irq(void *opaque, int vector, int level);
        
        /* i8254.c */
        diff --git a/qemu/hw/pci.c b/qemu/hw/pci.c
        index a23a466..f96fbb5 100644
        --- a/qemu/hw/pci.c
        +++ b/qemu/hw/pci.c
        @@ -27,6 +27,8 @@
        #include "net.h"
        #include "pc.h"
        
        +#include "qemu-kvm.h"
        +
        //#define DEBUG_PCI
        
        struct PCIBus {
        @@ -534,12 +536,18 @@ static void pci_set_irq(void *opaque, int irq_num,
        int level)
            PCIDevice *pci_dev = (PCIDevice *)opaque;
            PCIBus *bus;
            int change;
        -
        +#ifdef KVM_CAP_IRQCHIP
        +    int irq;
        +#endif 
            change = level - pci_dev->irq_state[irq_num];
            if (!change)
                return;
        
            pci_dev->irq_state[irq_num] = level;
        +#ifdef KVM_CAP_IRQCHIP
        +    irq = ioapic_map_irq(pci_dev->devfn, irq_num);
        +    ioapic_set_irq(opaque, irq, change);
        +#endif


So if we're not running KVM we don't trigger the APIC? I don't see how that 
helps anyone. Just modify the Qemu APIC so that it works fine and then hook the 
in-kernel APIC to that.
[Anthony] agree, it will support native Qemu.



            for (;;) {
                bus = pci_dev->bus;
                irq_num = bus->map_irq(pci_dev, irq_num);
        diff --git a/qemu/hw/piix_pci.c b/qemu/hw/piix_pci.c
        index 90cb3a6..96316ca 100644
        --- a/qemu/hw/piix_pci.c
        +++ b/qemu/hw/piix_pci.c
        @@ -225,6 +226,9 @@ static void piix3_set_irq(qemu_irq *pic, int
        irq_num, int level)
            /* now we change the pic irq level according to the piix irq
        mappings */
            /* XXX: optimize */
            pic_irq = piix3_dev->config[0x60 + irq_num];
        +    /* if bit7 set 1, this link is disabled */
        +    if (pic_irq & 0x80)  
        +        return;


Uhm ... I do remember having seen that somewhere else. Why do you need to add 
it now?
[Anthony] As Avi pointed out, below check ( pic_irq < 16) already covers this 
check.

Thanks
Anthony



            if (pic_irq < 16) {
                /* The pic level is the logical OR of all the PCI irqs mapped
                   to it */
        
        
        
        
        
        Xu, Anthony wrote:
        

                Avi Kivity wrote:
                

                        Xu, Anthony wrote:
                        

                                Hi all,
                                

                                Thanks for your comments.
                                


                                I made this new patch based on your comments
                                


                                1. use bimodal _PRT, to take advantage of 
IOAPIC pin 16~23
                                

                                the mapping is simple,  slot  ->  (slot&7)+16 
IOAPIC pin,
                                

                                someone may provide good mapping ?
                                



                        I think it's fine. If we find a better one later, or if 
we add
                        

                        another ioapic, we can easily change it since the bios 
and qemu are
                        

                        shipped as a unit. 
                        


                                2. use ISA-bridge configure space 0x64 byte as 
a communication
                                

                                mechansim. When guest BIOS invokes _PIC, the 
value is passed to
                                

                                            qemu through byte 0x64. qemu know 
whether it is PIC
                                

                                mode and APIC mode by checking byte 0x64. 
                                

                                3. pci_slot_get_pirq and piix3_set_irq adopt 
different operation
                                

                                based on PIC mode/APIC mode 
                                



                        I'm not sure how real hardware works, but I _think_ 
that it routes
                        

                        irqs unconditionally to both the legacy path and 
directly to the
                        

                        ioapic. So for example if slot 5 asserts an interrupt, 
we map it
                        

                        through the pci link mapping and generate an active 
high interrupt to
                        

                        one of {5, 10, 11} (both pic and ioapic), and 
simultaneously an
                        

                        active low interrupt to ioapic pin 21.
                        

                I think what you described is correct.
                




                        The _PIC method should disable the link interrupts if 
ioapic mode is
                        

                        disabled.
                        

                Typo!  If ioapic mode is enabled.
                


                From x86 BIOS, OS disable link interrupt through link device 
_DIS
                

                mothod. 
                




                        This removes the need for communication between the 
bios and qemu.
                        

                        Agree 
                        




                                +            /* APIC and PIC flag */
                                

                                +            OperationRegion (P40D, PCI_Config, 
0x64, 0x01) +
                                



                        This is actually SERIRQC, serial irq control.
                        


                                +
                                

                                +#ifdef KVM_CAP_IRQCHIP
                                



                        This should be unconditional.
                        


                                +static int pci_slot_get_pirq(PCIDevice 
*pci_dev, int irq_num) +{ +
                                

                                int slot_addend; +    if( 
piix3_dev->config[0x64])  // APIC mode
                                

                                +        return ((pci_dev->devfn >> 3) & 7)+16;
                                

                                +    else { // PIC mode
                                

                                +        slot_addend = (pci_dev->devfn >> 3) - 
1;
                                

                                +        return (irq_num + slot_addend) & 3;
                                

                                +    }
                                

                                +}
                                



                        What I'm suggesting is to "fork" the interrupt into two 
lines, one
                        

                        legacy path and the ioapic path.
                        


                I'll try this way.
                


                Anthony
                


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to