[Qemu-devel] Re: [PATCH] sparc32 irq clearing (guest Solaris performance+NetBSD) fix

Blue Swirl Mon, 16 Nov 2009 08:14:58 -0800

On Mon, Nov 16, 2009 at 1:47 PM, Artyom Tarasenko
<atar4q...@googlemail.com> wrote:
> 2009/11/15 Blue Swirl <blauwir...@gmail.com>:
>> On Sun, Nov 15, 2009 at 1:15 AM, Artyom Tarasenko
>> <atar4q...@googlemail.com> wrote:
>>> 2009/11/14 Blue Swirl <blauwir...@gmail.com>:
>>>> On Sat, Nov 14, 2009 at 3:03 AM, Artyom Tarasenko
>>>> <atar4q...@googlemail.com> wrote:
>>>>> According to NCR89C105 documentation
>>>>> http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR89C105.txt
>>>>>
>>>>> Interrupts are cleared by disabling and then re-enabling them.
>>>>> This patch implements the specified behaviour. The most visible effects:
>>>>
>>>> I think the current version also implements this behaviour. The
>>>> difference is that now we clear on disable, with your version, the
>>>> interrupts are cleared when re-enabling them.
>>>
>>> Doesn't this imply that the current version does not implement this
>>> ("Interrupts are cleared by disabling and then re-enabling them") behavior? 
>>> ;-)
>>
>> The specification only says that the sequence disable-enable clears
>> interrupts, but not which of these is true:
>> - clearing happens in the moment of disabling (and interrupts after
>> that are not cleared, current version)
>> - clearing happens  in the moment of re-enabling (your version, sort of)
>> - clearing happens in both cases (lose interrupts)
>
> English is not my native language, but fwiw I think "and then
> re-enabling" can only be the second variant. Without "then" it could
> be either one or three. And if the first variant is what they really
> meant, the part with "and then" is totally redundant and misleading.


Still, this is user documentation, not implementation specification.
I'm open to both versions, if they work.

>> It's also interesting to think what happens between the interrupt
>> controller and the devices. Clearing an interrupt at controller level
>> does not clear the interrupt condition at the device. Aren't the
>> interrupts level triggered on Sparc, so the interrupt is still posted?
>> Is the interrupt actually masked by clearing until the level is
>> deactivated?
>
> Looks unlikely to me. In the book "Panic! Unix system crash dump
> analysis" the authors write that the first thing interrupt handler has
> to do is disable the interrupt, and yes wrting "unix" they mean
> "SunOS/Solaris".
>
> That's also what I observe debugging the Solaris kernel code
> (Solaris kernel debugger is a really powerful tool).
> Looks like interrupts can be shared between devices, so the general
> handler disables the interrupt and then calls multiple device-specific
> handlers sequentially and checks if any of then claims the interrupt.
> If no one does it writes the message "Spurious interrupt %d\n".
>
>
>> Or maybe the controller latches the interrupt so that even after the
>> device releases the interrupt line, interrupt is still active towards
>> the CPU. Then the clearing would make sense.
>
> Looks very realistic to me. I think that's the way the interrupts are
> handled at least under x86.

It's a must on x86, because the interrupts are edge triggered.

>> This does not match current code.
>>
>>>> With the current version, the interrupts which arrived after getting
>>>> cleared during disable but before re-enabling become visible after
>>>> re-enabling. It looks like esp driver in Aurora 1.0, 2.0 and 2.1
>>>> depend on this.
>>>
>>> Hm. It certainly makes sense. But I don't see how does it agree with
>>> the NCR89C105 docs. Can it be the documentation is not precise?
>>>
>>>>> - NetBSD 1.3.3 - 1.5.3 boots successfully
>>>>
>>>> I didn't find those even on ftp.netbsd.org, the first I have is 1.6.
>>>> Thus I could not test if any of the changes I did had any positive
>>>> effect except if some test failed.
>>>
>>> Have you looked in the NetBSD-archive at ftp.netbsd.org?
>>> ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-archive/
>>
>> Yes, but there are no ISOs.
>
> Don't have ISOs either. I think they did never exist for the early
> versions. Miniroots can be used.
> One nice [missing] feature of OpenBIOS is that it doesn't check
> whether a disk label is valid.
> The disk labels of NetBSD miniroots are invalid, so it's not possible
> to boot them on a real hw,
> or under OBP (it drove me crazy when I was fixing the scsi layer to
> work with OBP because
> I thought miniroots were hdd images).
> But, under OpenBIOS it works like a charm:
>
> qemu-system-sparc -nographic  -hda miniroot-133.fs -m 64

Thanks, now I can try something.

>>>>> - Solaris 2.5.1 - 7 boots ~1500 times faster (~20 seconds instead of ~8 
>>>>> hours)
>>>>>
>>>>> Signed-off-by: Artyom Tarasenko <atar4q...@gmail.com>
>>>>> ---
>>>>> diff --git a/hw/slavio_intctl.c b/hw/slavio_intctl.c
>>>>> index 9680392..779c661 100644
>>>>> --- a/hw/slavio_intctl.c
>>>>> +++ b/hw/slavio_intctl.c
>>>>> @@ -177,19 +177,19 @@ static void slavio_intctlm_mem_writel(void
>>>>> *opaque, target_phys_addr_t addr,
>>>>>     saddr = addr >> 2;
>>>>>     DPRINTF("write system reg 0x" TARGET_FMT_plx " = %x\n", addr, val);
>>>>>     switch (saddr) {
>>>>> -    case 2: // clear (enable)
>>>>> +    case 2: // clear (enable, clear formerly disabled pending)
>>>>>         // Force clear unused bits
>>>>>         val &= MASTER_IRQ_MASK;
>>>>> +        s->intregm_pending &= (s->intregm_disabled & val);
>>>>
>>>> This looks buggy, the AND operation will clear a lot of bits unrelated
>>>> to val. I think you are missing a ~ here. I tried a few combinations,
>>>> but none of them passed my tests.
>>>
>>> My bad. It had to be s->intregm_pending &= ~(s->intregm_disabled &
>>> val) . But unfortunately it doesn't work.
>>>
>>>>>         s->intregm_disabled &= ~val;
>>>>>         DPRINTF("Enabled master irq mask %x, curmask %x\n", val,
>>>>>                 s->intregm_disabled);
>>>>>         slavio_check_interrupts(s, 1);
>>>>>         break;
>>>>> -    case 3: // set (disable, clear pending)
>>>>> +    case 3: // set (disable, do not clear pending)
>>>>>         // Force clear unused bits
>>>>>         val &= MASTER_IRQ_MASK;
>>>>>         s->intregm_disabled |= val;
>>>>> -        s->intregm_pending &= ~val;
>>>>
>>>> Here
>>>> s->intregm_pending &= ~s->intregm_disabled;
>>>> would also pass my tests. Does that change anything?
>>>
>>> No, doesn't work for me either.
>>>
>>> Also I put some additional printfs into sun4m.c and see that
>>> interrupts are both set while being already set, and reset while not being 
>>> set.
>>> Looks like a bug?
>>>
>>> static void cpu_set_irq(void *opaque, int irq, int level)
>>> {
>>>    CPUState *env = opaque;
>>>
>>>    if (level) {
>>>        DPRINTF("Raise CPU IRQ %d\n", irq);
>>>        env->halted = 0;
>>>        if(env->pil_in & (1 << irq)) {printf("already set: 
>>> %d\n",irq);return;}
>>>        env->pil_in |= 1 << irq;
>>>        cpu_check_irqs(env);
>>>    } else {
>>>        DPRINTF("Lower CPU IRQ %d\n", irq);
>>>        if(~(env->pil_in & (1 << irq))) {printf("already low:
>>> %d\n",irq);return;}
>>>        env->pil_in &= ~(1 << irq);
>>>        cpu_check_irqs(env);
>>>    }
>>> }
>>>
>>
>> It should be harmless, though it may suck performance.
>
> Do you mean the check against old_interrupt in the cpu_check_irqs()?
> To me it looks harmless only in the case when we have screaming
> interrupts of the same level. Or do you mean something else?

If the old level is unchanged, there should be no action. What's the
case where you see problems?

> The problem is, with "return" after "pritnf" above at least Linux,
> NetBSD and OBP
> hang (or booting so slow that it looks like they hang).
>
> It looks like something is depending on this broken behaviour.

[Qemu-devel] Re: [PATCH] sparc32 irq clearing (guest Solaris performance+NetBSD) fix

Reply via email to