Re: Weird PCI interrupt delivery problem (resolution, sort of)

2006-01-25 Thread Craig Boston
On Wed, Jan 25, 2006 at 08:04:07AM -0700, Scott Long wrote:
> Either that, or the read imposes enough delay to let whatever was
> happening during the DELAY call work.   I find it hard to believe that
> uncached writes would get delayed like this.  I've lost the original
> posting on this, could you provide the dmesg and computer make/model
> again?

It's a Toshiba Satellite L25-S1192.  The chipset is ATI Radeon Xpress
200M (RS480).

Verbose dmesgs are up at http://www.gank.org/freebsd/l25

acpi+apic.txt is a 6.0-RELEASE GENERIC kernel (before I upgraded the
memory, but the APIC thing is independent of that)

apic2.txt is a verbose dmesg with my current kernel (stock 6.0-STABLE +
read-after-write change to local_apic.c).

Craig
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: Weird PCI interrupt delivery problem (resolution, sort of)

2006-01-25 Thread Scott Long

John Baldwin wrote:
> On Tuesday 24 January 2006 19:34, Craig Boston wrote:
> > On Tue, Jan 24, 2006 at 10:43:49AM -0500, John Baldwin wrote:
> > > What if you do a read of the lapic before the write?  Maybe doing 'x =
> > > lapic->eoi;  lapic->eoi = 0;'?
> >
> > Reading the lapic before the write has no effect.
> >
> > Reading the lapic after the write makes it work.
>
> Hmm, perhaps the read forces the write to post?  Scott?

Either that, or the read imposes enough delay to let whatever was
happening during the DELAY call work.   I find it hard to believe that
uncached writes would get delayed like this.  I've lost the original
posting on this, could you provide the dmesg and computer make/model
again?

Scott


Re: Weird PCI interrupt delivery problem (resolution, sort of)

2006-01-25 Thread John Baldwin
On Tuesday 24 January 2006 19:34, Craig Boston wrote:
> On Tue, Jan 24, 2006 at 10:43:49AM -0500, John Baldwin wrote:
> > What if you do a read of the lapic before the write?  Maybe doing 'x =
> > lapic->eoi;  lapic->eoi = 0;'?
>
> Reading the lapic before the write has no effect.
>
> Reading the lapic after the write makes it work.

Hmm, perhaps the read forces the write to post?  Scott?

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Re: Weird PCI interrupt delivery problem (resolution, sort of)

2006-01-24 Thread Craig Boston
On Tue, Jan 24, 2006 at 10:43:49AM -0500, John Baldwin wrote:
> What if you do a read of the lapic before the write?  Maybe doing 'x = 
> lapic->eoi;  lapic->eoi = 0;'?

Reading the lapic before the write has no effect.

Reading the lapic after the write makes it work.
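
For anyone wanting to reproduce the workaround, a minimal sketch of the
read-after-write variant follows.  It is not the actual change to
local_apic.c: the helper names and the choice of the ID register for the
read-back are assumptions made for illustration.  Only the register
offsets (ID at 0x20, EOI at 0xB0 from the local APIC base) come from the
Intel documentation.  Why the read helps is left open here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets from the local APIC base (Intel SDM). */
#define LAPIC_REG_ID	0x020u
#define LAPIC_REG_EOI	0x0B0u

/* Hypothetical helper: 32-bit register at byte offset 'off' from 'base'. */
static inline volatile uint32_t *
lapic_reg(volatile void *base, uint32_t off)
{
	return ((volatile uint32_t *)((volatile uint8_t *)base + off));
}

/*
 * Signal end-of-interrupt, then read another local APIC register.
 * Reading the ID register is an assumption; the thread only says that
 * "reading the lapic after the write" made interrupts keep arriving.
 * The value read is returned so the access is visibly used.
 */
static inline uint32_t
lapic_eoi_readback(volatile void *base)
{
	assert(base != NULL);
	*lapic_reg(base, LAPIC_REG_EOI) = 0;
	return (*lapic_reg(base, LAPIC_REG_ID));
}
```

Because the base address is a parameter, the function can be exercised
against an ordinary buffer in user space; in a kernel the base would be
the mapping of the local APIC registers.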

Craig


Re: Weird PCI interrupt delivery problem (resolution, sort of)

2006-01-24 Thread John Baldwin
On Monday 23 January 2006 21:25, Craig Boston wrote:
> On Fri, Jan 20, 2006 at 03:42:21PM -0500, John Baldwin wrote:
> > On Thu, Jan 19, 2006 at 10:17:39PM -0700, Scott Long wrote:
> > > This points to a bus coherency problem.  I wonder if your BIOS is
> > > incorrectly setting the memory region of the apics as cachable.  You'll
> > > want to bug Baldwin about this.
> >
> > Hmm, well, you can actually try the PAT patch if you are feeling brave as
> > it maps all devices (including APICs) as uncacheable.
>
> Tried the updated PAT patch (with s/pmap_unmapbios/pmap_unmap_bios/ to
> get ACPI to compile).  Unfortunately if it is a caching problem, PAT
> isn't able to fix it.  Same result as stock kernel -- interrupts stop
> arriving after a dozen or so.  AFAICT the local APIC is the only
> memory-mapped I/O region that seems to be problematic.

Ok.

> Instead of writing the value twice, I also tried inserting an
> __asm("nop") before the write with no effect.  Also, a single write to
> an unrelated area doesn't help:
>
> +static volatile int dummyeoi;
> +
>  lapic_eoi(void)
>  {
>
> + dummyeoi = 1;
>   lapic->eoi = 0;
> + dummyeoi = 2;
>  }
>
> I'm _reasonably_ certain that marking dummyeoi volatile and leaving it
> uninitialized will prevent gcc from optimizing that out.  Forcing R/W
> cycles (++dummyeoi) before and after doesn't work either.
>
> A DELAY(1) before the lapic->eoi write does the trick, but DELAY does
> lots of complicated things so I don't know how useful of a data point
> that is.
>
> I'm probably missing something, but if bad cache behavior was causing
> writes to the lapic EOI register to not always take effect, wouldn't the
> _next_ irq (even if it's a different line) cause the one that's
> currently pending to be acknowledged?

What if you do a read of the lapic before the write?  Maybe doing 'x = 
lapic->eoi;  lapic->eoi = 0;'?

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Re: Weird PCI interrupt delivery problem (resolution, sort of)

2006-01-23 Thread Craig Boston
On Fri, Jan 20, 2006 at 03:42:21PM -0500, John Baldwin wrote:
> On Thu, Jan 19, 2006 at 10:17:39PM -0700, Scott Long wrote:
> > This points to a bus coherency problem.  I wonder if your BIOS is
> > incorrectly setting the memory region of the apics as cachable.  You'll
> > want to bug Baldwin about this.
> 
> Hmm, well, you can actually try the PAT patch if you are feeling brave as it 
> maps all devices (including APICs) as uncacheable.

Tried the updated PAT patch (with s/pmap_unmapbios/pmap_unmap_bios/ to
get ACPI to compile).  Unfortunately if it is a caching problem, PAT
isn't able to fix it.  Same result as stock kernel -- interrupts stop
arriving after a dozen or so.  AFAICT the local APIC is the only
memory-mapped I/O region that seems to be problematic.

Instead of writing the value twice, I also tried inserting an
__asm("nop") before the write with no effect.  Also, a single write to
an unrelated area doesn't help:

+static volatile int dummyeoi;
+
 lapic_eoi(void)
 {

+   dummyeoi = 1;
    lapic->eoi = 0;
+   dummyeoi = 2;
 }

I'm _reasonably_ certain that marking dummyeoi volatile and leaving it
uninitialized will prevent gcc from optimizing that out.  Forcing R/W
cycles (++dummyeoi) before and after doesn't work either.

A DELAY(1) before the lapic->eoi write does the trick, but DELAY does
lots of complicated things so I don't know how useful of a data point
that is.

I'm probably missing something, but if bad cache behavior was causing
writes to the lapic EOI register to not always take effect, wouldn't the
_next_ irq (even if it's a different line) cause the one that's
currently pending to be acknowledged?
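
One way to reason about that question is a toy model of the in-service
register (ISR).  The sketch below assumes the architectural behaviour
described in the Intel manuals: a vector is dispatched only if its
priority class (vector >> 4) is above that of the highest in-service
vector, and an EOI write clears the highest in-service bit.  It also
assumes a task priority of 0 and that the failure is an EOI write that is
silently dropped.  The toy_* names and the vector numbers are made up,
and none of this says what the RS480 really does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy local APIC: only the in-service state, task priority assumed 0. */
struct toy_lapic {
	uint8_t isr[256];	/* isr[v] != 0: vector v is in service */
};

/* Highest in-service vector, or -1 if nothing is in service. */
static inline int
toy_highest_isr(const struct toy_lapic *la)
{
	for (int v = 255; v >= 0; v--)
		if (la->isr[v])
			return (v);
	return (-1);
}

/*
 * Try to dispatch vector v.  It is only delivered if its priority class
 * is strictly above the class of the highest in-service vector;
 * otherwise it would stay pending.
 */
static inline bool
toy_deliver(struct toy_lapic *la, int v)
{
	int cur = toy_highest_isr(la);

	assert(v >= 0 && v < 256);
	if (cur >= 0 && (v >> 4) <= (cur >> 4))
		return (false);
	la->isr[v] = 1;
	return (true);
}

/* An EOI clears the highest in-service bit, whichever vector that is. */
static inline void
toy_eoi(struct toy_lapic *la)
{
	int cur = toy_highest_isr(la);

	if (cur >= 0)
		la->isr[cur] = 0;
}
```

In this model, if the EOI for a vector such as 0x31 is lost, later
vectors in class 3 or below are held off, while a higher-class vector
(say a timer at a hypothetical 0xef) is still delivered.  Its EOI clears
only its own bit, so the stuck bit remains until one extra EOI is issued.
That would fit the lapic timer continuing to run while the other
interrupts stop, but it is only a model.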

Craig


Re: Weird PCI interrupt delivery problem (resolution, sort of)

2006-01-23 Thread John Baldwin
On Friday 20 January 2006 16:26, Craig Boston wrote:
> On Fri, Jan 20, 2006 at 03:42:21PM -0500, John Baldwin wrote:
> > Hmm, well, you can actually try the PAT patch if you are feeling brave as
> > it maps all devices (including APICs) as uncacheable.
>
> Heh, took me a minute to find.  I first found the one at
> http://people.freebsd.org/~jhb/patches/pat.patch
> but it maps devices as write-back.  I'm guessing you mean to use the
> version in perforce?

Yeah, I need to generate an updated patch.
-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Re: Weird PCI interrupt delivery problem (resolution, sort of)

2006-01-20 Thread Craig Boston
On Fri, Jan 20, 2006 at 03:42:21PM -0500, John Baldwin wrote:
> Hmm, well, you can actually try the PAT patch if you are feeling brave as it 
> maps all devices (including APICs) as uncacheable.

Heh, took me a minute to find.  I first found the one at
http://people.freebsd.org/~jhb/patches/pat.patch
but it maps devices as write-back.  I'm guessing you mean to use the
version in perforce?
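
For context on what "write-back" versus "uncacheable" means at the PAT
level, here is a small sketch that decodes a raw IA32_PAT MSR value into
memory-type names.  The encodings (0 = UC, 1 = WC, 4 = WT, 5 = WP,
6 = WB, 7 = UC-) are from the Intel manuals.  The function names are
invented for this example, and it says nothing about how either version
of the patch actually programs the entries.

```c
#include <assert.h>
#include <stdint.h>

/* Name of a PAT memory-type encoding (low three bits of an entry). */
static inline const char *
pat_type_name(uint8_t type)
{
	switch (type & 0x7) {
	case 0x0: return ("UC");	/* uncacheable */
	case 0x1: return ("WC");	/* write combining */
	case 0x4: return ("WT");	/* write through */
	case 0x5: return ("WP");	/* write protected */
	case 0x6: return ("WB");	/* write back */
	case 0x7: return ("UC-");	/* uncacheable, MTRR may override */
	default:  return ("reserved");
	}
}

/* Extract entry idx (0..7) from a 64-bit IA32_PAT value. */
static inline uint8_t
pat_entry(uint64_t pat_msr, unsigned idx)
{
	assert(idx < 8);
	return ((uint8_t)((pat_msr >> (idx * 8)) & 0x7));
}
```

With the power-on default value 0x0007040600070406, entries 0 through 3
decode as WB, WT, UC- and UC.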

I'll give it a try tonight.  Could hardly make things worse -- I just
noticed that X now randomly locks up hard, ever since I bumped up the
memory from 256MB to 2GB -- though text mode still works fine. (yes, I
tried reverting all my local patches and testing the memory)

Craig


Re: Weird PCI interrupt delivery problem (resolution, sort of)

2006-01-20 Thread John Baldwin
On Friday 20 January 2006 10:27, Craig Boston wrote:
> On Thu, Jan 19, 2006 at 10:17:39PM -0700, Scott Long wrote:
> > This points to a bus coherency problem.  I wonder if your BIOS is
> > incorrectly setting the memory region of the apics as cachable.  You'll
> > want to bug Baldwin about this.
>
> I CC-ed him on my post since he was working with me on the problem
> before.  For some reason the Cc: header got wiped out when it went to
> the list (but I checked my server logs and it did deliver a copy of the
> message to him).

Hmm, well, you can actually try the PAT patch if you are feeling brave as it 
maps all devices (including APICs) as uncacheable.

-- 
John Baldwin <[EMAIL PROTECTED]>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


Re: Weird PCI interrupt delivery problem (resolution, sort of)

2006-01-20 Thread Craig Boston
On Thu, Jan 19, 2006 at 10:17:39PM -0700, Scott Long wrote:
> This points to a bus coherency problem.  I wonder if your BIOS is
> incorrectly setting the memory region of the apics as cachable.  You'll
> want to bug Baldwin about this.

I CC-ed him on my post since he was working with me on the problem
before.  For some reason the Cc: header got wiped out when it went to
the list (but I checked my server logs and it did deliver a copy of the
message to him).

Craig


Re: Weird PCI interrupt delivery problem (resolution, sort of)

2006-01-19 Thread Scott Long

Craig Boston wrote:
> After trying everything I could think of to do to the I/O APIC code and
> coming up empty, tonight I went back to the local APIC.  I had
> previously ruled it out since the lapic timer interrupt continued to
> work fine even when the others stopped.  However, adding some DELAY(1)
> calls at key points caused it to work, much like adding WITNESS does.
> I managed to get it down to a single change that makes APIC mode work on
> this laptop:
>
> --- local_apic.c.orig   Thu Jan 19 18:32:37 2006
> +++ local_apic.c        Thu Jan 19 18:32:28 2006
> @@ -599,4 +599,5 @@
>  lapic_eoi(void)
>  {
>     lapic->eoi = 0;
> +   lapic->eoi = 0;
>  }
>
> ...and welcome to bizarro world.  There's absolutely no reason I can
> think of why that would change anything, other than buggy hardware.
>
> I looked at what Linux was doing, and they're also using a single write
> to EOI interrupts, so long as the X86_GOOD_APIC config option is enabled
> (and it is for P5/MMX or newer).  Otherwise it does an extra read before
> writing to any APIC register.  I don't know if Linux works on this
> hardware or not -- the live CD I tried wasn't compiled for APIC support.
>
> At this point, since AFAIK nobody else has reported the same problem,
> I'm content with a local workaround.  It's just... weird.
>
> Craig

This points to a bus coherency problem.  I wonder if your BIOS is
incorrectly setting the memory region of the apics as cachable.  You'll
want to bug Baldwin about this.

Scott
