Re: ieee1394: host adapter disappears on 1394 bus reset

2006-11-28 Thread Stefan Richter
Robert Crocombe wrote:
> On 11/27/06, Stefan Richter <[EMAIL PROTECTED]> wrote:
>> Posted writes are still enabled. phys_dma=0 disables only the physical
>> response unit.
...
> What I need is for write requests directed to
> address 0 to be directed to the asynchronous unit so that I can treat
> them as regular asynchronous write requests. 
...
> So long ways round, I think the phys_dma parameter is the proper thing
> for me.

Yes, correct.

(Leaving posted writes on has two subtle effects which may or may not be
interesting to you. 1. Write transactions will be performed as unified
transactions. 2. The transaction is already complete before the
controller writes to main memory. There are devices out there which
behave unexpectedly with unified transactions on, and some which do so
if they are off.)

> And I will try and do some actual thinking about what is happening.  I
> was hoping to offload that work to you and simply perform mechanical
> changes to the source!  Rats!

I'm on it, but working slowly, as usual. (I thought I would get something
together during your holidays, but I fought with other buggy software
which crippled my main PC...)
-- 
Stefan Richter
-=-=-==- =-== ===--
http://arcgraph.de/sr/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: ieee1394: host adapter disappears on 1394 bus reset

2006-11-28 Thread Robert Crocombe

On 11/27/06, Stefan Richter <[EMAIL PROTECTED]> wrote:

> Posted writes are still enabled. phys_dma=0 disables only the physical
> response unit. You have to change the source if you want to disable
> posted writes. See the top of ohci_initialize. Should this be a module
> load parameter too?


Er.  I misspoke.  What I need is for write requests directed to
address 0 to be directed to the asynchronous unit so that I can treat
them as regular asynchronous write requests.  As the OHCI 1.1 spec
says:

"Physical requests that are rejected by the PhysicalRequestFilter
shall be sent to the AR Request DMA context if the AR Request DMA
context is enabled". (5.14.2, page 58)

That does appear to be happening: I have an ARM mapping set to begin
at 0 and extend some ways along, and I do receive write requests.  At
first I was simply changing the lines:

reg_write(ohci,OHCI1394_PhyReqFilterHiSet, 0x);
reg_write(ohci,OHCI1394_PhyReqFilterLoSet, 0x);

to be 0x  instead, but then I paid more attention to the
source and saw the phys_dma parameter, which does the same.  Well,
*did*, in 2.6.16.  I see that 2.6.18 doesn't write 0 if !phys_dma, it
just leaves the values alone, but I guess that's okay since they are
set to 0 on reset.  Same difference.
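The routing rule quoted from the OHCI spec above can be sketched as a small
decision function. This is only a simplified model, not driver code: it
assumes the request comes from the local bus, that the AR Request DMA context
is enabled, and that nothing but the filter bit and the address range matter.
The names route_request and phys_upper_bound are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Where an incoming asynchronous request ends up in this toy model. */
enum route { ROUTE_PHYSICAL_UNIT, ROUTE_AR_REQUEST_DMA };

/*
 * filter           - PhysicalRequestFilter as one 64-bit value (Hi:Lo),
 *                    bit n set = node n may use the physical unit
 * src_node         - node number of the requester (0..62)
 * offset           - destination offset of the request
 * phys_upper_bound - first offset that is NOT handled physically
 *                    (hypothetical parameter, value depends on the chip)
 */
static enum route route_request(uint64_t filter, unsigned src_node,
				uint64_t offset, uint64_t phys_upper_bound)
{
	int in_physical_range = offset < phys_upper_bound;
	int node_allowed = src_node < 63 && ((filter >> src_node) & 1);

	if (in_physical_range && node_allowed)
		return ROUTE_PHYSICAL_UNIT;

	/* Rejected by the filter (or outside the range): goes to AR Request DMA. */
	return ROUTE_AR_REQUEST_DMA;
}
```

With the filter left at 0, as phys_dma=0 does, route_request(0, 3, 0, 1ULL << 32)
returns ROUTE_AR_REQUEST_DMA, which matches the observation that writes to
address 0 arrive as ordinary asynchronous requests.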

So that's okay.  Uhm, mostly.  You should really see the horrors I
have created in order to be able to have 5 hosts map the same address
range (the custom protocol we're using doesn't use the destination
address at all, so it's 0 for everybody).

So long ways round, I think the phys_dma parameter is the proper thing for me.

And I will try and do some actual thinking about what is happening.  I
was hoping to offload that work to you and simply perform mechanical
changes to the source!  Rats!

--
Robert Crocombe
[EMAIL PROTECTED]


Re: ieee1394: host adapter disappears on 1394 bus reset

2006-11-27 Thread Stefan Richter
Robert Crocombe wrote:
...
> Nov 27 13:06:37 spanky kernel: ohci1394: fw-host1: IntEvent: 00020010

busReset + RQPkt (packet sent)

...
> Nov 27 13:06:37 spanky kernel: ohci1394: fw-host1: IntEvent: 0001

selfIDcomplete

...
> Nov 27 13:06:40 spanky kernel: ohci1394: fw-host1: IntEventClear
>  IntEventSet   04508000 IntMaskSet 838301f3

IntEventSet = phyRegRcvd + cycleLost + cycleSynch + selfIDcomplete2

> Nov 27 13:06:40 spanky kernel: ohci1394: fw-host1: IntEvent: 00020010
...
> Nov 27 13:06:41 spanky kernel: ohci1394: fw-host1: IntEvent: 0001
...
> Nov 27 13:06:44 spanky kernel: ohci1394: fw-host1: IntEventClear
>  IntEventSet   6ffdc33f IntMaskSet

IntEventSet = vendorSpecific + softInterrupt + ack_tardy + phyRegRcvd +
cycleTooLong + unrecoverableError + cycleInconsistent + cycleLost +
cycle64Seconds + cycleSynch + phy + regAccessFail + selfIDComplete +
selfIDComplete2 + [reserved...?] + lockRespErr + postedWriteErr + RSPkt
+ RQPkt + ARRS + ARRQ + respTxComplete + reqTxComplete

(The bit 4000 is not standardized and shouldn't be there. Whatever.)
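The decoding above can be reproduced mechanically. The following is a minimal
sketch, not part of ohci1394: the helper decode_int_event and its table are
made up for illustration, and the bit values are the ones I believe the
driver's header and OHCI 1.1 use, so check them against ohci1394.h before
relying on them. Bits missing from the table (such as 0x4000) are skipped.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct int_event_bit {
	uint32_t mask;
	const char *name;
};

/* IntEvent bits, highest first, named as in the decoding above. */
static const struct int_event_bit int_event_bits[] = {
	{ 0x40000000, "vendorSpecific" },     { 0x20000000, "softInterrupt" },
	{ 0x08000000, "ack_tardy" },          { 0x04000000, "phyRegRcvd" },
	{ 0x02000000, "cycleTooLong" },       { 0x01000000, "unrecoverableError" },
	{ 0x00800000, "cycleInconsistent" },  { 0x00400000, "cycleLost" },
	{ 0x00200000, "cycle64Seconds" },     { 0x00100000, "cycleSynch" },
	{ 0x00080000, "phy" },                { 0x00040000, "regAccessFail" },
	{ 0x00020000, "busReset" },           { 0x00010000, "selfIDComplete" },
	{ 0x00008000, "selfIDComplete2" },    { 0x00000200, "lockRespErr" },
	{ 0x00000100, "postedWriteErr" },     { 0x00000080, "isochRx" },
	{ 0x00000040, "isochTx" },            { 0x00000020, "RSPkt" },
	{ 0x00000010, "RQPkt" },              { 0x00000008, "ARRS" },
	{ 0x00000004, "ARRQ" },               { 0x00000002, "respTxComplete" },
	{ 0x00000001, "reqTxComplete" },
};

/*
 * Write "a + b + c" for every known bit set in `event` into buf.
 * Returns the number of known bits that were written.
 */
static int decode_int_event(uint32_t event, char *buf, size_t len)
{
	size_t used = 0;
	size_t i;
	int known = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < sizeof(int_event_bits) / sizeof(int_event_bits[0]); i++) {
		int n;

		if (!(event & int_event_bits[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " + " : "", int_event_bits[i].name);
		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial name, stop */
			break;
		}
		used += (size_t)n;
		known++;
	}
	return known;
}
```

Feeding it 0x04508000 gives the four events listed for the healthy case, and
0x6ffdc33f gives the 22 named events of the failing case; the 23rd set bit is
the non-standard 0x4000.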

That's a lot of stuff that happened right before this last print macro.
Maybe ages were spent in hpsb_selfid_complete. I will try to work on
getting non-critical parts out of hpsb_selfid_complete during the
week, but I don't know how fast I can do this or whether it will help in
the first place.

...
> I didn't mention that I have:
> 
> options ieee1394 disable_nodemgr=1
> options ohci1394 phys_dma=0
> 
> in my /etc/modprobe.conf.  The Linux adapters are functioning as
> simulated peripherals to a piece of control hardware that always has a
> dest address of 0x   on all packets so I needed to get rid
> of posted writes and any bickering over bus master.

Posted writes are still enabled. phys_dma=0 disables only the physical
response unit. You have to change the source if you want to disable
posted writes. See the top of ohci_initialize. Should this be a module
load parameter too?
-- 
Stefan Richter
-=-=-==- =-== ==-==
http://arcgraph.de/sr/


Re: ieee1394: host adapter disappears on 1394 bus reset

2006-11-27 Thread Robert Crocombe

Robert Crocombe wrote:

> this is in 2.6.16-rt29 which has proved to be the easiest to provoke.
> I actually couldn't get 2.6.18 to break earlier this morning (few
> hundred resets).


Okay, I got the problem to occur again with 2.6.18.  I will attach my
config in case you wish to scrutinize for any boneheadedness on my
part.

I provoked the problem both with and without the additional read of
IntMaskSet.  Amazingly, I lost host1 on the bus reset that occurred
after this sequence:

rmmod ohci1394
rmmod ieee1394
make
make modules_install
modprobe ohci1394

which followed my adding the extra register read line.  Here's the
entirety of the host1 stuff (I did a s/.*host[^1].*//g in vim).  I
snipped some of the self ID chatter.

Nov 27 13:06:35 spanky kernel: ieee1394: nodemgr and IRM functionality disabled
Nov 27 13:06:35 spanky kernel: ohci1394: fw-host1: Remapped memory
spaces reg 0xc2058000
Nov 27 13:06:36 spanky kernel: ohci1394: fw-host1: Soft reset finished
Nov 27 13:06:36 spanky kernel: ohci1394: fw-host1: Iso contexts reg:
00a8 implemented: 000f
Nov 27 13:06:36 spanky kernel: ohci1394: fw-host1: Iso contexts reg:
0098 implemented: 00ff
Nov 27 13:06:36 spanky kernel: ohci1394: fw-host1: Receive DMA ctx=0 initialized
Nov 27 13:06:36 spanky kernel: ohci1394: fw-host1: Receive DMA ctx=0 initialized
Nov 27 13:06:36 spanky kernel: ohci1394: fw-host1: Transmit DMA ctx=0
initialized
Nov 27 13:06:36 spanky kernel: ohci1394: fw-host1: Transmit DMA ctx=1
initialized
Nov 27 13:06:36 spanky kernel: ohci1394: fw-host1: physUpperBoundOffset=
Nov 27 13:06:36 spanky kernel: ohci1394: fw-host1: OHCI-1394 1.1
(PCI): IRQ=[98]  MMIO=[f9ffe000-f9ffe7ff]  Max Packet=[4096]  IR/IT
contexts=[4/8]
Nov 27 13:06:37 spanky kernel: ohci1394: fw-host1: IntEvent: 00020010
Nov 27 13:06:37 spanky kernel: ohci1394: fw-host1: irq_handler: Bus
reset requested
Nov 27 13:06:37 spanky kernel: ohci1394: fw-host1: Cancel request received
Nov 27 13:06:37 spanky kernel: ohci1394: fw-host1: Got RQPkt interrupt
status=0x8409
Nov 27 13:06:37 spanky kernel: ohci1394: fw-host1: Single packet rcv'd
Nov 27 13:06:37 spanky kernel: ohci1394: fw-host1: IntEvent: 0001
Nov 27 13:06:37 spanky kernel: ohci1394: fw-host1: SelfID interrupt
received (phyid 1, not root)
Nov 27 13:06:37 spanky kernel: ohci1394: fw-host1: SelfID packet
0x807fc494 received
Nov 27 13:06:38 spanky kernel: ohci1394: fw-host1: SelfID packet
0x817fc494 received
Nov 27 13:06:38 spanky kernel: ohci1394: fw-host1: SelfID for this
node is 0x817fc494
Nov 27 13:06:39 spanky kernel: ohci1394: fw-host1: SelfID packet BLAH
...15 more SelfID...
Nov 27 13:06:40 spanky kernel: ohci1394: fw-host1: SelfID complete
Nov 27 13:06:40 spanky kernel: ohci1394: fw-host1: PhyReqFilter=
Nov 27 13:06:40 spanky kernel: ohci1394: fw-host1: IntEventClear
 IntEventSet   04508000 IntMaskSet 838301f3
Nov 27 13:06:40 spanky kernel: ohci1394: fw-host1: IntEvent: 00020010
Nov 27 13:06:40 spanky kernel: ohci1394: fw-host1: irq_handler: Bus
reset requested
Nov 27 13:06:40 spanky kernel: ohci1394: fw-host1: Cancel request received
Nov 27 13:06:40 spanky kernel: ohci1394: fw-host1: Got RQPkt interrupt
status=0x8409
Nov 27 13:06:40 spanky kernel: ohci1394: fw-host1: Single packet rcv'd
Nov 27 13:06:41 spanky kernel: ohci1394: fw-host1: IntEvent: 0001
Nov 27 13:06:42 spanky kernel: ohci1394: fw-host1: SelfID interrupt
received (phyid 1, not root)
Nov 27 13:06:42 spanky kernel: ohci1394: fw-host1: SelfID packet
0x807fc494 received
Nov 27 13:06:42 spanky kernel: ohci1394: fw-host1: SelfID packet
0x817fc496 received
Nov 27 13:06:42 spanky kernel: ohci1394: fw-host1: SelfID for this
node is 0x817fc496
Nov 27 13:06:42 spanky kernel: ohci1394: fw-host1: SelfID packet BLAH
...15 more SelfID...
Nov 27 13:06:43 spanky kernel: ohci1394: fw-host1: SelfID complete
Nov 27 13:06:43 spanky kernel: ohci1394: fw-host1: PhyReqFilter=
Nov 27 13:06:44 spanky kernel: ohci1394: fw-host1: IntEventClear
 IntEventSet   6ffdc33f IntMaskSet

with the bad IntMaskSet again.

I don't know if the host loss when I didn't have the additional read
is meaningful, but there it is simply:

Nov 27 13:04:39 spanky kernel: ohci1394: fw-host2: SelfID packet
0x823fc4f8 rf8c43f8c
.
.
.
Nov 27 13:06:30 spanky kernel: ohci1394: fw-host2: Soft reset finished

with 2 minutes and ~30 bus resets in between.

Oh, poop.  I didn't mention that I have:

options ieee1394 disable_nodemgr=1
options ohci1394 phys_dma=0

in my /etc/modprobe.conf.  The Linux adapters are functioning as
simulated peripherals to a piece of control hardware that always has a
dest address of 0x   on all packets so I needed to get rid
of posted writes and any bickering over bus master.

--
Robert Crocombe
[EMAIL PROTECTED]


2.6.18_00_config.bz2
Description: BZip2 compressed data


MMIO write ordering (was Re: ieee1394: host adapter disappears on 1394 bus reset)

2006-11-27 Thread Stefan Richter
I wrote:
> Question to others:
> 
> ohci1394.c::ohci_irq_handler() is taking a per-host spinlock around some
> register reads and writes, particularly:
> ...
>   spin_lock_irqsave(&ohci->event_lock, flags);
>   event = reg_read(ohci, OHCI1394_IntEventClear);
>   reg_write(ohci, OHCI1394_IntEventClear, event &
>   ~OHCI1394_busReset);
>   spin_unlock_irqrestore(&ohci->event_lock, flags);
> ...
>   spin_lock_irqsave(&ohci->event_lock, flags);
>   reg_write(ohci, OHCI1394_IntMaskClear, OHCI1394_busReset);
>   run_an_insane_loop_as_an_alleged_fix_for_dorky_hardware;
>   spin_unlock_irqrestore(&ohci->event_lock, flags);
> ...
>   spin_lock_irqsave(&ohci->event_lock, flags);
>   reg_write(ohci, OHCI1394_IntEventClear, OHCI1394_busReset);
>   reg_write(ohci, OHCI1394_IntMaskSet, OHCI1394_busReset);
>   spin_unlock_irqrestore(&ohci->event_lock, flags);
> 
> I think these spinlocks are totally useless 1. because
> ohci_irq_handler() is only called as the hardware interrupt servicing
> routine and 2. because they don't flush the register write operations.
> Right? Wrong? [Ohci1394's reg_write() is a writel().]

Also, what is the status of ordering guarantees --- or lack thereof ---
for writel() under Linux 2.6.16 and 2.6.18? Especially in presence of a
PCI-X to PCI bridge...
-- 
Stefan Richter
-=-=-==- =-== ==-==
http://arcgraph.de/sr/


Re: ieee1394: host adapter disappears on 1394 bus reset

2006-11-27 Thread Stefan Richter
Robert Crocombe wrote:
> Okay, so the code looks like this now:
> 
> DBGMSG("PhyReqFilter=%08x%08x",
>        reg_read(ohci, OHCI1394_PhyReqFilterHiSet),
>        reg_read(ohci, OHCI1394_PhyReqFilterLoSet));
> 
> reg_read(ohci, OHCI1394_IntMaskSet);
> 
> hpsb_selfid_complete(host, phyid, isroot);
> 
> DBGMSG("IntEventClear %08x "
>        "IntEventSet %08x "
>        "IntMaskSet %08x",
>        reg_read(ohci, OHCI1394_IntEventClear),
>        reg_read(ohci, OHCI1394_IntEventSet),
>        reg_read(ohci, OHCI1394_IntMaskSet));

OK.

> this is in 2.6.16-rt29 which has proved to be the easiest to provoke.
> I actually couldn't get 2.6.18 to break earlier this morning (few
> hundred resets).

You could replace 2.6.16-rt29/drivers/ieee1394/ by drivers/ieee1394/
from 2.6.16.28 or later plus one of the patches from
http://me.in-berlin.de/~s5r6/linux1394/updates/2.6.16.x/ and see if it
makes a difference. But judging from the changes that went in, I would
be surprised if there was any improvement.

> Okay, I've lost host1 (on the Indigita), but this time the last print
> statement is:
> 
> Nov 27 10:38:27 spanky kernel: ohci1394: fw-host1: IntEventClear
>  IntEventSet 04588000 IntMaskSet 818300f3
> 
> just like all the other hosts.  I can confirm that no bus reset
> handlers are called, and there are another 4,000 lines of statements
> from the other hosts after the last from host1.

This is strange. The mask has the bus reset and self ID complete events
switched on. Nothing manipulates this mask besides the interrupt
handler and the initialization and shutdown routines. And if I'm not
mistaken, the interrupt handler does not run concurrently with itself
for the same chip.

Ingo et al, is the -rt patched kernel fundamentally different WRT
reentrance of interrupt handlers?
-- 
Stefan Richter
-=-=-==- =-== ==-==
http://arcgraph.de/sr/


Re: ieee1394: host adapter disappears on 1394 bus reset

2006-11-27 Thread Robert Crocombe

On 11/27/06, Stefan Richter <[EMAIL PROTECTED]> wrote:
> But perhaps more importantly, how are the IRQs distributed?
>
> # cat /proc/interrupts


This is almost right after boot.  I generated about 40 bus resets just
to stir things up a little:

           CPU0       CPU1       CPU2       CPU3
  0:      33660      36393      30037      69980  IO-APIC-edge   timer
  1:          0          0          1         10  IO-APIC-edge   i8042
  8:          0          0          0          0  IO-APIC-edge   rtc
  9:          0          0          0          0  IO-APIC-level  acpi
 12:          0          0          0        113  IO-APIC-edge   i8042
 15:  0270686215  IO-APIC-edge   ide1
 50:          1          0      11567          7  IO-APIC-level  aic79xx
 58:          0          0          0          0  IO-APIC-level  ehci_hcd:usb1
 66:          0          0          0          0  IO-APIC-level  ohci_hcd:usb2
 74:          0          1          7         80  IO-APIC-level  ohci1394, ohci1394
 82:          7         23         30         28  IO-APIC-level  ohci1394
 90:          2         28         17         71  IO-APIC-level  eth0
 98:          9         27         21       9182  IO-APIC-level  eth1
106:         19         17         20         26  IO-APIC-level  ohci1394
114:         16         26         34         12  IO-APIC-level  ohci1394
233:          0          0         15          0  IO-APIC-level  aic79xx
NMI:        410         78         75         77
LOC:     166733     166657     166542     166432
ERR:          0
MIS:          0

Also:
I couldn't cause the problem when using 4 Fireboard 800s through
several hundred bus resets (usually took <= 40 for the Indigita card)


> Please add
> reg_read(ohci, OHCI1394_IntMaskSet);
> right before hpsb_selfid_complete(host, phyid, isroot);. This will flush
> the previous reg_write before hpsb_selfid_complete starts doing
> unspeakable things.


Okay, so the code looks like this now:

    DBGMSG("PhyReqFilter=%08x%08x",
           reg_read(ohci, OHCI1394_PhyReqFilterHiSet),
           reg_read(ohci, OHCI1394_PhyReqFilterLoSet));

    reg_read(ohci, OHCI1394_IntMaskSet);

    hpsb_selfid_complete(host, phyid, isroot);

    DBGMSG("IntEventClear %08x "
           "IntEventSet %08x "
           "IntMaskSet %08x",
           reg_read(ohci, OHCI1394_IntEventClear),
           reg_read(ohci, OHCI1394_IntEventSet),
           reg_read(ohci, OHCI1394_IntMaskSet));

this is in 2.6.16-rt29 which has proved to be the easiest to provoke.
I actually couldn't get 2.6.18 to break earlier this morning (few
hundred resets).

Okay, I've lost host1 (on the Indigita), but this time the last print
statement is:

Nov 27 10:38:27 spanky kernel: ohci1394: fw-host1: IntEventClear
 IntEventSet 04588000 IntMaskSet 818300f3

just like all the other hosts.  I can confirm that no bus reset
handlers are called, and there are another 4,000 lines of statements
from the other hosts after the last from host1.

--
Robert Crocombe
[EMAIL PROTECTED]


Re: ieee1394: host adapter disappears on 1394 bus reset

2006-11-27 Thread Stefan Richter
On 11/27/2006 4:39 PM, Robert Crocombe wrote:
> On 11/22/06, Stefan Richter <[EMAIL PROTECTED]> wrote:
>> One thing you could try next is to add a debug logging macro which
>> prints the contents of OHCI1394_IntEventClear, OHCI1394_IntEventSet, and
>> OHCI1394_IntMaskSet, right after ohci1394's call to
>> hpsb_selfid_complete. (I'm merely poking in the dark here.)
> 
> I think you've got something!  I managed to provoke failure from 3 of
> the 5 interfaces in a single burst of reset clicking!  And yes, all 3
> failed interfaces are on the Indigita card, and no, the Fireboard has
> never failed.
> 
> The last thing I see from the failed interfaces is this:
> 
> Nov 27 08:25:51 spanky kernel: ohci1394: fw-host3: 
> PhyReqFilter=
> Nov 27 08:25:51 spanky kernel: ohci1394: fw-host3: IntEventClear
>  IntEventSet 6ffdc33f IntMaskSet 

Zero bits in the mask mean that the chip will not generate a processor
interrupt for the type of event represented by the bit. And the
difference between what can be read from IntEventClear and IntEventSet
is that the former is the masked version of the latter.
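The relationship between the three registers described above can be written
down in a few lines. This is a sketch under assumptions, not driver code: the
function names are hypothetical, and the treatment of bit 31 of the mask as a
global enable (masterIntEnable) is my reading of the OHCI register layout.

```c
#include <assert.h>
#include <stdint.h>

#define MASTER_INT_ENABLE 0x80000000u	/* assumed: bit 31 of IntMask */

/* What a read of IntEventClear returns in this model: IntEvent & IntMask. */
static uint32_t read_int_event_clear(uint32_t int_event, uint32_t int_mask)
{
	return int_event & int_mask;
}

/* Non-zero if the chip would raise a processor interrupt for these values. */
static int irq_would_fire(uint32_t int_event, uint32_t int_mask)
{
	if (!(int_mask & MASTER_INT_ENABLE))
		return 0;	/* everything is masked off globally */
	return (int_event & int_mask & ~MASTER_INT_ENABLE) != 0;
}
```

With the all-zero mask seen on the failed hosts, even the long list of pending
events in 6ffdc33f yields a masked value of 0 and no interrupt. With the mask
818300f3 from the surviving hosts, a pending 00020010 (busReset + RQPkt) is
passed through and does interrupt.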

You probably noticed yourself that ohci1394's interrupt handler
explicitly enables some bits in the interrupt mask shortly before the
call to hpsb_selfid_complete. This means there is either something going
wrong during hpsb_selfid_complete (which wouldn't surprise me, since
there are busy wait loops involved) or the write access to the interrupt
mask wasn't flushed soon enough.

> which looks very different from the entries by the interfaces that
> survive (these are the lines immediately before the one above)
> 
> Nov 27 08:25:51 spanky kernel: ohci1394: fw-host4: IntEventClear
>  IntEventSet 04508000 IntMaskSet 818300f3
> Nov 27 08:25:51 spanky kernel:
> Nov 27 08:25:51 spanky kernel: ohci1394: fw-host2: IntEventClear
>  IntEventSet 04508000 IntMaskSet 818300f3
> Nov 27 08:25:51 spanky kernel:

This mask looks much better.

> I'm not sure if this says anything to you except "hey, don't use those
> Indigita cards".  The problem is, I can't get the number of ports I
> need using only Fireboards (I think I need 6, and I have 5 PCI slots
> but need to use some of the other slots).
[...]

As you wrote, both cards use the same link layer controller, although
they could have different chip revisions. The controllers of the
Indigita card sit behind the bridge, which /perhaps/ contributes to the
problem. But perhaps more importantly, how are the IRQs distributed?
# cat /proc/interrupts

Anyway, I think a driver problem is more likely the cause than a
potential hardware issue.

Please add
reg_read(ohci, OHCI1394_IntMaskSet);
right before hpsb_selfid_complete(host, phyid, isroot);. This will flush
the previous reg_write before hpsb_selfid_complete starts doing
unspeakable things.
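Why a dummy read flushes a write can be shown with a toy model in plain C. It
is only an illustration of posted writes in general: struct toy_dev and the
toy_* helpers are invented here, they do not model the real chip or bridge,
and they ignore the set/clear semantics of the OHCI mask registers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NREGS 4
#define TOY_QUEUE 8

/* Toy device behind a bridge that may hold ("post") writes for a while. */
struct toy_dev {
	uint32_t regs[TOY_NREGS];	/* what the device itself has seen */
	struct {
		unsigned idx;
		uint32_t val;
	} posted[TOY_QUEUE];		/* writes still sitting in the bridge */
	unsigned n_posted;
};

/* Push all posted writes through to the device, in order. */
static void toy_drain(struct toy_dev *d)
{
	for (unsigned i = 0; i < d->n_posted; i++)
		d->regs[d->posted[i].idx] = d->posted[i].val;
	d->n_posted = 0;
}

/* Posted write: returns at once, the device may not have seen it yet.
 * idx must be below TOY_NREGS. */
static void toy_reg_write(struct toy_dev *d, unsigned idx, uint32_t val)
{
	if (d->n_posted == TOY_QUEUE)
		toy_drain(d);
	d->posted[d->n_posted].idx = idx;
	d->posted[d->n_posted].val = val;
	d->n_posted++;
}

/* A read needs data from the device, so earlier writes are drained first. */
static uint32_t toy_reg_read(struct toy_dev *d, unsigned idx)
{
	toy_drain(d);
	return d->regs[idx];
}
```

After toy_reg_write() the value is still queued; any toy_reg_read(), even of
another register, forces it out. That is the effect the extra
reg_read(ohci, OHCI1394_IntMaskSet) is meant to have before the long-running
hpsb_selfid_complete call.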

And I should finally start to work on a fix for hpsb_selfid_complete,
i.e. move all the time-consuming but less time-critical parts off into a
tasklet or workqueue job...


Question to others:

ohci1394.c::ohci_irq_handler() is taking a per-host spinlock around some
register reads and writes, particularly:
...
spin_lock_irqsave(&ohci->event_lock, flags);
event = reg_read(ohci, OHCI1394_IntEventClear);
reg_write(ohci, OHCI1394_IntEventClear, event &
~OHCI1394_busReset);
spin_unlock_irqrestore(&ohci->event_lock, flags);
...
spin_lock_irqsave(&ohci->event_lock, flags);
reg_write(ohci, OHCI1394_IntMaskClear, OHCI1394_busReset);
run_an_insane_loop_as_an_alleged_fix_for_dorky_hardware;
spin_unlock_irqrestore(&ohci->event_lock, flags);
...
spin_lock_irqsave(&ohci->event_lock, flags);
reg_write(ohci, OHCI1394_IntEventClear, OHCI1394_busReset);
reg_write(ohci, OHCI1394_IntMaskSet, OHCI1394_busReset);
spin_unlock_irqrestore(&ohci->event_lock, flags);

I think these spinlocks are totally useless 1. because
ohci_irq_handler() is only called as the hardware interrupt servicing
routine and 2. because they don't flush the register write operations.
Right? Wrong? [Ohci1394's reg_write() is a writel().]
-- 
Stefan Richter
-=-=-==- =-== ==-==
http://arcgraph.de/sr/


Re: ieee1394: host adapter disappears on 1394 bus reset

2006-11-27 Thread Robert Crocombe

On 11/22/06, Stefan Richter <[EMAIL PROTECTED]> wrote:

> One thing you could try next is to add a debug logging macro which
> prints the contents of OHCI1394_IntEventClear, OHCI1394_IntEventSet, and
> OHCI1394_IntMaskSet, right after ohci1394's call to
> hpsb_selfid_complete. (I'm merely poking in the dark here.)


I think you've got something!  I managed to provoke failure from 3 of
the 5 interfaces in a single burst of reset clicking!  And yes, all 3
failed interfaces are on the Indigita card, and no, the Fireboard has
never failed.

The last thing I see from the failed interfaces is this:

Nov 27 08:25:51 spanky kernel: ohci1394: fw-host3: PhyReqFilter=
Nov 27 08:25:51 spanky kernel: ohci1394: fw-host3: IntEventClear
 IntEventSet 6ffdc33f IntMaskSet 

which looks very different from the entries by the interfaces that
survive (these are the lines immediately before the one above)

Nov 27 08:25:51 spanky kernel: ohci1394: fw-host4: IntEventClear
 IntEventSet 04508000 IntMaskSet 818300f3
Nov 27 08:25:51 spanky kernel:
Nov 27 08:25:51 spanky kernel: ohci1394: fw-host2: IntEventClear
 IntEventSet 04508000 IntMaskSet 818300f3
Nov 27 08:25:51 spanky kernel:

I'm not sure if this says anything to you except "hey, don't use those
Indigita cards".  The problem is, I can't get the number of ports I
need using only Fireboards (I think I need 6, and I have 5 PCI slots
but need to use some of the other slots).

Is there further diagnostic poking about that I can do to narrow down
the problem?  Is this something for Indigita?  The card is pretty basic: 4
of the TI TSB82AA2 (Ice Lynx) links behind an IBM/Tundra PCI-X bridge.
I have an Intel quad ethernet card that uses the exact same part
(well, one rev older, actually).  Here's a chunk of my lspci for
completeness' sake:

01:04.0 PCI bridge: IBM PCI-X to PCI-X Bridge (rev 03)
01:06.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b
Link Layer Controller (rev 01)
02:04.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b
Link Layer Controller (rev 01)
02:05.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b
Link Layer Controller (rev 01)
02:06.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b
Link Layer Controller (rev 01)
02:07.0 FireWire (IEEE 1394): Texas Instruments TSB82AA2 IEEE-1394b
Link Layer Controller (rev 01)

I will also try cramming a machine full of Fireboards and seeing if I
can't get one of them to fail.

--
Robert Crocombe
[EMAIL PROTECTED]