[PATCH] b43: Mask PHY TX error interrupt, if not debugging

2009-03-19 Thread Michael Buesch
This masks the PHY TX error interrupt, if debugging is disabled.

Currently we have a bug somewhere which triggers this interrupt once
in a while (depending on the network noise/quality). While this is nonfatal,
it scares the hell out of users, and we frequently receive bug reports
that incorrectly identify this error message as the cause of their problems.

There's another problem with this. The PHY TX error interrupt is protected
by a watchdog that restarts the device if it keeps triggering very often.
This is meant to recover from interrupt storms caused by completely broken devices.

However, this watchdog might trigger during completely normal operation.
If the TX capacity of the card is saturated, the likelihood of the watchdog
triggering increases, as more TX errors occur. The current threshold
for the watchdog is 1000 errors in 15 seconds.
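For illustration, a watchdog of this kind can be sketched as a fixed-window error counter. This is a minimal sketch, not b43's actual code: the names `txerr_watchdog` and `txerr_watchdog_hit` and the millisecond timestamps are assumptions. Only the 1000-errors-in-15-seconds threshold comes from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold from the patch description: 1000 errors within 15 seconds. */
#define TXERR_WINDOW_MS 15000u
#define TXERR_LIMIT     1000u

/* Hypothetical fixed-window counter, not b43's actual data structure. */
struct txerr_watchdog {
	uint64_t window_start_ms; /* start of the current counting window */
	unsigned int count;       /* PHY TX errors seen in this window */
};

/* Record one PHY TX error at monotonic time now_ms.
 * Returns true when the limit is reached and the device should be restarted. */
static bool txerr_watchdog_hit(struct txerr_watchdog *wd, uint64_t now_ms)
{
	if (now_ms - wd->window_start_ms >= TXERR_WINDOW_MS) {
		/* The old window has expired; start counting afresh. */
		wd->window_start_ms = now_ms;
		wd->count = 0;
	}

	wd->count++;
	if (wd->count >= TXERR_LIMIT) {
		/* Reset so that one burst requests only one restart. */
		wd->window_start_ms = now_ms;
		wd->count = 0;
		return true;
	}
	return false;
}
```

A fixed window is the simplest choice, but a burst that straddles a window boundary can exceed 1000 errors in 15 seconds without tripping the restart. A sliding window would catch that case, at the cost of more state.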

This patch adds a workaround for the issue by just enabling the interrupt
if debugging is disabled (by Kconfig or by modparam).

This has the downside that real fatal PHY TX errors are not caught anymore.
But this is nonfatal due to the following reasons:
* If the card is not able to transmit anymore, MLME will notice anyway.
* I have _never_ seen a real fatal PHY TX error in a mainline b43 driver.
* It does _not_ result in interrupt storms or anything like that.
  It will simply result in a stalled card, which can be debugged by enabling
  the debugging module parameter.

Signed-off-by: Michael Buesch m...@bu3sch

---

I wonder how many placebo "PHY TX error was fixed and my card performs great
again" reports we will get. :D

!!! DISTRIBUTIONS !!!
Disable CONFIG_B43_DEBUG!
There is absolutely _no_ reason to enable it on a release kernel.
There were valid reasons in the past, but there are none left anymore.
So please _disable_ this option now, if you didn't do this already,
because with CONFIG_B43_DEBUG enabled the PHY TX errors will still show.



John, please merge this for the next feature release.


Index: wireless-testing/drivers/net/wireless/b43/main.c
===
--- wireless-testing.orig/drivers/net/wireless/b43/main.c	2009-03-19 17:27:39.0 +0100
+++ wireless-testing/drivers/net/wireless/b43/main.c	2009-03-19 18:53:16.0 +0100
@@ -3990,12 +3990,14 @@ static void setup_struct_wldev_for_init(
 	setup_struct_phy_for_init(dev, &dev->phy);
 
 	/* IRQ related flags */
 	dev->irq_reason = 0;
 	memset(dev->dma_reason, 0, sizeof(dev->dma_reason));
 	dev->irq_savedstate = B43_IRQ_MASKTEMPLATE;
+	if (b43_modparam_verbose < B43_VERBOSITY_DEBUG)
+		dev->irq_savedstate &= ~B43_IRQ_PHY_TXERR;
 
 	dev->mac_suspended = 1;
 
 	/* Noise calculation context */
 	memset(&dev->noisecalc, 0, sizeof(dev->noisecalc));
 }
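The key detail in the hunk is the compound `&=`. It clears only the PHY TX error bit and leaves every other interrupt in the template enabled. A plain assignment of `~B43_IRQ_PHY_TXERR` would instead enable every other bit, including interrupts outside the template. The following is a minimal standalone sketch. The `EXAMPLE_*` constants are hypothetical stand-ins with made-up values, not the real b43 definitions.

```c
#include <stdint.h>

/* Hypothetical stand-ins with made-up values; the real B43_IRQ_* and
 * B43_VERBOSITY_* definitions live in the b43 driver sources. */
#define EXAMPLE_IRQ_MASKTEMPLATE 0x0000ffffu
#define EXAMPLE_IRQ_PHY_TXERR    0x00000800u

enum example_verbosity {
	EXAMPLE_VERBOSITY_INFO = 2,
	EXAMPLE_VERBOSITY_DEBUG = 3,
};

/* Compute the initial IRQ mask the way the patched code does: start from
 * the template and drop only the PHY TX error bit unless debugging
 * verbosity is enabled. */
static uint32_t example_initial_irq_mask(int verbose)
{
	uint32_t mask = EXAMPLE_IRQ_MASKTEMPLATE;

	if (verbose < EXAMPLE_VERBOSITY_DEBUG)
		mask &= ~EXAMPLE_IRQ_PHY_TXERR; /* clear one bit, keep the rest */
	return mask;
}
```

With these placeholder values, debug verbosity keeps the full `0x0000ffff` template, and any lower verbosity yields `0x0000f7ff`.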

-- 
Greetings, Michael.
___
Bcm43xx-dev mailing list
Bcm43xx-dev@lists.berlios.de
https://lists.berlios.de/mailman/listinfo/bcm43xx-dev


Re: [PATCH] b43: Mask PHY TX error interrupt, if not debugging

2009-03-19 Thread Michael Buesch
On Thursday 19 March 2009 19:27:21 Michael Buesch wrote:
> This patch adds a workaround for the issue by just enabling the interrupt
> if debugging is disabled (by Kconfig or by modparam).

Of course, I meant "just disabling the interrupt".

-- 
Greetings, Michael.


Re: [PATCH] b43: Mask PHY TX error interrupt, if not debugging

2009-03-19 Thread Francesco Gringoli
On Mar 19, 2009, at 7:27 PM, Michael Buesch wrote:

> This masks the PHY TX error interrupt, if debugging is disabled.
>
> Currently we have a bug somewhere which triggers this interrupt once
> in a while. (Depends on the network noise/quality). While this is nonfatal,
Michael,

Some time ago I began seeing several of these errors on one of my hosts,
where I had never seen them before, with both proprietary and open firmware.
As I had never noticed those errors before, I wondered if they could be due
to some strange frame received over the air, something like a frame encoded
in CCK but with a broken field that caused the firmware to ACK back a
frame whose first byte (encoding) didn't match the following ones inside
the PLCP. That was obviously not the case: those errors were not even
happening on TX attempts, and surprisingly they were also happening on
devices configured in monitor mode.

I finally remembered that the day before I started observing errors, I had
changed the miniPCI-to-PCI adapter inside that host, keeping the same cable
and antenna set. Removing the broken adapter stopped the PHY errors.

After this debug session I have some notes:
- PHY error IRQs are not triggered by the firmware (either open or
  proprietary) writing to the IRQ registers.
- These strange PHY errors are not due to TX attempts; they also happen
  on devices where the TX code has been cut away.
- PHY errors are triggered by the hardware when the number of bytes
  requested for transmission does not match the TX information stored in
  the first four bytes of the PLCP. This happens both for frames sent by
  b43 through DMA and for frames composed by the firmware. If everything is
  consistent, I never see errors on platforms not affected by noise (unlike
  my old VIA or the broken miniPCI-to-PCI adapter).
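The length-mismatch condition in the last note can be illustrated with a small sketch. It assumes the OFDM SIGNAL field layout from IEEE 802.11: a 4-bit RATE, a reserved bit, a 12-bit LENGTH in octets starting at bit 5, an even-parity bit, and 6 tail bits. CCK headers encode a duration rather than a byte count and are not covered here. The functions are hypothetical and only show what "consistent" means. They do not describe how the hardware actually performs the check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build a 24-bit OFDM SIGNAL field in the low bits of a 32-bit word:
 * bits 0-3 RATE, bit 4 reserved (0), bits 5-16 LENGTH (octets),
 * bit 17 even parity over bits 0-16, bits 18-23 tail (0). */
static uint32_t ofdm_signal_pack(uint8_t rate_code, uint16_t octets)
{
	uint32_t d = (uint32_t)(rate_code & 0x0fu) |
		     ((uint32_t)(octets & 0x0fffu) << 5);
	unsigned int ones = 0;

	for (int bit = 0; bit <= 16; bit++)
		ones += (d >> bit) & 1u;
	if (ones & 1u)
		d |= UINT32_C(1) << 17; /* make bits 0-17 hold an even number of ones */
	return d;
}

/* Extract the 12-bit LENGTH field. */
static uint16_t ofdm_signal_length(uint32_t signal)
{
	return (uint16_t)((signal >> 5) & 0x0fffu);
}

/* Hypothetical consistency check: does the header's LENGTH match the
 * number of octets actually queued for transmission? */
static bool plcp_length_consistent(uint32_t signal, uint16_t queued_octets)
{
	return ofdm_signal_length(signal) == queued_octets;
}
```

The 12-bit LENGTH field caps an OFDM frame at 4095 octets, so the sketch masks `octets` to 12 bits.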

I would say this noise directly affects the IRQ line, or it triggers
the serializer to send out a packet with a completely wrong radio/PLCP/MAC
configuration that causes a PHY TX error.

Cheers,
-FG

---

Francesco Gringoli, PhD - Assistant Professor
Dept. of Electrical Engineering for Automation
University of Brescia
via Branze, 38
25123 Brescia
ITALY

Ph:  ++39.030.3715843
FAX: 

Re: [PATCH] b43: Mask PHY TX error interrupt, if not debugging

2009-03-19 Thread Michael Buesch
On Thursday 19 March 2009 20:00:45 Francesco Gringoli wrote:
> some time ago I begin seeing several of these errors, never seen
> before on one of my host, with both proprietary and open firmwares. As
> I never noticed those errors before, I wondered if they could be due
> to some strange frame received by air, something like a frame encoded
> in CCK but with a broken field that caused the firmware to ack back a
> frame whose first byte (encoding) didn't match the following inside
> the plcp. That was obviously not the case, indeed those errors were
> not even happening on tx tries and surprisingly they were happening
> also on devices configured in monitor mode.

Well, they _are_ triggered by things going on in the WM (wireless medium).
But I think they originate at a much lower level than the MAC or PLCP.
I think it is related to the raw modulation.

In the end, I'm pretty sure this is some misconfiguration of some very
low-level PHY part. Too bad we don't know of a debugging register that
gives more information on _which_ part triggers the error.

> I finally remembered that the day before starting observing errors, I
> changed the minipci to pci adapter inside that host, maintaining the
> same cable and antenna set. Removing the broken adapter stopped PHY
> errors.

Yeah well. This confirms my thoughts.
There are other ways to deliberately trigger the errors. For example,
try covering the antennae with your bare hands. Try moving the
device to a place with an extremely bad signal (iron beams in between).
Try moving the transceivers very close together (20 cm), so that basic RF
rules are violated.

These are all pretty reliable ways to trigger these errors.

> After this debug session I have some notes
> - PHY error IRQs are not triggered by the firmware (both open and
> proprietary) by writing to the IRQ registers

Right. I don't think it's really related to which firmware is running.
Some firmware might encourage the errors through special timing in its
code, but the firmware does not trigger them.

> - these strange PHY errors are not due to tx tries, they happen also
> with devices were the tx code has been cut away

Well, I have not seen that, so I cannot really comment on it.
I never saw them in monitor mode.

> - PHY errors are triggered by the hardware when the number of bytes
> requested for transmission do not match the tx information stored in
> the first four bytes of the plcp, this happens for both frames sent by
> b43 through dma and frames composed by the firmware. If everything is

This is correct and known behavior. But this is _not_ what is happening here.

> consistent I never see errors on platforms not affected by noise (as
> my old VIA or the broken minipci to pci adapter).

Yes, less noise = fewer errors. That's clearly the case.

> I would say this noise directly affects the irq line, or it triggers
> the serializer to send out a packet with completely wrong radio/plcp/
> mac configuration that causes a PHY tx error.

I don't think it triggers the IRQ line. I'd rather think that some sensitivity
threshold is configured incorrectly, so the PHY will trigger the errors on
completely valid stuff.

So now it's your turn: which one? :D

-- 
Greetings, Michael.


Re: [PATCH] b43: Mask PHY TX error interrupt, if not debugging

2009-03-19 Thread Francesco Gringoli

On Mar 19, 2009, at 8:13 PM, Michael Buesch wrote:

> On Thursday 19 March 2009 20:00:45 Francesco Gringoli wrote:
>
> Yeah well. This confirms my thoughts.
> There are other ways to voluntarily trigger the errors. For example
> try covering the antennae with your bare hands. Try to move the
> device to a place with extremely bad signal (Iron beams between them).
> Try to move the transceivers very close (20cm) together, so basic rf
> rules are violated.
>
> This are all pretty reliable ways to trigger these errors.
Cool! I will give it a try.

>> - these strange PHY errors are not due to tx tries, they happen also
>> with devices were the tx code has been cut away
>
> Well, I did not see that, so I cannot really comment on this.
> I never saw them in monitor mode.
That was the reason I lost a lot of time putting traps into the firmware,
to understand whether we were forgetting something when configuring
devices to run in monitor mode. Well, we are not: the TX code is never
reached. But PHY errors are triggered all the same.

>> I would say this noise directly affects the irq line, or it triggers
>> the serializer to send out a packet with completely wrong radio/plcp/
>> mac configuration that causes a PHY tx error.
>
> I don't think it triggers the IRQ line. I'd rather think that some sensitivity
> threshold is configured incorrectly, so the PHY will trigger the errors on
> completely valid stuff.

I would agree with you, but there is this bizarre issue with PHY
errors in monitor mode that makes me wonder what we call PHY errors.
I would say they are not only due to transmission; could they be
general PHY errors? One last test I could try is to put the broken
miniPCI-to-PCI adapter back in one PCI slot and put the adapter that
does not trigger these errors in the next slot. If the interference
caused by the broken adapter induces errors in the wifi boards on top
of it, it should induce the same errors in the board mounted on the
working adapter.

Cheers,
-FG




---

Francesco Gringoli, PhD - Assistant Professor
Dept. of Electrical Engineering for Automation
University of Brescia
via Branze, 38
25123 Brescia
ITALY

Ph:  ++39.030.3715843
FAX: ++39.030.380014
WWW: http://www.ing.unibs.it/~gringoli






Re: [PATCH] b43: Mask PHY TX error interrupt, if not debugging

2009-03-19 Thread Michael Buesch
On Thursday 19 March 2009 20:56:52 Francesco Gringoli wrote:
> I would agree with you, but there is this bizarre issue with PHY
> errors in monitoring mode that makes me thinking about what we call
> PHY errors. I would say they are not only due to transmission, they
> are general PHY errors, could they be? One last test I could try, is

No, the interrupt indicates a PHY TX error.
This name comes from the Broadcom headers, so we can trust that it's correct.

As I said, I never saw the error with the proprietary firmware in monitor mode.
If you know a way to trigger them, please tell me.

> to put again the broken minipci to pci adapter in one pci slot and put
> on the next slot the adapter that does not trigger these errors. If
> the interference caused by the broken adapter induces the wifi boards
> on top of it in errors, it should induce the same error on the board
> mounted on the right adapter.

Well, the question is what can we learn from this test? ;)
What we really need is a way to find out which part of the PHY triggers
this error. We have a dozen methods to trigger the error, but we _still_
do not know the low-level PHY conditions that result in an error.

Probably somebody with lots of time should go through the code and randomly
set various PHY thresholds to big/small values. Maybe that'll point us in
some direction. The problem is that the code is basically undocumented and we
only write blob values to the registers. So from reading the code you won't even
know where these values are written. But the specs give some useful hints. ;)

-- 
Greetings, Michael.


Re: [PATCH] b43: Mask PHY TX error interrupt, if not debugging

2009-03-19 Thread Francesco Gringoli

On Mar 19, 2009, at 9:10 PM, Michael Buesch wrote:

> On Thursday 19 March 2009 20:56:52 Francesco Gringoli wrote:
>> I would agree with you, but there is this bizarre issue with PHY
>> errors in monitoring mode that makes me thinking about what we call
>> PHY errors. I would say they are not only due to transmission, they
>> are general PHY errors, could they be? One last test I could try, is
>
> No the interrupt indicates a PHY TX error.
> This name is from the broadcom headers, so we can trust that it's correct.
>
> As I said, I never saw the error with the proprietary firmware in monitor mode.
> If you know a way how to trigger them, please tell me.
It should be pretty easy: if you can observe these errors in STA mode
(for instance with the cool method you told me about before), you should
also see the same errors in monitor mode. That is what was happening with
my two adapters mounted on the same PCI riser (one with four miniPCI
slots, unfortunately broken, since I see PHY errors whenever I use it).

Cheers,
-FG


Re: [PATCH] b43: Mask PHY TX error interrupt, if not debugging

2009-03-19 Thread Larry Finger
Michael Buesch wrote:
 
> The problem is that the code basically is undocumented and we
> only write blob values to the registers. So from reading the code you won't even
> know where these values are written. But the specs give some useful hints. ;)

I might be able to help with the last part. Not everything I know has been
put into the specs. I welcome questions.

Larry