Re: iwm frequent 'device timeout' error

2024-05-30 Thread Stefan Sperling
On Thu, May 30, 2024 at 09:55:00AM +0200, a...@alexis-fouilhe.fr wrote:
> >Synopsis:iwm frequent 'device timeout' error
> >Category:kernel
> >Environment:
>   System  : OpenBSD 7.5
>   Details : OpenBSD 7.5 (GENERIC.MP) #55: Mon Mar  4 21:59:07 MST 2024
>             dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   Several times a day, wireless networking stops working for a couple of minutes.
>   Requests time out, browser reports that it can't reach any site, etc.
>   After a short time, 'iwm0: device timeout' is added to dmesg and after yet
>   another short time, wireless networking starts working again.
>   The iwm(4) man page says it should not happen.
>   This has happened to me for as long as I can remember.
> 
>   Below is what the driver says after 'ifconfig iwm0 debug'.
>   I trimmed a number of copies of the five lines beginning with
>   'iwm0: begin background scan', both before the timeout and after recovery.

Does it work more reliably if you replace the firmware file as follows?
Not sure if this will help or even work at all, but it might:

 mv /etc/firmware/iwm-7265D-29  /etc/firmware/iwm-7265D-29.orig
 cp /etc/firmware/iwm-7265-17 /etc/firmware/iwm-7265D-29
 ifconfig iwm0 down up  # force firmware reload

If this helps then I could change the driver to load 7265-17 firmware
file by default.

The reason I'm asking is that our driver has issues with 7265D firmware.
We are still using the 7265-17 image on 7265 devices because of this.
The 3165 you have is the same chip, with some capabilities missing.
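If the driver were changed as suggested, the per-chip firmware selection could look roughly like the sketch below. This is a standalone model, not code from if_iwm.c: the table, `fw_for_chip()` and the `prefer_old` switch are hypothetical, and it assumes, as the commands above imply, that 3165 devices currently load iwm-7265D-29.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-chip firmware table; not taken from if_iwm.c. */
struct fw_choice {
	const char	*chip;		/* chip family as printed in dmesg */
	const char	*fwname;	/* file name under /etc/firmware */
};

static const struct fw_choice fw_table[] = {
	{ "7265", "iwm-7265-17" },	/* 7265 already avoids 7265D firmware */
	{ "3165", "iwm-7265D-29" },	/* assumed current default for 3165 */
};

/*
 * Return the firmware file name for a chip, or NULL if it is unknown.
 * With prefer_old set, 3165 devices get the 7265-17 image instead,
 * which models the change proposed above.
 */
static const char *
fw_for_chip(const char *chip, int prefer_old)
{
	size_t i;

	if (prefer_old && strcmp(chip, "3165") == 0)
		return "iwm-7265-17";
	for (i = 0; i < sizeof(fw_table) / sizeof(fw_table[0]); i++) {
		if (strcmp(fw_table[i].chip, chip) == 0)
			return fw_table[i].fwname;
	}
	return NULL;
}
```

Keeping the override in one place would make it easy to flip back if the 7265-17 image turns out not to help on 3165.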



Re: iwm regression in 7.5 amd64

2024-04-29 Thread Stefan Sperling
On Mon, Apr 29, 2024 at 10:26:15PM +0200, Stefan Sperling wrote:
> Then send me the pcap file off-list.
> 

Received, off-list, thanks!

The config looks fine.
Channel 48, HT secondary channel below at 44, 80 MHz VHT center channel 42.
This is a valid config that will pass the tests we added to avoid firmware
errors. So this cannot be the source of the problem.
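To make the channel arithmetic concrete: an 80 MHz channel centered on channel 42 covers the 20 MHz channels 36, 40, 44 and 48, and the HT 40 MHz pair 44+48 (primary 48, secondary below) forms its upper half. The C sketch below encodes that reasoning. It is a simplified, hypothetical check written for illustration; `vht80_config_valid()` is not the test that was added to the driver, and it ignores regulatory details.

```c
#include <stddef.h>

/*
 * Hypothetical plausibility check for a 5 GHz 80 MHz configuration.
 * ht_sec: -1 = HT secondary channel below primary, +1 = above.
 * Returns 1 if the primary, HT secondary and VHT center fit together.
 */
static int
vht80_config_valid(int primary, int ht_sec, int center)
{
	/* Center channel numbers of the 5 GHz 80 MHz blocks. */
	static const int centers[] = { 42, 58, 106, 122, 138, 155 };
	size_t i;
	int known = 0, off, lower;

	for (i = 0; i < sizeof(centers) / sizeof(centers[0]); i++) {
		if (centers[i] == center)
			known = 1;
	}
	if (!known || (ht_sec != -1 && ht_sec != 1))
		return 0;

	/* An 80 MHz block holds four 20 MHz channels: center -6, -2, +2, +6. */
	off = primary - center;
	if (off != -6 && off != -2 && off != 2 && off != 6)
		return 0;

	/* The HT 40 MHz pair must be the lower or the upper half of the block. */
	lower = (ht_sec < 0) ? primary - 4 : primary;
	return lower == center - 6 || lower == center + 2;
}
```

For the reported AP, `vht80_config_valid(48, -1, 42)` returns 1, while announcing the secondary channel above 48 would fail the check because 48+52 straddles two 80 MHz blocks.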

Not sure what is going on, sorry.
If you could bisect the regression to a specific commit or at least a
reasonably narrow date-range within the 7.4->7.5 release cycle then
perhaps we'll find a clue.



Re: iwm regression in 7.5 amd64

2024-04-29 Thread Stefan Sperling
On Mon, Apr 29, 2024 at 08:05:31PM +0200, Mizsei Zoltán wrote:
> Yes, it is a general speed regression. Reboot didn't help. Firmware is 
> up-to-date, and AFAIK nothing else changed. The bottleneck is there on the CLI 
> as well, not just in Firefox.
> 
> -- ext

The only possibility I can think of is that the changes to avoid
firmware panics on APs that announce invalid 11ac channel
configurations are implicated.

To know whether this is the case I need to see the beacon channel
configuration info.

While associated, run this for a short while:

tcpdump -n -i iwm0 -y IEEE802_11_RADIO -s4096 -w /tmp/iwm0.pcap type mgt subtype beacon

Ctrl-C out of it and check whether the output of tcpdump -r /tmp/iwm0.pcap
contains lines that begin with a timestamp followed by "802.11: beacon"
and show your AP's SSID.

Then send me the pcap file off-list.



Re: iwm regression in 7.5 amd64

2024-04-28 Thread Stefan Sperling
On Sun, Apr 28, 2024 at 09:45:42PM +0200, Mizsei Zoltán wrote:
> >Synopsis:iwm regression in 7.5 amd64
> >Category:amd64
> >Environment:
>   System  : OpenBSD 7.5
>   Details : OpenBSD 7.5 (GENERIC.MP) #82: Wed Mar 20 15:48:40 MDT 2024
>             dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   I've noticed radically reduced networking speed since the 7.5 update.
>   On a 500Mbit net I experience 3-5 Mbit upload and 20 Mbit download speeds.
>   Nothing else changed, as far as I know.
> >How-To-Repeat:
>   Do a speedtest on 7.5 amd64 GENERIC system using an iwm card and with 
> up-to-date firefox.
> >Fix:
>   No idea

Please test with tcpbench to a host on LAN behind your AP. Running
a speed test to the internet is pointless if the goal is to measure
wifi throughput.

With tcpbench I am seeing about 290Mbps Tx and Rx with iwm0.

Generic internet speed test fast.com shows 210Mbps down, 86Mbps up.
That's less than the wifi is capable of.

> iwm0 at pci4 dev 0 function 0 "Intel Dual Band Wireless-AC 8265" rev 0x78, msi

I was also testing with 8265.



Re: problem with qwx on thinkpad arm

2024-04-28 Thread Stefan Sperling
On Sun, Apr 28, 2024 at 01:46:22PM +0200, BESSOT Jean-Michel wrote:
> Hello
> 
> qwx does not work after the last snapshot.
> 
> There is an error message in dmesg that I send.
> 
> If you need more information feel free to ask.
> 
> bye

> qwx0: could not allocate DP SRNG DMA memory

Does this error persist or is it only happening occasionally?
Does it persist during repeated ifconfig qwx0 down/up cycles?
Does it persist after rebooting?

It looks like your machine ran out of physically contiguous (non-fragmented)
memory below 4GB. Such memory was evidently still available while the
system was booting: the firmware version was displayed, which means the
affected code path ran successfully at least once.

The qwx driver does a lot of DMA allocations during 'ifconfig qwx0 up'
because the code inherited from Linux behaves this way. I didn't have the
guts to change that during the initial porting effort, since it would have
made things even more complicated during this early phase of development.

Generally, OpenBSD drivers reserve DMA memory during early boot (autoconf)
and keep it around to avoid such errors in case the system runs out of
sufficient usable fresh DMA memory later.
So this issue could probably be avoided by refactoring things such that
DMA memory is only allocated once and not freed when the interface goes
down.
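A minimal sketch of that pattern, assuming a driver that allocates its ring memory once at attach time and only marks it unused when the interface goes down. `malloc()` stands in for the bus_dma(9) allocation, and all structure and function names are hypothetical; none of this is qwx code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a DMA ring and a driver softc. */
struct model_ring {
	void	*mem;
	size_t	 size;
};

struct model_sc {
	struct model_ring	ring;
	int			up;
};

/* Attach: allocate once, while contiguous memory is still plentiful. */
static int
drv_attach(struct model_sc *sc, size_t size)
{
	sc->ring.mem = malloc(size);
	if (sc->ring.mem == NULL)
		return -1;
	sc->ring.size = size;
	sc->up = 0;
	return 0;
}

/* Up: reuse the existing buffer, so no allocation can fail here. */
static int
drv_up(struct model_sc *sc)
{
	if (sc->ring.mem == NULL)
		return -1;
	memset(sc->ring.mem, 0, sc->ring.size);
	sc->up = 1;
	return 0;
}

/* Down: stop using the buffer but keep it for the next up. */
static void
drv_down(struct model_sc *sc)
{
	sc->up = 0;
}

/* Detach: the only place where the memory is released. */
static void
drv_detach(struct model_sc *sc)
{
	free(sc->ring.mem);
	sc->ring.mem = NULL;
	sc->ring.size = 0;
}
```

The trade-off is that the memory stays reserved while the interface is down, in exchange for down/up cycles that cannot hit an allocation failure.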

The rest of the error messages are most likely just fallout from the
above allocation error.

> qwx_dp_pdev_reo_setup: failed to setup reo_dst_ring
> qwx_core_start: failed to initialize reo destination rings: 12
> qwx0: failed to start core: 12
> qwx0: tx credits timeout
> qwx0: failed to send WMI_PDEV_SET_PARAM cmd
> qwx0: failed to enable dynamic bw for pdev 0: 35



Re: iwx obtains IP via DHCP but no traffic

2024-04-22 Thread Stefan Sperling
On Sun, Apr 21, 2024 at 06:31:39PM +0200, Kirill A. Korinsky wrote:
> Greetings,
> 
> finally I've caught something. I'm not sure whether it is the same error or not,
> but it seems right to share it anyway.
> 
> It was captured on a snapshot which was installed about 3 days ago.
> 
>   Apr 21 10:59:40 matebook /bsd: iwx0: Start UMAC Error Log Dump:
>   Apr 21 10:59:40 matebook /bsd: iwx0: Status: 0x39, count: 7
>   Apr 21 10:59:40 matebook /bsd: iwx0: 0x20103522 | ADVANCED_SYSASSERT

I've never seen this before. According to Linux developers this seems to
be an error related to a firmware-internal problem with power-saving.
Yet our driver disables power-saving, which raises even more questions.

I doubt there is anything we could do in the driver to prevent this.
There are newer firmware images in the iwlwifi linux-firmware repository
which we could upgrade to. This might help, or it might not.



Re: deassoc attack, or, trapped in S_RUN forever?

2024-04-22 Thread Stefan Sperling
On Mon, Apr 22, 2024 at 10:46:50AM +0200, Peter J. Philipp wrote:
> Hi,
> 
> I have a shortage of RJ45 adapters, so I had to expose one host on wifi. For
> this I set up an access point of type "AVM Fritz!Box 7390" with the latest
> firmware from last year.
> 
> Soon I started facing deauth attacks from someone within my wifi cell region.
> Since the deauthers were spoofing my OpenBSD host to the AVM it wasn't 
> resistant to this.  So what I did was crontab (with ~) setting up a random 
> lladdr on my OpenBSD box's bwfm0 and resetting the interface.  Worked well 
> for a day, then the bad guy changed his attack, and made my OpenBSD SBC 
> inoperable.
> 
> I have traced some of the things he/she did and I'll try to reconstruct it.
> 
> 1. I set up a wifi connection on 2.4 GHz which seemingly made the attacker
> annoyed.  It could have been the SSID of "SUGARHILL" that really annoyed them.
> 
> 2. Attacker sets up deauther on AP, causing us to crontab random MACs; this
> worked for a while.  The attack was temporarily stopped.
> 
> 3. Attacker sets up an IBSS node with the same SSID of "SUGARHILL".  Now here
> is where it gets weird.  I don't know if my SBC had a bssid with mac address
> for association set.  Let's pretend this is a grey area.  Attacker then
> proceeds to disassociate OpenBSD.
> 
> 4. I reassociate because we are in the process of changing MAC address per
> crontab.
> 
> 5. For whatever reason we join the IBSS instead of the AP.  Please see A for
> the code analysis I did.
> 
> 6. The OpenBSD is inoperable.  Since I don't have UART console on it I can't
> reach it anymore.  A powercycle may bring it back.  Eventually I'll have to
> set up a UART for it.  
> 
> Code Analysis:
> 
> This is the first real search in the IEEE 802.11 on OpenBSD code for me.  Does
> a state diagram exist for this?
> 
> A. ieee80211_node.c - in _node_join_bss()
> 
> IBSS has higher preference on line 1333.  new_state() becomes S_RUN.  Important
> is that mgt is set to -1.
> 
> B. I can't understand currently where else it would go but in state S_RUN
>   it can parse a lot of frames and drops a lot of frames as well.  I
>   personally saw that at this point my SBC was inoperable to the wifi.
> 
> 
> 
> Conclusions?  I don't have any.  I'm hoping to start some dialogue among the
> powers that be.  It would be nice to flag off joining IBSS's when only wanting
> to join BSS's.  But while I'm unsure it's best not to send patches.
> 
> Here is the dmesg of the SBC in question:
> 
> https://blog.centroid.eu/c?article=1712648412
> 
> PS: A USB-C RJ45 dongle costs 15 EUR.  I may just include it in a batch when
> I buy from my computer and electronics supplier next.
> 
> Best regards,
> -pjp

The bwfm(4) driver is not set up to use IBSS. If this driver joins an
IBSS network then that is a bug. It shouldn't be doing that.

Regardless, the best way to avoid deauth attacks is to use management
frame protection, also known as MFP or IEEE 802.11w-2009.
The Fritzbox probably supports this, but our net80211 stack doesn't quite
support this yet. There is some code which is disabled, none of the drivers
make use of it. I myself don't have enough spare cycles to work on it.

Another solution is to use an OpenBSD AP *and* client and set the stayauth
nwflag on both of them, e.g. ifconfig athn0 nwflag stayauth
This currently only helps against deauth frames, see ieee80211_recv_deauth().
It won't help against disassoc frames. But it should be easy to copy the
stayauth code to other input frame handlers if needed.
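The idea can be modelled with a simplified station state machine. In this sketch the types, `F_STAYAUTH` and both handlers are hypothetical stand-ins rather than the net80211 definitions; `recv_disassoc()` shows what copying the stayauth check into a disassoc handler might look like.

```c
/*
 * Hypothetical model of the stayauth idea; not the net80211 code.
 * States are ordered so that ">=" means "at least this far along".
 */
enum sta_state { S_INIT, S_SCAN, S_AUTH, S_ASSOC, S_RUN };

#define F_STAYAUTH	0x01	/* models "ifconfig ... nwflag stayauth" */

struct sta {
	enum sta_state	state;
	unsigned int	userflags;
};

/* Deauth: normally fall back to AUTH and authenticate again. */
static void
recv_deauth(struct sta *s)
{
	if (s->state < S_AUTH)
		return;		/* not authenticated, nothing to undo */
	if (s->userflags & F_STAYAUTH)
		return;		/* ignore the (possibly spoofed) frame */
	s->state = S_AUTH;
}

/* Disassoc: the same check copied over, as suggested above. */
static void
recv_disassoc(struct sta *s)
{
	if (s->state < S_ASSOC)
		return;		/* not associated, nothing to undo */
	if (s->userflags & F_STAYAUTH)
		return;
	s->state = S_ASSOC;
}
```

The obvious trade-off is that legitimate deauth or disassoc frames from the real AP are ignored as well, so the client would have to notice an AP-side reset through other means.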



Re: dwqe ifconfig down panic

2024-03-28 Thread Stefan Sperling
On Wed, Mar 27, 2024 at 02:08:27PM +0100, Stefan Sperling wrote:
> On Tue, Mar 26, 2024 at 11:05:49PM +0100, Patrick Wildt wrote:
> > On Fri, Mar 01, 2024 at 12:00:29AM +0100, Alexander Bluhm wrote:
> > > Hi,
> > > 
> > > When doing flood ping transmit from a machine and simultaneously
> > > ifconfig down/up in a loop, dwqe(4) interface driver crashes.
>  
> > * Don't run TX/RX proc in case the interface is down?
> 
> The RX path already has a corresponding check. But the Tx path does not.
> 
> If the problem is a race involving mbufs freed via dwqe_down() and
> mbufs freed via dwqe_tx_proc() then this simple tweak might help.

With this patch bluhm's test machine has survived 30 minutes of
flood ping + ifconfig down/up in a loop. Without the patch the
machine crashes within a few seconds.

I understand that there could be an issue in intr_barrier() which
gets papered over by this patch. However the patch does avoid the
crash and it is trivial to revert when testing the effectiveness
of any potential intr_barrier() fixes.

ok?

> diff /usr/src
> commit - 029d0a842cd8a317375b31145383409491d345e7
> path + /usr/src
> blob - 97f874d2edf74a009a811455fbf37ca56f725eef
> file + sys/dev/ic/dwqe.c
> --- sys/dev/ic/dwqe.c
> +++ sys/dev/ic/dwqe.c
> @@ -593,6 +593,9 @@ dwqe_tx_proc(struct dwqe_softc *sc)
>   struct dwqe_buf *txb;
>   int idx, txfree;
>  
> + if ((ifp->if_flags & IFF_RUNNING) == 0)
> + return;
> +
>   bus_dmamap_sync(sc->sc_dmat, DWQE_DMA_MAP(sc->sc_txring), 0,
>   DWQE_DMA_LEN(sc->sc_txring),
>   BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
> > 
> 
> 



Re: dwqe ifconfig down panic

2024-03-27 Thread Stefan Sperling
On Tue, Mar 26, 2024 at 11:05:49PM +0100, Patrick Wildt wrote:
> On Fri, Mar 01, 2024 at 12:00:29AM +0100, Alexander Bluhm wrote:
> > Hi,
> > 
> > When doing flood ping transmit from a machine and simultaneously
> > ifconfig down/up in a loop, dwqe(4) interface driver crashes.
 
> * Don't run TX/RX proc in case the interface is down?

The RX path already has a corresponding check. But the Tx path does not.

If the problem is a race involving mbufs freed via dwqe_down() and
mbufs freed via dwqe_tx_proc() then this simple tweak might help.

diff /usr/src
commit - 029d0a842cd8a317375b31145383409491d345e7
path + /usr/src
blob - 97f874d2edf74a009a811455fbf37ca56f725eef
file + sys/dev/ic/dwqe.c
--- sys/dev/ic/dwqe.c
+++ sys/dev/ic/dwqe.c
@@ -593,6 +593,9 @@ dwqe_tx_proc(struct dwqe_softc *sc)
struct dwqe_buf *txb;
int idx, txfree;
 
+   if ((ifp->if_flags & IFF_RUNNING) == 0)
+   return;
+
bus_dmamap_sync(sc->sc_dmat, DWQE_DMA_MAP(sc->sc_txring), 0,
DWQE_DMA_LEN(sc->sc_txring),
BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
> 
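The effect of the guard can be illustrated with a single-threaded model. This is a hypothetical sketch: the structure, `F_RUNNING` and the `model_*` functions only mimic dwqe_down() and dwqe_tx_proc(). A sequential program cannot reproduce the actual race with the interrupt handler; it only shows the invariant the patch relies on.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; not the dwqe(4) definitions. */
#define F_RUNNING	0x01
#define NTXDESC		4

struct model_sc {
	int	 flags;
	void	*txbuf[NTXDESC];	/* stands in for mbufs on the Tx ring */
	int	 prod, cons;		/* next slot to fill / to reclaim */
	int	 released;		/* number of buffer releases performed */
};

/* Queue one buffer for transmission; fails if the ring is full. */
static int
model_enqueue(struct model_sc *sc)
{
	int next = (sc->prod + 1) % NTXDESC;

	if (next == sc->cons)
		return -1;
	sc->txbuf[sc->prod] = malloc(64);
	if (sc->txbuf[sc->prod] == NULL)
		return -1;
	sc->prod = next;
	return 0;
}

/* Like dwqe_down(): clear the running flag, then free queued buffers. */
static void
model_down(struct model_sc *sc)
{
	int i;

	sc->flags &= ~F_RUNNING;
	for (i = 0; i < NTXDESC; i++) {
		if (sc->txbuf[i] != NULL) {
			free(sc->txbuf[i]);
			sc->txbuf[i] = NULL;
			sc->released++;
		}
	}
}

/* Like dwqe_tx_proc(): reclaim the slots between cons and prod. */
static void
model_tx_proc(struct model_sc *sc)
{
	/* The guard from the patch: do nothing once the interface is down. */
	if ((sc->flags & F_RUNNING) == 0)
		return;
	while (sc->cons != sc->prod) {
		free(sc->txbuf[sc->cons]);
		sc->txbuf[sc->cons] = NULL;
		sc->released++;
		sc->cons = (sc->cons + 1) % NTXDESC;
	}
}
```

With three buffers queued, a late `model_tx_proc()` call after `model_down()` returns early. Without the check it would walk slots 0 to 2 again and count three more releases, which roughly corresponds to freeing the same mbufs twice in the kernel.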



Re: urndis driver broken on Thinkpad T21 (i386)

2024-03-22 Thread Stefan Sperling
On Fri, Mar 22, 2024 at 01:30:26PM +0100, be...@stuerz.xyz wrote:
> >Synopsis:urndis driver doesn't work on Thinkpad T21 (i386)
> >Category:i386
> >Environment:
>   System  : OpenBSD 7.4
>   Details : OpenBSD 7.4 (GENERIC) #2: Fri Dec  8 15:41:12 MST 2023
>             r...@syspatch-74-i386.openbsd.org:/usr/src/sys/arch/i386/compile/GENERIC
> 
>   Architecture: OpenBSD.i386
>   Machine : i386
> >Description:
>   I can't connect my Thinkpad T21 through USB tethering to the internet.
>   It does work on my Thinkpad T480 (amd64) without any issues.
>   Unfortunately, I don't have any other i386 machines at the moment,
>   to see if this is a bug on i386 or just on my machine.
> >How-To-Repeat:
>   1. Attach a smartphone to the USB 1.1 port of a Thinkpad T21.

I don't recall networking over USB 1 being a huge thing ever.
Are you sure this phone will do RNDIS over USB 1?
Maybe USB >= 2 is required?

The T21 has a cardbus slot, which would take a cardbus USB-2 host
adapter card to plug the phone into. If your phone can run a wifi
hotspot then plugging in a supported cardbus wifi card would work, too.
See the cardbus(4) manual page for a list of relevant drivers.

>   2. Enable USB Tethering on the smartphone.
>   3. Run "dmesg" on the Thinkpad.
>   4. See the following output:
>   ugen0 at uhub1 port 2 "Google Pixel 6" rev 2.10/5.10 addr 3
>   ugen0 detached
>   urndis0 at uhub1 port 2 configuration 1 interface 0 "Google Pixel 6" rev 2.10/5.10 addr 3
>   urndis0: using Vendorurndis0: TIMEOUT
>   urndis0: unable to get query response
> >Fix:
>   Not (yet) known.



Re: iwx obtains IP via DHCP but no traffic

2024-03-13 Thread Stefan Sperling
On Wed, Mar 13, 2024 at 12:05:59AM +0100, Kirill A. Korinsky wrote:
> On Tue, 27 Feb 2024 08:45:44 +0100,
> Stefan Sperling wrote:
> > 
> > > Anyway, I'll keep debug enabled in case the firmware error happens.
> > 
> > Yes, that would still be interesting, thanks.
> 
> Here it is:
> 
>   iwx0: unhandled firmware response 0x3ff/0x2008 rx ring 95[17]
>   iwx0: fatal firmware error

It seems you were not running in 'ifconfig iwx0 debug' mode, and
details about the firmware error are missing as a result.

Full fatal firmware error output with debugging should look something
like this:

iwx0: fatal firmware error
iwx0: Start Error Log Dump:
iwx0: Status: 0x39, count: 6
iwx0: 0x22CE | ADVANCED_SYSASSERT
iwx0: 82F0 | trm_hw_status0
iwx0:  | trm_hw_status1
iwx0: 004DC582 | branchlink2
iwx0: 004D22D6 | interruptlink1
iwx0: 004D22D6 | interruptlink2
iwx0: 00C4 | data1
iwx0:  | data2
iwx0: 0400 | data3
iwx0: 164059EC | beacon time
iwx0: DEC9F621 | tsf low
iwx0: 047F | tsf hi
iwx0:  | time gp1
iwx0: 0705E80F | time gp2
iwx0: 0001 | uCode revision type
iwx0: 004D | uCode version major
iwx0: F92B5FED | uCode version minor
iwx0: 0420 | hw version
iwx0: 18880002 | board version
iwx0: 806AFC18 | hcmd
iwx0: E7F8 | isr0
iwx0: 0040 | isr1
iwx0: 50F80002 | isr2
iwx0: 04D3409E | isr3
iwx0: 0020 | isr4
iwx0: 02F9001C | last cmd Id
iwx0: 000107A0 | wait_event
iwx0: 0080 | l2p_control
iwx0: 0020 | l2p_duration
iwx0: 003F | l2p_mhvalid
iwx0: 00CE18B8 | l2p_addr_match
iwx0: 0009 | lmpm_pmg_sel
iwx0:  | timestamp
iwx0: 0A409090 | flow_handler
iwx0: Start UMAC Error Log Dump:
iwx0: Status: 0x39, count: 7
iwx0: 0x2070 | NMI_INTERRUPT_LMAC_FATAL
iwx0: 0x | umac branchlink1
iwx0: 0x8046D64C | umac branchlink2
iwx0: 0x8048D8D6 | umac interruptlink1
iwx0: 0x8048D8D6 | umac interruptlink2
iwx0: 0x0002 | umac data1
iwx0: 0x8048D8D6 | umac data2
iwx0: 0x | umac data3
iwx0: 0x004D | umac major
iwx0: 0xF92B5FED | umac minor
iwx0: 0x0705EBE7 | frame pointer
iwx0: 0xC0886258 | stack pointer
iwx0: 0x002A0516 | last host cmd
iwx0: 0x | isr status reg
driver status:
  tx ring  0: qid=0  cur=43  cur_hw=43  queued=0
  tx ring  1: qid=1  cur=7   cur_hw=7   queued=0
  tx ring  2: qid=2  cur=30  cur_hw=16414 queued=46
  tx ring  3: qid=3  cur=0   cur_hw=0   queued=0
  tx ring  4: qid=4  cur=0   cur_hw=0   queued=0
  tx ring  5: qid=5  cur=0   cur_hw=0   queued=0
  tx ring  6: qid=6  cur=0   cur_hw=0   queued=0
  tx ring  7: qid=7  cur=0   cur_hw=0   queued=0
  tx ring  8: qid=8  cur=0   cur_hw=0   queued=0
  tx ring  9: qid=9  cur=0   cur_hw=0   queued=0
  rx ring: cur=404
  802.11 state RUN
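When comparing dumps like this one with the one quoted earlier in the thread, it can help to extract the value/name pairs mechanically. The helper below is a hypothetical sketch, not part of OpenBSD. It assumes each line ends in "<hex value> | <field name>" and rejects lines whose value is missing or has no digits, as with some fields shown above.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse one "... <hex value> | <field name>" line
 * from an iwx(4) firmware error log dump. On success, store the value,
 * copy the field name into name[] and return 1. Return 0 if there is no
 * " | " separator or the value token is not a complete hex number.
 */
static int
parse_dump_line(const char *line, unsigned long *value, char *name,
    size_t namelen)
{
	const char *sep, *start;
	char tok[32], *end;
	size_t len;

	sep = strstr(line, " | ");
	if (sep == NULL || namelen == 0)
		return 0;

	/* The value is the whitespace-delimited token right before " | ". */
	start = sep;
	while (start > line && !isspace((unsigned char)start[-1]))
		start--;
	len = (size_t)(sep - start);
	if (len == 0 || len >= sizeof(tok))
		return 0;
	memcpy(tok, start, len);
	tok[len] = '\0';

	/* strtoul() accepts an optional 0x prefix in base 16. */
	*value = strtoul(tok, &end, 16);
	if (end == tok || *end != '\0')
		return 0;

	/* Copy the field name and drop a trailing newline, if any. */
	snprintf(name, namelen, "%s", sep + 3);
	name[strcspn(name, "\n")] = '\0';
	return 1;
}
```

Because the value is located relative to the " | " separator, the same helper handles lines with a syslog prefix such as "Apr 21 10:59:40 matebook /bsd: iwx0: ...".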



Re: iwx obtains IP via DHCP but no traffic

2024-02-26 Thread Stefan Sperling
On Mon, Feb 26, 2024 at 10:17:20PM +0100, Kirill A. Korinsky wrote:
> On Mon, 26 Feb 2024 12:39:12 +0100,
> Stefan Sperling wrote:
> > 
> > On Mon, Feb 26, 2024 at 11:40:35AM +0100, Kirill A. Korinsky wrote:
> > > all of this seems like a strong indicator that the issue is inside iwx
> > 
> > I have no idea what causes the latency spikes you are seeing, and
> > none of the info you've provided so far helps finding an answer.
> > 
> 
> Meanwhile, I was able to reproduce the latency issue. It occurs only when the CPU
> is loaded by something, and the machine is working in low performance mode
> (400 MHz).

Ah ok. Then the advice would be "don't do that", or run it in 11a mode
instead of 11ac ;)
 
> Thus, I made one more attempt with your patch during my tests, and I can't
> reproduce the no-traffic issue with rate control enabled anymore.
> 
> Perhaps I did something wrong last time when I tried it.

That is good news, thank you. This patch has already been committed
since it didn't cause problems in my own testing and definitely looked
like a bug fix.

> I apologize for misleading you.

No worries.

> Anyway, I'll keep debug enabled in case the firmware error happens.

Yes, that would still be interesting, thanks.



Re: iwx obtains IP via DHCP but no traffic

2024-02-26 Thread Stefan Sperling
On Mon, Feb 26, 2024 at 11:40:35AM +0100, Kirill A. Korinsky wrote:
> all of this seems like a strong indicator that the issue is inside iwx

You don't need to try hard to convince me that there could be
a bug in iwx. Yes, there will always be bugs in iwx. Many have
been fixed already but there will always be more.

I have no idea what causes the latency spikes you are seeing, and
none of the info you've provided so far helps finding an answer.

Some things you could try to isolate this further:

- Test the linux driver with the same firmware image, to rule out
  firmware-level issues.

- Turn off the advanced 11ac and 11n features by running in 11a/b/g mode.
  This still wouldn't tell us much but would at least point in a certain
  direction.

> Meanwhile, I haven't got anything inside dmesg.

Then let's wait for this firmware error to occur again.
Maybe we will learn something useful.



Re: iwx obtains IP via DHCP but no traffic

2024-02-25 Thread Stefan Sperling
On Sun, Feb 25, 2024 at 10:17:37PM +0100, Kirill A. Korinsky wrote:
> On Sun, 25 Feb 2024 17:32:46 +0100,
> Stefan Sperling wrote:
> > 
> > I have less than 2ms ping latency to my 5 GHz AP over iwx.
> > So this is probably not an issue in iwx but something else.
> 
> I wasn't near the machine for a couple of hours, but it stayed up and connected
> to the wifi network.
> 
> When I came back nothing had changed, but in dmesg I've discovered:
> 
>   iwx0: hw rev 0x350, fw 77.2df8986f.0, address 98:8d:46:21:2b:6d
>   iwx0: fatal firmware error

To get more information about this error, please run with 'ifconfig iwx0 debug'.
Just adding the line 'debug' to /etc/hostname.iwx0 is enough to activate it
at boot. This adds a bit more noise to dmesg, but once this firmware error
happens again there will be more details about it, which I could then try to
get help with from developers at Intel.



Re: iwx obtains IP via DHCP but no traffic

2024-02-25 Thread Stefan Sperling
On Sun, Feb 25, 2024 at 05:19:33PM +0100, Kirill A. Korinsky wrote:
> Well,
> 
> I just rebooted to a new snapshot, and WiFi has degraded further.
> 
> I do run an MPD sync in the background which downloads some files via HTTP,
> but I have run it before, and it never caused an issue.
> 
> Meanwhile the ping to the GW from my laptop looks like:
> 
>   ~ $ ping 172.31.2.1 
>   PING 172.31.2.1 (172.31.2.1): 56 data bytes
>   64 bytes from 172.31.2.1: icmp_seq=0 ttl=64 time=36.148 ms
>   64 bytes from 172.31.2.1: icmp_seq=1 ttl=64 time=50.701 ms
>   64 bytes from 172.31.2.1: icmp_seq=2 ttl=64 time=65.322 ms
>   64 bytes from 172.31.2.1: icmp_seq=3 ttl=64 time=46.276 ms
>   64 bytes from 172.31.2.1: icmp_seq=4 ttl=64 time=58.030 ms
>   64 bytes from 172.31.2.1: icmp_seq=5 ttl=64 time=34.199 ms
>   64 bytes from 172.31.2.1: icmp_seq=6 ttl=64 time=47.096 ms
>   64 bytes from 172.31.2.1: icmp_seq=7 ttl=64 time=40.565 ms
>   ping: sendmsg: No buffer space available
>   ping: wrote 172.31.2.1 64 chars, ret=-1
>   64 bytes from 172.31.2.1: icmp_seq=9 ttl=64 time=44.815 ms
>   64 bytes from 172.31.2.1: icmp_seq=10 ttl=64 time=33.036 ms
>   64 bytes from 172.31.2.1: icmp_seq=11 ttl=64 time=63.137 ms
>   64 bytes from 172.31.2.1: icmp_seq=12 ttl=64 time=41.274 ms
>   64 bytes from 172.31.2.1: icmp_seq=13 ttl=64 time=59.496 ms
>   64 bytes from 172.31.2.1: icmp_seq=14 ttl=64 time=48.939 ms
>   ping: sendmsg: No buffer space available
>   ping: wrote 172.31.2.1 64 chars, ret=-1
>   64 bytes from 172.31.2.1: icmp_seq=16 ttl=64 time=44.050 ms
>   64 bytes from 172.31.2.1: icmp_seq=17 ttl=64 time=40.177 ms
>   ping: sendmsg: No buffer space available
>   ping: wrote 172.31.2.1 64 chars, ret=-1
>   64 bytes from 172.31.2.1: icmp_seq=19 ttl=64 time=62.400 ms
>   64 bytes from 172.31.2.1: icmp_seq=20 ttl=64 time=40.147 ms
>   ^C
>   --- 172.31.2.1 ping statistics ---
>   21 packets transmitted, 18 packets received, 14.3% packet loss
>   round-trip min/avg/max/std-dev = 33.036/47.545/65.322/9.949 ms
>   ~ $
> 
> Between the AP and the GW is a 1Gb network, and my laptop is 50cm from the AP.
> 
> I've tried rebooting the AP; it doesn't help, but moving from 5GHz to 2.4GHz
> helps, and on 5GHz it is:
> 
>   ~ $ ping 172.31.2.1 
>   PING 172.31.2.1 (172.31.2.1): 56 data bytes
>   64 bytes from 172.31.2.1: icmp_seq=0 ttl=64 time=518.499 ms
>   64 bytes from 172.31.2.1: icmp_seq=1 ttl=64 time=459.383 ms
>   64 bytes from 172.31.2.1: icmp_seq=2 ttl=64 time=787.812 ms
>   64 bytes from 172.31.2.1: icmp_seq=3 ttl=64 time=471.442 ms
>   64 bytes from 172.31.2.1: icmp_seq=4 ttl=64 time=493.617 ms
>   64 bytes from 172.31.2.1: icmp_seq=5 ttl=64 time=856.977 ms
>   64 bytes from 172.31.2.1: icmp_seq=6 ttl=64 time=735.358 ms
>   64 bytes from 172.31.2.1: icmp_seq=7 ttl=64 time=1012.529 ms
>   64 bytes from 172.31.2.1: icmp_seq=8 ttl=64 time=1591.216 ms
>   64 bytes from 172.31.2.1: icmp_seq=9 ttl=64 time=591.264 ms
>   64 bytes from 172.31.2.1: icmp_seq=10 ttl=64 time=374.638 ms
>   64 bytes from 172.31.2.1: icmp_seq=11 ttl=64 time=561.372 ms
>   64 bytes from 172.31.2.1: icmp_seq=12 ttl=64 time=344.401 ms
>   64 bytes from 172.31.2.1: icmp_seq=13 ttl=64 time=397.568 ms
>   64 bytes from 172.31.2.1: icmp_seq=14 ttl=64 time=375.690 ms
>   64 bytes from 172.31.2.1: icmp_seq=15 ttl=64 time=484.299 ms
>   64 bytes from 172.31.2.1: icmp_seq=16 ttl=64 time=239.776 ms
>   64 bytes from 172.31.2.1: icmp_seq=17 ttl=64 time=353.970 ms
>   64 bytes from 172.31.2.1: icmp_seq=18 ttl=64 time=183.788 ms
>   64 bytes from 172.31.2.1: icmp_seq=19 ttl=64 time=569.264 ms
>   64 bytes from 172.31.2.1: icmp_seq=20 ttl=64 time=449.187 ms
>   64 bytes from 172.31.2.1: icmp_seq=21 ttl=64 time=289.776 ms
>   64 bytes from 172.31.2.1: icmp_seq=22 ttl=64 time=412.872 ms
>   ping: sendmsg: No buffer space available
>   ping: wrote 172.31.2.1 64 chars, ret=-1
>   64 bytes from 172.31.2.1: icmp_seq=25 ttl=64 time=138.947 ms
>   64 bytes from 172.31.2.1: icmp_seq=26 ttl=64 time=359.616 ms
>   64 bytes from 172.31.2.1: icmp_seq=27 ttl=64 time=377.133 ms
> 
> So, latency is much worse.
> 
> No error, nothing in dmesg since reboot. I'm running the kernel as it was
> installed with today's snapshot.
> 
> And my home WiFi network is running on Automatic mode from Unifi, without
> any hacks.

I have less than 2ms ping latency to my 5 GHz AP over iwx.
So this is probably not an issue in iwx but something else.



Re: bwfm: no rx after changing lladdr

2024-02-19 Thread Stefan Sperling
On Sun, Feb 18, 2024 at 11:21:15AM -1000, Todd Carson wrote:
> 
> Thanks for the pointers; updated diff below to set the MAC in
> bwfm_init() if preinit has already run.
> 
> Works on boot with "lladdr random" as the first line in hostname.bwfm0,
> and also changing the lladdr after boot with ifconfig.
> 

Committed, thank you Todd.

> diff /usr/src
> commit - 6c24eb55e021991196003dc7f0a643e806b14295
> path + /usr/src
> blob - dfa7a1973d2ab6be7e4b2fbd869b38c441d4eae0
> file + sys/dev/ic/bwfm.c
> --- sys/dev/ic/bwfm.c
> +++ sys/dev/ic/bwfm.c
> @@ -451,6 +451,16 @@ bwfm_init(struct ifnet *ifp)
>   return;
>   }
>   sc->sc_initialized = 1;
> + } else {
> + /* Update MAC in case the upper layers changed it. */
> + IEEE80211_ADDR_COPY(ic->ic_myaddr,
> + ((struct arpcom *)ifp)->ac_enaddr);
> + if (bwfm_fwvar_var_set_data(sc, "cur_etheraddr",
> + ic->ic_myaddr, sizeof(ic->ic_myaddr))) {
> + printf("%s: could not write MAC address\n",
> + DEVNAME(sc));
> + return;
> + }
>   }
>  
>   /* Select default channel */
> 



Re: bwfm: no rx after changing lladdr

2024-02-18 Thread Stefan Sperling
On Sat, Feb 17, 2024 at 06:27:40PM -1000, Todd Carson wrote:
> 
> On a Raspberry Pi 4 running a recent snapshot, I found that the built-in
> bwfm interface would fail to receive non-broadcast traffic after
> changing the MAC address with ifconfig (for example by having
> "lladdr random" in hostname.bwfm0).
> 
> It looks like this was happening because the new MAC address was set in
> the kernel network stack but the bwfm driver wasn't doing anything
> to write the address to the device.
> 
> The below diff fixes it for me.
> I don't have any other bwfm devices to test.

An alternative approach is to memcpy the stack's MAC into ic_myaddr
whenever the interface comes up. E.g. iwx does this in iwx_preinit():

	if (sc->attached) {
		/* Update MAC in case the upper layers changed it. */
		IEEE80211_ADDR_COPY(sc->sc_ic.ic_myaddr,
		    ((struct arpcom *)ifp)->ac_enaddr);
		return 0;
	}

This way the new value of ic_myaddr is propagated to firmware as
part of the usual startup process.

A possible problem with your approach is that the bwfm_fwvar_var_set_data()
call might occur while the interface is still down and no firmware has
been loaded. It looks like you handle this case by returning EIO, which
means a hostname.bwfm0 file like the following would run into this I/O
error on the first line during boot:

  lladdr random
  nwid foo wpakey bar
  inet autoconf

With the approach taken by iwx(4), such files will work without errors:
the random MAC address is at first only set in the network stack, and is
passed to the device once the hardware comes up.
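The ordering described above can be modelled in a few lines of C. This is a hypothetical sketch, not bwfm or iwx code: `struct model_if` and its fields merely stand in for ac_enaddr, ic_myaddr and the firmware's cur_etheraddr variable, and `model_init()` plays the role of the driver's init path.

```c
#include <string.h>

#define ADDR_LEN	6

/* Hypothetical stand-ins for the three copies of the MAC address. */
struct model_if {
	unsigned char	enaddr[ADDR_LEN];	/* like ac_enaddr (stack) */
	unsigned char	myaddr[ADDR_LEN];	/* like ic_myaddr (net80211) */
	unsigned char	fwaddr[ADDR_LEN];	/* what the device was told */
	int		fw_loaded;
};

/* "lladdr random" while down: only the stack's copy changes, cannot fail. */
static void
model_set_lladdr(struct model_if *ifp, const unsigned char *addr)
{
	memcpy(ifp->enaddr, addr, ADDR_LEN);
}

/* Interface init: firmware is available now, so propagate the address. */
static void
model_init(struct model_if *ifp)
{
	ifp->fw_loaded = 1;
	/* Update MAC in case the upper layers changed it. */
	memcpy(ifp->myaddr, ifp->enaddr, ADDR_LEN);
	/* Stands in for writing "cur_etheraddr" to the firmware. */
	memcpy(ifp->fwaddr, ifp->myaddr, ADDR_LEN);
}
```

Since the lladdr step never talks to the device, the first line of the hostname.bwfm0 example above cannot fail even though no firmware has been loaded yet.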

> diff /usr/src
> commit - 6c24eb55e021991196003dc7f0a643e806b14295
> path + /usr/src
> blob - dfa7a1973d2ab6be7e4b2fbd869b38c441d4eae0
> file + sys/dev/ic/bwfm.c
> --- sys/dev/ic/bwfm.c
> +++ sys/dev/ic/bwfm.c
> @@ -827,6 +827,17 @@ bwfm_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data
>   error = 0;
>   }
>   break;
> + case SIOCSIFLLADDR:
> + ifr = (struct ifreq *)data;
> + error = 0;
> + if (bwfm_fwvar_var_set_data(sc, "cur_etheraddr",
> + ifr->ifr_addr.sa_data, ETHER_ADDR_LEN)) {
> + error = EIO;
> + } else {
> + memcpy(ic->ic_myaddr, ifr->ifr_addr.sa_data,
> + sizeof(ic->ic_myaddr));
> + }
> + break;
>   case SIOCGIFMEDIA:
>   case SIOCG80211NODE:
>   case SIOCG80211ALLNODES:
> 
> 



Re: iwx obtains IP via DHCP but no traffic

2024-02-16 Thread Stefan Sperling
On Fri, Feb 16, 2024 at 11:25:51AM +, Stuart Henderson wrote:
> Also: regardless of whether it really makes sense for a given network,
> sometimes a network operator will do this anyway and as a user you have
> no control over it - and based on Kirill's description this seems a
> regression since 7.4?

If my proposed fix works then it's a regression from the -77 fw update.



Re: iwx obtains IP via DHCP but no traffic

2024-02-16 Thread Stefan Sperling
On Fri, Feb 16, 2024 at 11:45:30AM +0100, Kirill A. Korinsky wrote:
> Stefan,
> 
> On Fri, 16 Feb 2024 10:13:50 +0100,
> Stefan Sperling wrote:
> >
> > If your AP still announces 6M even while you've disabled this rate in settings,
> > then the AP is broken and there is nothing to fix for us, you could only try
> > asking the vendor for an AP firmware fix. Otherwise, there could be a bug in
> > net80211, iwx, or intel wifi firmware where the lack of support for 6 Mbps
> > on the AP breaks something.
> 
> When I disable 6mbps it announces as:
> 
>   11:39:06.596019 802.11 flags=0<>: beacon, 
> caps=10421, ssid (catap's 
> Network), rates 9M 12M* 18M 24M* 36M 48M 54M, ds (chan 40), tim 0x0104, 
> country 'DE ', channels 40-41 limit 23dB, power constraint 0dB, tpcreport 
> 0x, 195:3 0x011e1e, 70:5 0xf20001, 51:3 0x082830, 54:3 0x961000, 
> rsn=, 3 
> stations, 3% utilization, admission capacity 976us/s,  chan 40, 11n, sig 48dBm, noise -106dBm>
> 
> so, AP works as expected.

Thanks, good to know.

I might have found the root cause of your problem in iwx.
Can you try this diff please?

diff /usr/src
commit - 5f5902b3789b6f994566004963a31af6304d3a70
path + /usr/src
blob - 4b945edf2c73c6e2582819b283277baff81a6586
file + sys/dev/pci/if_iwx.c
--- sys/dev/pci/if_iwx.c
+++ sys/dev/pci/if_iwx.c
@@ -6085,13 +6085,12 @@ iwx_tx_fill_cmd(struct iwx_softc *sc, struct iwx_node 
} else if (sc->sc_rate_n_flags_version >= 2)
rate_flags |= IWX_RATE_MCS_LEGACY_OFDM_MSK;
 
-   rval = (rs->rs_rates[ni->ni_txrate] & IEEE80211_RATE_VAL);
if (sc->sc_rate_n_flags_version >= 2) {
if (rate_flags & IWX_RATE_MCS_LEGACY_OFDM_MSK) {
-   rate_flags |= (iwx_fw_rateidx_ofdm(rval) &
+   rate_flags |= (iwx_fw_rateidx_ofdm(rinfo->rate) &
IWX_RATE_LEGACY_RATE_MSK);
} else {
-   rate_flags |= (iwx_fw_rateidx_cck(rval) &
+   rate_flags |= (iwx_fw_rateidx_cck(rinfo->rate) &
IWX_RATE_LEGACY_RATE_MSK);
}
} else
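Reading the diff, the fix makes the legacy rate index in rate_n_flags come from the rate actually selected for the frame (`rinfo->rate`) rather than from the node's current Tx rate. The hypothetical sketch below illustrates that bug class. It assumes rates in 500 kbit/s units and uses the position in the standard OFDM rate set as the index, which may not match Intel's firmware numbering.

```c
#include <stddef.h>

/* OFDM rates 6, 9, 12, 18, 24, 36, 48, 54 Mbit/s in 500 kbit/s units. */
static const int ofdm_rates[] = { 12, 18, 24, 36, 48, 72, 96, 108 };

/* Return the OFDM index of rate, or -1 for non-OFDM (e.g. CCK) rates. */
static int
ofdm_rate_index(int rate)
{
	size_t i;

	for (i = 0; i < sizeof(ofdm_rates) / sizeof(ofdm_rates[0]); i++) {
		if (ofdm_rates[i] == rate)
			return (int)i;
	}
	return -1;
}

/*
 * Hypothetical rate choice for one frame: broadcast frames use the
 * lowest basic rate, other frames the node's current Tx rate. Any
 * index passed on to firmware must be computed from this result.
 */
static int
tx_rate_for_frame(int is_bcast, int min_basic_rate, int node_txrate)
{
	return is_bcast ? min_basic_rate : node_txrate;
}
```

In this model, a broadcast frame on a network whose lowest basic rate is 12 Mbit/s (24 units), sent while the node's data rate is 54 Mbit/s (108 units), should get index 2; deriving the index from the node rate instead would give 7.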



Re: iwx obtains IP via DHCP but no traffic

2024-02-16 Thread Stefan Sperling
On Fri, Feb 16, 2024 at 03:22:07AM +0100, Kirill A. Korinsky wrote:
> Thus, disabling 5GHz does help. That led me to dig into the network settings
> side. After poking around I've discovered that the setting which triggers the
> issue is "Minimum Data Rate Control" inside the Unifi UI. It has settings for
> the 5GHz network: 6, 9, 12 and 24 Mbps. The old one was 12 (or 24); anyway,
> the issue happens when this setting is 12 or 24 Mbps.

The driver uses the lowest basic rate announced by the AP:

const struct iwx_rate *
iwx_tx_fill_cmd(struct iwx_softc *sc, struct iwx_node *in,
struct ieee80211_frame *wh, uint16_t *flags, uint32_t *rate_n_flags)
{
[...]
int min_ridx = iwx_rval2ridx(ieee80211_min_basic_rate(ic));


This rate will be used for broadcast frames, which include DHCP requests.
If the AP doesn't receive such frames then no DHCP requests will pass.
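The sketch below shows how such a lookup can work. It assumes the usual 802.11 rate-set encoding: each octet is a rate in units of 500 kbit/s, and the high bit 0x80 marks a basic rate. The helper min_basic_rate() and its 6 Mbit/s fallback are hypothetical. This is not the net80211 ieee80211_min_basic_rate() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RATE_BASIC	0x80	/* high bit marks a basic rate */
#define RATE_VAL	0x7f	/* low bits: rate in 500 kbit/s units */

/*
 * Hypothetical helper: return the lowest basic rate in an 802.11 rate
 * set, in 500 kbit/s units, or 12 (6 Mbit/s) if no rate is flagged basic.
 */
static uint8_t
min_basic_rate(const uint8_t *rates, size_t nrates)
{
	uint8_t min = 0;

	assert(rates != NULL || nrates == 0);
	for (size_t i = 0; i < nrates; i++) {
		uint8_t val = rates[i] & RATE_VAL;

		if ((rates[i] & RATE_BASIC) == 0)
			continue;	/* optional rate, not required to join */
		if (min == 0 || val < min)
			min = val;
	}
	return min != 0 ? min : 12;
}
```

Consider the rate set "6M* 9M 12M* 18M 24M* 36M 48M 54M" (octets 0x8c 0x12 0x98 0x24 0xb0 0x48 0x60 0x6c). For this set the sketch returns 12, which is 6 Mbit/s. If 6M is dropped from the set, it returns 24, which is 12 Mbit/s. That would then be the rate used for broadcast frames such as DHCP requests.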

During scans, or while associated (even if DHCP doesn't work), you can check
the rates announced by the AP with tcpdump:

# tcpdump -i iwx0 -y IEEE802_11_RADIO -v type mgt subtype beacon

For example, my AP shows:

 rates 6M* 9M 12M* 18M 24M* 36M 48M 54M

An asterisk (*) indicates a basic rate, which the client must support in order
to join the network. And in any case, the 802.11 standard _requires_ 6, 12, and
24 to be supported, always! My AP's rate information is thus in fact redundant.
(In practice, I've never seen devices that didn't support the whole rate set.)

If your AP still announces 6M even while you've disabled this rate in settings,
then the AP is broken and there is nothing for us to fix; you could only try
asking the vendor for an AP firmware fix. Otherwise, there could be a bug in
net80211, iwx, or intel wifi firmware where the lack of support for 6 Mbps
on the AP breaks something.

You won't see any improvements in throughput by disabling 6 Mbps.
The most noticeable effect of disabling it will be compatibility issues.
Most of our drivers are still hard-coded to use 6 Mbps for broadcasts.
The only drivers which call ieee80211_min_basic_rate() are:
iwn, iwm, iwx, and qwx.

I wouldn't be surprised if this setting is also causing issues for phones,
laptops running an OS other than OpenBSD, etc.

I recommend leaving 6 Mbps enabled in your setup for production, and only
use this feature to help debug our drivers so we can make them work when
they come across an AP that's misconfigured in this way and cannot be fixed.

One reason to keep 6 Mbps disabled would be having many APs on the same channel,
so many that their collective beacons sent at 6 Mbps use up all available
air time, leaving no time for actual data. But unless you're running something
like a CCC congress this limitation won't apply ;)
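Here is a rough worked example of the airtime argument. It uses the 802.11a/g OFDM frame duration formula:

- 16 us of preamble and a 4 us SIGNAL field.
- Data sent in 4 us symbols, each carrying 4 data bits per Mbit/s of rate (24 bits at 6 Mbit/s, 96 bits at 24 Mbit/s).
- 16 SERVICE bits and 6 tail bits added to the payload.

The 300-byte beacon size and the 100 TU (102.4 ms) beacon interval are assumptions for illustration. Contention overhead is ignored.

```c
#include <assert.h>

/*
 * Airtime in microseconds of one 802.11a/g OFDM frame of 'len' bytes
 * sent at 'mbps' (6, 9, 12, 18, 24, 36, 48 or 54 Mbit/s). Contention
 * overhead (DIFS, backoff) is ignored.
 */
static unsigned
ofdm_airtime_us(unsigned len, unsigned mbps)
{
	unsigned bits, dbps, nsym;

	assert(mbps > 0);
	bits = 16 + 8 * len + 6;		/* SERVICE + payload + tail */
	dbps = mbps * 4;			/* data bits per 4 us symbol */
	nsym = (bits + dbps - 1) / dbps;	/* round up to whole symbols */
	return 16 + 4 + 4 * nsym;	/* preamble + SIGNAL + data symbols */
}

/* Percentage of airtime used by beacons of 'naps' APs on one channel. */
static double
beacon_airtime_percent(unsigned naps, unsigned len, unsigned mbps)
{
	const double interval_us = 102400.0;	/* assumed 100 TU interval */

	return 100.0 * naps * ofdm_airtime_us(len, mbps) / interval_us;
}
```

Under these assumptions, a 300-byte beacon occupies 424 microseconds at 6 Mbit/s but only 124 microseconds at 24 Mbit/s. About 100 APs beaconing at 6 Mbit/s on one channel would therefore already consume roughly 41% of the airtime before any data is sent.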



Re: Supported iwn device is not configured on ARM64

2024-01-16 Thread Stefan Sperling
On Tue, Jan 16, 2024 at 02:38:58PM +0800, Kevin Lo wrote:
> The below diff fixes clang warning about possible unaligned access.
> With that fixed, iwm(4) works as expected on my Rock 3A.
> 
> iwm0 at pci1 dev 0 function 0 "Intel Dual Band Wireless-AC 9260" rev 0x29, msix
> iwm0: hw rev 0x320, fw ver 46.ff18e32a.0, address fc:77:74:xx:xx:xx

Thank you! ok stsp@

> Index: sys/dev/pci/if_iwmreg.h
> ===
> RCS file: /cvs/src/sys/dev/pci/if_iwmreg.h,v
> retrieving revision 1.68
> diff -u -p -u -p -r1.68 if_iwmreg.h
> --- sys/dev/pci/if_iwmreg.h   19 Mar 2022 10:26:52 -  1.68
> +++ sys/dev/pci/if_iwmreg.h   16 Jan 2024 06:06:00 -
> @@ -3361,7 +3361,7 @@ struct iwm_rx_mpdu_desc_v1 {
>   uint32_t phy_data0;
>   uint32_t phy_data1;
>   };
> - };
> + } __packed;
>  } __packed;
>  
>  #define IWM_RX_REORDER_DATA_INVALID_BAID 0x7f
> 



Re: iwx0 repeatedly goes offline with "unhandled firmware response 0x3fd"

2023-12-21 Thread Stefan Sperling
On Wed, Dec 20, 2023 at 02:00:36PM -0800, Greg Steuck wrote:
> I'm sending this with the subject of the bug reported before in hope it will join
> the thread even though I'm seeing this with an iwm(4) interface. I'm
> running a Nov 3 snap of -current.
> 
> The messages seem mostly harmless as the interface recovers without prodding.
> 
> iwm0 at pci0 dev 20 function 3 "Intel AC 9560" rev 0x11, msix
> iwm0: hw rev 0x310, fw ver 46.ff18e32a.0, address 50:e0:85:
> 
> iwm0: flags=808843 mtu 1500
>   lladdr 50:e0:85:
>   index 1 priority 4 llprio 3
>   groups: wlan egress
>   media: IEEE802.11 autoselect (VHT-MCS0 mode 11ac)
>   status: active
>   ieee80211: join XXX chan 153 bssid 66:22:32: 97% wpakey wpaprotos wpa2 wpaakms psk wpaciphers ccmp wpagroupcipher ccmp
>   inet 192.168.0.101 netmask 0xff00 broadcast 192.168.0.255
> 
> % grep iwm /var/log/messages
> Dec 20 09:23:19 lenny /bsd: iwm0: unhandled firmware response 0x3fd/0x380c rx ring 0[198]
> Dec 20 11:03:23 lenny /bsd: iwm0: unhandled firmware response 0x3fd/0x380c rx ring 32[143]
> Dec 20 12:19:53 lenny /bsd: iwm0: unhandled firmware response 0x3fd/0x380c rx ring 19[216]
> Dec 20 13:01:56 lenny /bsd: iwm0: unhandled firmware response 0x3fd/0x380c rx ring 30[46]
> Dec 20 13:26:17 lenny /bsd: iwm0: unhandled firmware response 0x3fd/0x380c rx ring 19[171]
> Dec 20 13:33:17 lenny /bsd: iwm0: unhandled firmware response 0x3fd/0x380c rx ring 54[207]

I cannot tell what this firmware message is about because it is undocumented.

The response code decodes to MAC_CONF_GROUP (0x3) and sub-command 0xfd,
which is not listed in the firmware API header file:
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git/tree/drivers/net/wireless/intel/iwlwifi/fw/api/mac-cfg.h#n76
0xfd is located in the range between PROBE_RESPONSE_DATA_NOTIF and
CHANNEL_SWITCH_START_NOTIF. So it is likely some other xxx_NOTIF event.
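The decoding step works like this: the code printed by the driver packs the command group into the high byte and the command or notification ID into the low byte. Below is a minimal sketch of that split. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split a firmware response code such as 0x3fd into its command group
 * (high byte) and command/notification ID (low byte). Hypothetical
 * helpers, assuming the (group << 8) | id packing.
 */
static uint8_t
resp_group(uint32_t code)
{
	return (uint8_t)((code >> 8) & 0xff);
}

static uint8_t
resp_id(uint32_t code)
{
	return (uint8_t)(code & 0xff);
}
```

Applied to 0x3fd, this yields group 0x3 (MAC_CONF_GROUP) and ID 0xfd.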

Without further information the best we could do is ignore the event.
But I will try to ask Linux developers for clues first.

> Dec 20 13:52:30 lenny /bsd: iwm0: fatal firmware error

Could you enable 'ifconfig iwm0 debug' and trigger this firmware error again
to get more details about the error?



Re: Wireless bug iwx -current

2023-12-13 Thread Stefan Sperling
On Wed, Dec 13, 2023 at 12:10:25PM +0100, Gabriel Brito wrote:
> Hello,
> 
> I just upgraded to current (7.4 GENERIC.MP#1485 amd64) and can no longer
> connect to a wireless networks using iwx - I tried with two different
> networks.
> 
> Sorry for the lack of further information - don't know what else would be
> usefull, and have no access to ethernet right now), and for the lack of
> formating (had to use my phone to send this).
> 
> Best,
> g
> 

Which version were you running before upgrading?

The iwx driver code in -current is the same as in 7.4 release.
There was just one trivial change to ignore an event which was seen
on one particular embedded hardware product. That change by itself
won't cause connectivity issues.

If you want help, please run 'ifconfig iwx0 debug', try to connect,
and show the lines of iwx debug output added to /var/log/messages.



Re: netstart fails to parse complex wpa2 passphrases

2023-11-25 Thread Stefan Sperling
On Sat, Nov 25, 2023 at 08:20:47AM -0700, Theo de Raadt wrote:
> What you are asking for is too difficult to do.
> 
> netstart is a shell script.  shell script arguments are not 8 bit clean,
> because the the sh language has many meta & escape characters.
> 
> Your configuration exceeds what can be done.
> 

There is a workaround: ifconfig accepts pre-hashed WPA keys.

Tools such as this will generate the required hash:
https://www.wireshark.org/tools/wpa-psk.html
(We used to have a wpa-psk tool in base but it's been removed.)

Put your SSID and passphrase in there, and the tool will generate a long
hex string: d7a38e9a542a82f19af0b2117687e43ba4cc60afeb742539ecd326fdee6b70b8

This hex string can be passed as wpakey by prefixing the string with "0x":

# ifconfig iwx0 join foo wpakey 0xd7a38e9a542a82f19af0b2117687e43ba4cc60afeb742539ecd326fdee6b70b8
# ifconfig iwx0 joinlist | grep foo
  foo  wpaprotos wpa2 wpaakms psk wpaciphers ccmp wpagroupcipher ccmp
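The long hex string is the 256-bit WPA2 pre-shared key defined in IEEE 802.11i. It is computed with PBKDF2, using:

- HMAC-SHA1 as the pseudo-random function.
- The passphrase as the password and the SSID as the salt.
- 4096 iterations and 32 bytes of output.

The self-contained sketch below computes it. The function names are made up, and the hand-rolled SHA-1 is for illustration only; real code should rely on a vetted crypto library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal SHA-1 (FIPS 180-4), just enough to build HMAC-SHA1 on. */
struct sha1 { uint32_t h[5]; uint64_t len; uint8_t buf[64]; size_t n; };
static const struct sha1 sha1_iv = { { 0x67452301, 0xefcdab89, 0x98badcfe,
    0x10325476, 0xc3d2e1f0 }, 0, { 0 }, 0 };

static uint32_t rol(uint32_t x, int s) { return (x << s) | (x >> (32 - s)); }

static void sha1_block(struct sha1 *c, const uint8_t *p)
{
	uint32_t w[80], a = c->h[0], b = c->h[1], x = c->h[2], d = c->h[3];
	uint32_t e = c->h[4], f, k, t;	/* x is the "c" word of FIPS 180-4 */
	for (int i = 0; i < 80; i++) {
		if (i < 16)	/* big-endian message words, then the schedule */
			w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
			    (uint32_t)p[4 * i + 2] << 8 | p[4 * i + 3];
		else
			w[i] = rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
		if (i < 20)      { f = (b & x) | (~b & d);          k = 0x5a827999; }
		else if (i < 40) { f = b ^ x ^ d;                   k = 0x6ed9eba1; }
		else if (i < 60) { f = (b & x) | (b & d) | (x & d); k = 0x8f1bbcdc; }
		else             { f = b ^ x ^ d;                   k = 0xca62c1d6; }
		t = rol(a, 5) + f + e + k + w[i];
		e = d; d = x; x = rol(b, 30); b = a; a = t;
	}
	c->h[0] += a; c->h[1] += b; c->h[2] += x; c->h[3] += d; c->h[4] += e;
}

static void sha1_update(struct sha1 *c, const uint8_t *p, size_t len)
{
	for (c->len += len; len > 0; len--) {
		c->buf[c->n++] = *p++;
		if (c->n == 64) { sha1_block(c, c->buf); c->n = 0; }
	}
}

static void sha1_final(struct sha1 *c, uint8_t out[20])
{
	uint64_t bits = c->len * 8;	/* taken before padding updates len */
	uint8_t b = 0x80, lenbuf[8];
	sha1_update(c, &b, 1);		/* a '1' bit, zeros up to 56 mod 64, */
	for (b = 0; c->n != 56;)
		sha1_update(c, &b, 1);
	for (int i = 0; i < 8; i++)	/* then the bit length, big-endian */
		lenbuf[i] = (uint8_t)(bits >> (56 - 8 * i));
	sha1_update(c, lenbuf, 8);
	for (int i = 0; i < 20; i++)
		out[i] = (uint8_t)(c->h[i / 4] >> (24 - 8 * (i % 4)));
}

/* HMAC-SHA1 (RFC 2104); WPA passphrases are at most 63 bytes long. */
static void hmac_sha1(const uint8_t *key, size_t klen, const uint8_t *msg,
    size_t mlen, uint8_t out[20])
{
	uint8_t ipad[64], opad[64], inner[20];
	struct sha1 c;
	assert(klen <= 64);	/* longer keys would have to be hashed first */
	for (size_t i = 0; i < 64; i++) {
		ipad[i] = (uint8_t)((i < klen ? key[i] : 0) ^ 0x36);
		opad[i] = (uint8_t)((i < klen ? key[i] : 0) ^ 0x5c);
	}
	c = sha1_iv; sha1_update(&c, ipad, 64); sha1_update(&c, msg, mlen);
	sha1_final(&c, inner);		/* H((K ^ ipad) || msg) */
	c = sha1_iv; sha1_update(&c, opad, 64); sha1_update(&c, inner, 20);
	sha1_final(&c, out);		/* H((K ^ opad) || inner) */
}

/* PBKDF2 with HMAC-SHA1 as PRF (RFC 8018); salt must be <= 64 bytes. */
static void pbkdf2_hmac_sha1(const char *pass, const uint8_t *salt,
    size_t slen, unsigned iters, uint8_t *out, size_t outlen)
{
	const uint8_t *pw = (const uint8_t *)pass; size_t plen = strlen(pass);
	uint8_t msg[64 + 4], u[20], t[20];
	assert(slen <= 64);
	for (uint32_t blk = 1; outlen > 0; blk++) {
		memcpy(msg, salt, slen);	/* U1 = PRF(P, S || INT(blk)) */
		for (int i = 0; i < 4; i++) msg[slen + i] = (uint8_t)(blk >> (24 - 8 * i));
		hmac_sha1(pw, plen, msg, slen + 4, u);
		memcpy(t, u, sizeof(t));
		for (unsigned i = 1; i < iters; i++) {	/* Uj = PRF(P, Uj-1) */
			hmac_sha1(pw, plen, u, sizeof(u), u);
			for (int j = 0; j < 20; j++) t[j] ^= u[j];
		}
		size_t n = outlen < sizeof(t) ? outlen : sizeof(t);
		memcpy(out, t, n);
		out += n; outlen -= n;
	}
}

/* WPA2 PSK = PBKDF2-HMAC-SHA1(passphrase, SSID, 4096 rounds, 32 bytes). */
static void wpa_psk(const char *passphrase, const char *ssid, uint8_t psk[32])
{
	pbkdf2_hmac_sha1(passphrase, (const uint8_t *)ssid, strlen(ssid), 4096, psk, 32);
}
```

As a sanity check, the 802.11i test vector with passphrase "password" and SSID "IEEE" should give f42c6fc52df0ebef9ebb4b90b38a5f902e83fe1b135a70e23aed762e9710a12e. Print the 32 bytes from wpa_psk() as lowercase hex and prefix them with "0x" to get a value for wpakey.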

 



Re: AX211 wifi firmware load issue

2023-11-24 Thread Stefan Sperling
On Thu, Nov 23, 2023 at 08:26:32PM +, Stuart Henderson wrote:
> On 2023/11/23 19:38, Stefan Sperling wrote:
> > It is possible that our driver is trying to use an incompatible
> > firmware image on this particular device. Which firmware file name
> > is loaded by the iwlwifi driver on a recent Linux distribution?
> > Are we loading the same one?
> 
> Seems the first liveusb image that I picked up (Debian 12.2.0) isn't
> bleeding edge enough. Apparently it is actually an AX101.

Unfortunately, product names like AX101 don't mean anything in terms of
mapping a firmware image to a device. This is the collection of variables
used in the device lookup table:

struct iwx_dev_info {
uint16_t device;
uint16_t subdevice;
uint16_t mac_type;
uint16_t rf_type;
uint8_t mac_step;
uint8_t rf_id;
uint8_t no_160;
uint8_t cores;
uint8_t cdb;
uint8_t jacket;
const struct iwx_device_cfg *cfg;
};

Given all this info, a partial or complete match will eventually
result in a decision about which firmware image to load.
I suppose you can now see why it is easy to miscategorize a device
by accident. I don't know why Intel is doing it this way. Perhaps
they are running out of PCI product IDs.
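The simplified sketch below shows why a partial match can select the wrong image. Several fields take part in the lookup, entries may use a wildcard, and the first entry whose fields all match wins. Everything in it is a hypothetical illustration: the reduced struct, the CFG_ANY wildcard value, the first-match ordering, the device IDs and the firmware names. None of it is the actual iwx table or matching order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_ANY	0xffff	/* hypothetical wildcard: matches any value */

/* Hypothetical, reduced lookup key; the real struct has more fields. */
struct dev_key {
	uint16_t device;
	uint16_t subdevice;
	uint16_t mac_type;
	uint16_t rf_type;
};

struct dev_entry {
	struct dev_key key;
	const char *fw;		/* firmware image to load (made-up names) */
};

static int
field_match(uint16_t want, uint16_t have)
{
	return want == CFG_ANY || want == have;
}

/* Return the first table entry whose fields all match the hardware. */
static const struct dev_entry *
find_entry(const struct dev_entry *tab, size_t n, const struct dev_key *hw)
{
	assert(tab != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct dev_key *k = &tab[i].key;

		if (field_match(k->device, hw->device) &&
		    field_match(k->subdevice, hw->subdevice) &&
		    field_match(k->mac_type, hw->mac_type) &&
		    field_match(k->rf_type, hw->rf_type))
			return &tab[i];
	}
	return NULL;
}
```

In this model, one PCI device ID combined with two different RF types selects two different images. A product name alone therefore says little about which firmware is needed.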

Anyway, the best way to figure out which firmware is needed is to run
a version of the Linux driver that supports your device.
If your Linux distro is not bleeding-edge enough, try building the driver
from the backport-iwlwifi repository which is generally newer than mainline
Linux:
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/backport-iwlwifi.git/

If the device is very new it is possible that further driver-side changes
will be required to get it to work.



Re: AX211 wifi firmware load issue

2023-11-23 Thread Stefan Sperling
On Thu, Nov 23, 2023 at 12:46:24PM +, Stuart Henderson wrote:
> and here are kernel messages with IWX_DEBUG defined
> 
> iwx0: L1 Disabled - LTR Enabled
> iwx0: ucode type 0 section 0
> iwx0: ucode type 0 section 1
> iwx0: ucode type 0 section 2
> iwx0: ucode type 0 section 3
> iwx0: ucode type 0 section 4
> iwx0: ucode type 0 section 5
> iwx0: ucode type 0 section 6
> iwx0: ucode type 0 section 7
> iwx0: ucode type 0 section 8
> iwx0: ucode type 0 section 9
> iwx0: ucode type 0 section 10
> iwx0: ucode type 0 section 11
> iwx0: ucode type 0 section 12
> iwx0: ucode type 0 section 13
> iwx0: ucode type 0 section 14
> iwx0: ucode type 0 section 15
> iwx0: ucode type 0 section 16
> iwx0: ucode type 0 section 17
> iwx0: ucode type 0 section 18
> iwx0: ucode type 0 section 19
> iwx0: ucode type 0 section 20
> iwx0: ucode type 0 section 21
> iwx0: ucode type 0 section 22
> iwx0: ucode type 0 section 23
> iwx0: ucode type 0 section 24
> iwx0: ucode type 0 section 25
> iwx0: ucode type 0 section 26
> iwx0: ucode type 0 section 27
> iwx0: ucode type 0 section 28
> iwx0: ucode type 0 section 29
> iwx0: ucode type 0 section 30
> iwx0: ucode type 0 section 31
> iwx0: ucode type 0 section 32
> iwx0: ucode type 0 section 33
> iwx0: ucode type 0 section 34
> iwx0: ucode type 0 section 35
> iwx0: ucode type 0 section 36
> iwx0: ucode type 0 section 37
> iwx0: ucode type 0 section 38
> iwx0: ucode type 0 section 39
> iwx0: ucode type 0 section 40
> iwx0: ucode type 0 section 41
> iwx0: ucode type 0 section 42
> iwx0: ucode type 0 section 43
> iwx0: ucode type 0 section 44
> iwx0: ucode type 0 section 45
> iwx0: ucode type 0 section 46
> iwx0: ucode type 0 section 47
> iwx0: ucode type 0 section 48
> iwx0: ucode type 0 section 49
> iwx0: ucode type 0 section 50
> iwx0: ucode type 0 section 51
> iwx0: ucode type 0 section 52
> iwx0: ucode type 0 section 53
> iwx0: ucode type 0 section 54
> iwx0: ucode type 0 section 55
> iwx0: ucode type 0 section 56
> iwx0: L1 Disabled - LTR Enabled
> iwx_init_fw_sec: firmware LMAC section 0 at 0x40335000 size 1656
> iwx_init_fw_sec: firmware LMAC section 1 at 0x40777000 size 32768
> iwx_init_fw_sec: firmware LMAC section 2 at 0x4077f000 size 32768
> iwx_init_fw_sec: firmware LMAC section 3 at 0x40787000 size 32768
> iwx_init_fw_sec: firmware LMAC section 4 at 0x4078f000 size 32760
> iwx_init_fw_sec: firmware LMAC section 5 at 0x40797000 size 32768
> iwx_init_fw_sec: firmware LMAC section 6 at 0x4079f000 size 32768
> iwx_init_fw_sec: firmware LMAC section 7 at 0x407a7000 size 32768
> iwx_init_fw_sec: firmware LMAC section 8 at 0x407af000 size 32768
> iwx_init_fw_sec: firmware LMAC section 9 at 0x407b7000 size 32768
> iwx_init_fw_sec: firmware LMAC section 10 at 0x407bf000 size 32768
> iwx_init_fw_sec: firmware LMAC section 11 at 0x407c7000 size 32768
> iwx_init_fw_sec: firmware LMAC section 12 at 0x407cf000 size 32768
> iwx_init_fw_sec: firmware LMAC section 13 at 0x40334000 size 560
> iwx_init_fw_sec: firmware LMAC section 14 at 0x407d7000 size 21696
> iwx_init_fw_sec: firmware UMAC section 0 at 0x40333000 size 1656
> iwx_init_fw_sec: firmware UMAC section 1 at 0x407dd000 size 32768
> iwx_init_fw_sec: firmware UMAC section 2 at 0x407e5000 size 32768
> iwx_init_fw_sec: firmware UMAC section 3 at 0x407ed000 size 32768
> iwx_init_fw_sec: firmware UMAC section 4 at 0x407f5000 size 32768
> iwx_init_fw_sec: firmware UMAC section 5 at 0x407fd000 size 32768
> iwx_init_fw_sec: firmware UMAC section 6 at 0x40805000 size 32768
> iwx_init_fw_sec: firmware UMAC section 7 at 0x4080d000 size 32768
> iwx_init_fw_sec: firmware UMAC section 8 at 0x40815000 size 32768
> iwx_init_fw_sec: firmware UMAC section 9 at 0x4081d000 size 32768
> iwx_init_fw_sec: firmware UMAC section 10 at 0x40825000 size 32768
> iwx_init_fw_sec: firmware UMAC section 11 at 0x4082d000 size 32768
> iwx_init_fw_sec: firmware UMAC section 12 at 0x40835000 size 32768
> iwx_init_fw_sec: firmware UMAC section 13 at 0x4083d000 size 14408
> iwx_init_fw_sec: firmware UMAC section 14 at 0x40841000 size 4616
> iwx_init_fw_sec: firmware UMAC section 15 at 0x40843000 size 24756
> iwx_init_fw_sec: firmware paging section 0 at 0x40332000 size 1656
> iwx_init_fw_sec: firmware paging section 1 at 0x4084a000 size 32768
> iwx_init_fw_sec: firmware paging section 2 at 0x40852000 size 32768
> iwx_init_fw_sec: firmware paging section 3 at 0x4085a000 size 32768
> iwx_init_fw_sec: firmware paging section 4 at 0x40862000 size 32768
> iwx_init_fw_sec: firmware paging section 5 at 0x4086a000 size 32768
> iwx_init_fw_sec: firmware paging section 6 at 0x40872000 size 32768
> iwx_init_fw_sec: firmware paging section 7 at 0x4087a000 size 32768
> iwx_init_fw_sec: firmware paging section 8 at 0x40882000 size 32768
> iwx_init_fw_sec: firmware paging section 9 at 0x4088a000 size 32768
> iwx_init_fw_sec: firmware paging section 10 at 0x40892000 size 32768
> iwx_init_fw_sec: firmware paging section 11 at 0x4089a000 size 32768
> iwx_init_fw_sec: firmware paging 

Re: iwm fails with supported device under OpenBSD 7.3

2023-08-25 Thread Stefan Sperling
On Fri, Aug 25, 2023 at 12:14:45AM -0600, Matthew Webb wrote:
> Thank you Stefan for the discussion.
> 
> FWIW, I have now installed another operating system (Arch Linux) on my
> computer alongside OpenBSD 7.3, and the wireless network adapter works
> correctly and without issue, so it appears that it is not a hardware issue
> after all. I also updated the firmware on the laptop, but the issues with
> iwm0 remain.

Off-hand I don't have any idea what could be wrong and what can be done.

I doubt this is caused by iwm itself since iwm works well in many other
machines. More likely there is some problem in the layers iwm relies on.
Perhaps something is going wrong with PCI power management. But that is
just an uninformed guess of mine and I am not a PCI expert. As far as I
understand the reason for having to "acquire" the device in the driver
is that the chip may decide to go to sleep to save power and needs to be
woken up before it can be accessed again. This wake-up seems to be
randomly failing in your case.
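The following toy model illustrates the wake-up handshake described above:

- The driver sets a request bit.
- It then polls until the device reports that it is awake.
- It gives up after a bounded number of polls.

The register bits, the simulated device and the return codes are all hypothetical. The real iwm code uses its own CSR register definitions and delays between polls. One detail is well established: reads from a PCI device that has dropped off the bus typically return all ones, so the sketch treats that value separately.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, not the real iwm CSR definitions. */
#define REQ_ACCESS	0x00000008u	/* driver: "please wake up" */
#define CLOCK_READY	0x00000001u	/* device: "awake and accessible" */

/* Toy device: becomes ready after 'wake_after' polls; < 0 means never. */
struct toy_dev {
	uint32_t reg;
	int wake_after;
};

static uint32_t
toy_read(struct toy_dev *d)
{
	if (d->wake_after == 0)
		d->reg |= CLOCK_READY;
	else if (d->wake_after > 0)
		d->wake_after--;
	return d->reg;
}

/* Request access, then poll for the ready bit; 0 on success. */
static int
acquire_device(struct toy_dev *d, int max_polls)
{
	assert(max_polls > 0);
	d->reg |= REQ_ACCESS;
	for (int i = 0; i < max_polls; i++) {
		uint32_t v = toy_read(d);

		if (v == 0xffffffffu)
			return -2;	/* all ones: device not answering on the bus */
		if (v & CLOCK_READY)
			return 0;
		/* a real driver would wait a few microseconds here */
	}
	return -1;	/* comparable to "acquiring device failed" */
}
```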



Re: iwm fails with supported device under OpenBSD 7.3

2023-08-24 Thread Stefan Sperling
On Wed, Aug 23, 2023 at 07:25:48PM -0600, Matthew Webb wrote:
> I configured debug to dump the driver status, shown in messages attached.
> I've also included the pcidump for the network adapter.

Thanks.

> I found a potentially related discussion about the iwx driver:
> https://www.mail-archive.com/tech@openbsd.org/msg58119.html

This is unrelated.

> I considered the possibility of failed hardware, though the interface was
> working successfully under the previous Windows operating system.

'acquiring device failed' means the device is not responding to
even the most basic access to registers that should always be
accessible.

It's as if communication on the PCI bus has been cut for some
reason, or as if the device had lost power and is turned off.

The wifi driver assumes that it is running on working hardware with a
working PCI bus. If either of those is broken, there is a fundamental
issue that the driver cannot fix.

> Aug 22 23:56:24 cogito /bsd: iwm0: acquiring device failed
> Aug 22 23:56:24 cogito /bsd: iwm0: acquiring device failed
> Aug 22 23:56:24 cogito /bsd: iwm0: timeout waiting for master
> Aug 22 23:57:12 cogito ntpd[29731]: DNS lookup tempfail
> Aug 22 23:57:26 cogito /bsd: iwm0: device timeout
> Aug 22 23:57:26 cogito /bsd: iwm0: acquiring device failed
> Aug 22 23:57:26 cogito /bsd: iwm0: acquiring device failed
> Aug 22 23:57:26 cogito /bsd: iwm0: timeout waiting for master
> Aug 22 23:58:22 cogito ntpd[29731]: DNS lookup tempfail
> Aug 22 23:58:28 cogito ntpd[29731]: DNS lookup tempfail
> Aug 22 23:58:44 cogito /bsd: iwm0: device timeout
> Aug 22 23:58:44 cogito /bsd: iwm0: acquiring device failed
> Aug 22 23:58:44 cogito /bsd: iwm0: acquiring device failed
> Aug 22 23:58:44 cogito /bsd: iwm0: timeout waiting for master
> Aug 22 23:58:44 cogito /bsd: iwm0: could not initialize hardware



Re: iwx: Firmware for Intel Wi-Fi 6 AX211 couldn't be loaded

2023-07-26 Thread Stefan Sperling
On Wed, Jul 26, 2023 at 02:45:29PM +, Miguel Landaeta wrote:
> >Synopsis:iwx: Firmware for Intel Wi-Fi 6 AX211 couldn't be loaded
> iwx0: using firmware iwx-so-a0-gf-a0-77

Can you tell me which firmware image the Linux kernel is loading for
your device?

Perhaps OpenBSD is misdetecting this particular device and ends up loading
the wrong firmware image. iwx device detection is very complicated because
PCI vendor/product IDs are shared between different types of devices.
So instead of straightforward matching of IDs the driver has to examine
a set of device-specific parameters and the Linux driver code which does
this is very hard to follow.



Re: panic: rw_enter: vmmaplk locking agaist myself

2023-06-29 Thread Stefan Sperling
On Thu, Jun 29, 2023 at 11:31:33AM +0200, Martin Pieuchot wrote:
> On 29/06/23(Thu) 11:17, Stefan Sperling wrote:
> > 
> > iwm_intr already runs at IPL_NET. What else would be required?
> 
> Are we sure all accesses to `ic_tree' are run under KERNEL_LOCK()+splnet()?

A quick look through net80211 doesn't suggest any problem here.

I assume the KERNEL_LOCK is implied since interrupt handlers in wireless
drivers should not be marked MPSAFE. Userland has an ioctl to query the
nodes tree but cannot modify it. New nodes are only added during scans in
interrupt context under SPLNET. (Ignoring the timeouts in hostap mode for
now, since this crash is in iwm which does not support hostap mode).

Nodes can be removed in ieee80211_free_allnodes() via iwm_newstate_task()
and sc->sc_newstate, which runs in a dedicated task queue which is not
marked MPSAFE and thus runs with the kernel lock held.
ieee80211_free_allnodes() raises to IPL_NET before modifying the tree.

While looking at this I spotted an unrelated problem: iwm/iwx add and delete
bgscan_done_task via different task queues. This means bgscan_done_task will
not be cancelled properly in iwm_newstate(). This bug originated in a commit
I made on December 2, 2021 and has gone unnoticed until now.

I would just move this task to the systq, where it gets deleted from. ok?

diff /usr/src
commit - 0c5a9349528207d3a937cab66be92baf3c29da40
path + /usr/src
blob - 1547120ea8f2bc861af585999863012e6ce6655c
file + sys/dev/pci/if_iwm.c
--- sys/dev/pci/if_iwm.c
+++ sys/dev/pci/if_iwm.c
@@ -8574,7 +8574,7 @@ iwm_bgscan_done(struct ieee80211com *ic,
free(sc->bgscan_unref_arg, M_DEVBUF, sc->bgscan_unref_arg_size);
sc->bgscan_unref_arg = arg;
sc->bgscan_unref_arg_size = arg_size;
-   iwm_add_task(sc, sc->sc_nswq, &sc->bgscan_done_task);
+   iwm_add_task(sc, systq, &sc->bgscan_done_task);
 }
 
 void
blob - 8aa8740bcf864ee470b7284da02d0c862e2bee99
file + sys/dev/pci/if_iwx.c
--- sys/dev/pci/if_iwx.c
+++ sys/dev/pci/if_iwx.c
@@ -7607,7 +7607,7 @@ iwx_bgscan_done(struct ieee80211com *ic,
free(sc->bgscan_unref_arg, M_DEVBUF, sc->bgscan_unref_arg_size);
sc->bgscan_unref_arg = arg;
sc->bgscan_unref_arg_size = arg_size;
-   iwx_add_task(sc, sc->sc_nswq, &sc->bgscan_done_task);
+   iwx_add_task(sc, systq, &sc->bgscan_done_task);
 }
 
 void



Re: panic: rw_enter: vmmaplk locking agaist myself

2023-06-29 Thread Stefan Sperling
On Thu, Jun 29, 2023 at 10:59:32AM +0200, Martin Pieuchot wrote:
> On 28/06/23(Wed) 15:47, Moritz Buhl wrote:
> > Dear bugs@,
> > 
> > with the following snapshot I had two panics on my x270 recently.
> 
> This is a bug in iwm(4) suggesting a missing SPL protection.
> 
> > sysctl kern.version
> > kern.version=OpenBSD 7.3-current (GENERIC.MP) #1256: Thu Jun 22 10:53:02 MDT 2023
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Below are transcribed pictures of my laptop screen.
> > 
> > panic: rw_enter: vmmaplk locking against myself
> > Stopped at  db_enter+0x14:  popq%rbp
> > TID PID UID PRFLAGS PFLAGS  CPU COMMAND
> > *258766 67401   10000x212   0x400   0K  firefox
> >  465097 28019   0   0x14000 0x200   1   drmwq
> > db_enter () at db_enter+0x14
> > panic(820e78b0) at panic+0xc3
> > rw_enter(fd87449a0f60,2) at rw_enter+0x26f
> > uvmfault_lookup(800044cc3a30,0) at uvmfault_lookup+0x8a
> > uvm_fault_check(800044cc3a30, 800044cc3a68,800044cc3a90) at uvm_fault_check+0x36
> > uvm_fault(fd87449a0e78,ab6ed8ea000,0,1) at uvm_fault+0xfb
> > kpageflttrap(800044cc3bb0, ab6ed8ea088) at kpageflttrap+0x171
> > kerntrap(800044cc3bb0) at kerntrap+0x95
> > alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> > _rb_min(823f89a8,80278060) at _rb_min+0x23
> > ieee80211_clean_inactive_nodes(80277048,a) at ieee80211_clean_inactive_nodes+0x4c
> 
> Looks like a corruption in RB-tree used inside ieee80211_clean_inactive_nodes().
> 
> Since this is coming from interrupt handler it suggest a missing spl
> dance.

iwm_intr already runs at IPL_NET. What else would be required?



Re: iwx0 fatal firmware error - Status: 0x239, OpenBSD 7.3 #1122 2023-March-19

2023-04-03 Thread Stefan Sperling
On Mon, Apr 03, 2023 at 06:24:38PM +, miko...@kucharski.name wrote:
> Apr  3 16:42:56 x1c /bsd: iwx0: begin background scan

> Apr  3 16:42:57 x1c /bsd: iwx0: Start UMAC Error Log Dump:
> Apr  3 16:42:57 x1c /bsd: iwx0: Status: 0x239, count: 7
> Apr  3 16:42:57 x1c /bsd: iwx0: 0x20002806 | ADVANCED_SYSASSERT

This particular sysassert code (0x20002806) is a known issue since
the -77 firmware update. I don't know what is triggering this error
and so far I have failed to reproduce it. Apparently something is
wrong with the scan command but I don't know what is wrong exactly.

As a workaround, does it help to disable background scanning by
temporarily hard-coding the MAC address of the working 2GHz AP?

  ifconfig iwx0 bssid 04:20:84:31:dd:ab

To use any AP again, run: ifconfig iwx0 -bssid



Re: iwx(4) panic on resume after hibernation

2023-02-13 Thread Stefan Sperling
On Sat, Feb 11, 2023 at 06:32:31PM +0300, Vitaliy Makkoveev wrote:
> Unfortunately the diff doesn't help. I attached photos with debug
> information and panic report.

Based on the command code in the firmware trace the error seems to be
caused by the driver sending this command:

#define IWX_ADD_STA_KEY 0x17

It seems the set_key task is running while the driver is in INIT state?
That is certainly not intended, and it is unclear why this is happening.

What is the effect of this diff? Do you see the printfs added here?

diff /usr/src
commit - 272fae57df2955410d97e29c5923cd4948623da1
path + /usr/src
blob - fe1bee0e59b08e6de8b7e3c00e20b930a29a6bb1
file + sys/dev/pci/if_iwx.c
--- sys/dev/pci/if_iwx.c
+++ sys/dev/pci/if_iwx.c
@@ -8554,12 +8554,18 @@ iwx_setkey_task(void *arg)
struct iwx_softc *sc = arg;
struct iwx_setkey_task_arg *a;
int err = 0, s = splnet();
+   enum ieee80211_state state = sc->sc_ic.ic_state;
 
-   while (sc->setkey_nkeys > 0) {
-   if (err || (sc->sc_flags & IWX_FLAG_SHUTDOWN))
-   break;
+   printf("%s: in state %s\n", __func__, ieee80211_state_name[state]);
+   while (state == IEEE80211_S_RUN && sc->setkey_nkeys > 0 &&
+   err == 0 && (sc->sc_flags & IWX_FLAG_SHUTDOWN) == 0) {
a = &sc->setkey_arg[sc->setkey_tail];
err = iwx_add_sta_key(sc, a->sta_id, a->ni, a->k);
+   if (err) {
+   printf("%s: could not set %s key (error %d)\n",
+   DEVNAME(sc), (a->k->k_flags & IEEE80211_KEY_GROUP) ?
+   "group" : "pairwise", err);
+   }
a->sta_id = 0;
a->ni = NULL;
a->k = NULL;
@@ -8568,6 +8574,10 @@ iwx_setkey_task(void *arg)
sc->setkey_nkeys--;
}
 
+   /* Reset everything if key installation failed. */
+   if (err && (sc->sc_flags & IWX_FLAG_SHUTDOWN) == 0)
+   task_add(systq, &sc->init_task);
+
refcnt_rele_wake(&sc->task_refs);
splx(s);
 }



Re: iwx(4) panic on resume after hibernation

2023-02-11 Thread Stefan Sperling
On Sat, Feb 11, 2023 at 03:40:48PM +0300, Vitaliy Makkoveev wrote:
> >Synopsis:iwx(4) panic on resume after hibernation
> >Category:hibernate/resume
> >Environment:
>   System  : OpenBSD 7.2
>   Details : OpenBSD 7.2-current (GENERIC.MP) #1021: Sun Feb  5 09:52:50 MST 2023
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   "dram->paging == NULL" assertion triggered at line 635 in
>   dev/pci/if_iwx.c when resuming after hibernate. The information
>   was not stored in dmesg, so photo also attached. Output contains
>   the mess, but report readable, providing as-is.
> >How-To-Repeat:
>   ZZZ then resume
> >Fix:
>   unknown

I suppose this could happen if you did not go through iwx_stop_device()
at suspend time for some reason. That would usually free the pointer and
set it back to NULL. It is unclear to me how skipping this step during
suspend is possible.

There is a "fatal firmware error" visible in the text before the KASSERT
failed.  Can you try to trigger this again with 'ifconfig iwx0 debug'
enabled before hibernate? A firmware error trace might tell us a bit more
about the issue.

And if you can, please also modify the driver to log all calls to
iwx_stop_device() and iwx_init_task() to dmesg.
Maybe this firmware error triggered the init task to run during the shutdown
sequence and we ended up starting the device back up while we were suspending?
If so the diff below might help.

diff /usr/src
commit - 0da56a05372165a5aa6f5376fe481a3a9f3326d7
path + /usr/src
blob - 30f9dd2bd76e79992b9b6dd2b1975b7f84d78343
file + sys/dev/pci/if_iwx.c
--- sys/dev/pci/if_iwx.c
+++ sys/dev/pci/if_iwx.c
@@ -10944,7 +10944,8 @@ iwx_init_task(void *arg1)
int fatal = (sc->sc_flags & (IWX_FLAG_HW_ERR | IWX_FLAG_RFKILL));
 
rw_enter_write(&sc->ioctl_rwl);
-   if (generation != sc->sc_generation) {
+   if (generation != sc->sc_generation ||
+   (sc->sc_flags & IWX_FLAG_SHUTDOWN)) {
rw_exit(&sc->ioctl_rwl);
splx(s);
return;



Re: WiFi iwm0 crash

2023-01-02 Thread Stefan Sperling
On Sun, Jan 01, 2023 at 05:54:55PM -0800, Epix Gamor wrote:
> # ISSUE
> OpenBSD crashes when I try to do my WiFi.
> 
> # WHAT I DID TO GET THE ISSUE
> I installed a fresh copy of OpenBSD on my machine and the first thing
> I did was to permit doas and then I want to set up my WiFi.
> I’m setting up my WiFi and first I do:
> doas ifconfig iwm0 nwid "name" wpakey "password"
> And when I do the following it seemingly crashes:
> doas dhclient iwm0
> 
> # OTHER INFO
> I'm using a Thinkpad Yoga 370, and also had this same issue when
> tested on a HP Elitebook x360 1030 G2.
> I'm illiterate when it comes to technical computer things, so please pardon me.
> I did not have any issues doing WiFi on my Thinkpad T420s (I assume
> this was because I'm pretty sure it wasn't using iwm0).
> 
> # IMAGES
> I've attached them to this email.

Thanks for your report. This crash was fixed in -current not long ago,
see https://marc.info/?l=openbsd-cvs&m=167119815124868&w=2

The crash is probably triggered because the iwm-firmware package is not
installed. Connect to the internet some other way that does not use iwm
and run fw_update; this should install the firmware for iwm. After that,
iwm should work.



Re: iwx(4) randomly stopped connecting to Wi-Fi

2022-12-27 Thread Stefan Sperling
On Tue, Dec 27, 2022 at 06:50:55PM +, Jeremy Potter wrote:
> One clue that I do have is that every time I restarted my Wi-Fi, the kernel log
> printed something along the lines of:
> 
>   iwx0: unhandled firmware response 0x5f4/0x200c

Response 0x5f4 is a "monitor notification" sent by firmware.
This seems to be related to 40MHz-intolerance detection.
The Linux driver forces a switch to 20MHz channel width when this
event occurs, if I understand their code correctly.

When 40MHz wide channels were added with the 802.11n standard, a
provision was included which allows any device to prevent use of
40MHz channels in its vicinity. This is done by setting the
"40MHz intolerant" flag in advertised 802.11n capabilities.
Conforming devices are supposed to stop using 40MHz channels if a frame
is received which contains this flag. Intel firmware seems to be keen
on enforcing this by refusing to work if the driver does not comply.
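Detecting the flag in a received frame amounts to checking bit 14 (0x4000) of the HT Capability Info field. That field sits in the HT Capabilities element (element ID 45). The sketch below is hypothetical: the names are made up, OpenBSD does not implement this, and the standard defines other ways of signalling 40 MHz intolerance that it ignores. It only walks a raw buffer of information elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ELEMID_HTCAPS		45	/* HT Capabilities element ID */
#define HTCAP_40MHZ_INTOLERANT	0x4000	/* bit 14 of HT Capability Info */

/*
 * Hypothetical helper: walk a buffer of information elements
 * (1-byte ID, 1-byte length, payload) and return 1 if an HT
 * Capabilities element has the "40 MHz intolerant" bit set.
 * A truncated element ends the walk.
 */
static int
ht_40mhz_intolerant(const uint8_t *ies, size_t len)
{
	assert(ies != NULL || len == 0);
	while (len >= 2 && (size_t)ies[1] <= len - 2) {
		uint8_t id = ies[0], elen = ies[1];

		if (id == ELEMID_HTCAPS && elen >= 2) {
			/* HT Capability Info: 16 bits, little-endian */
			uint16_t info = (uint16_t)(ies[2] | ies[3] << 8);

			return (info & HTCAP_40MHZ_INTOLERANT) != 0;
		}
		ies += 2 + elen;
		len -= 2 + (size_t)elen;
	}
	return 0;
}
```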

The OpenBSD driver does not handle this notification yet.
I was hoping that relying on the AP to switch 40MHz channels off would
be good enough, but apparently it is not. To make our driver handle this
situation gracefully we would have to disable our use of 40MHz (and 80MHz)
channels if firmware reports this event, and somehow decide to turn wide
channels back on later (or stay stuck on 20MHz width until the next reboot,
which is a huge performance hit).

Given the information above, are you able to reproduce this problem?
Is there a device under your control which is causing it? I do not have a
test setup for this issue, though I could probably set something up by
setting the 40MHz-intolerant bit artificially in one of my APs.
But if you were able to easily verify a patch which attempts to address
this, that would save me some time.



Re: rt_ifa_del NULL deref

2022-11-15 Thread Stefan Sperling
On Tue, Nov 15, 2022 at 03:07:05PM +0100, Leah Neukirchen wrote:
> 
> I hit the same issue on a 7.2-RELEASE system, which was idle and had
> roughly 3 weeks of uptime.
> 
> Stopped at rt_ifa_del+0x39: movb 0x1b6(%rax),%bl
> Same backtrace as in parent message.
> 
> The system is virtualized on QEMU/KVM 7.0 on Linux x86_64, has networking
> over a bridge where radvd 2.19 announces a prefix.  The same setup has
> been running for years with older OpenBSD versions, without issues.

FWIW, I have found that disabling IPv6 autoconf reliably avoids this.

I have also seen a related crash when running the command below. This
means that the nd6 expiry task is not the only code path affected.

It is not yet known where the actual race is. Help appreciated.

# ifconfig vio0 -inet6 autoconf

login: kernel: protection fault trap, code=0
Stopped at  rt_ifa_del+0x39:movb0x1b6(%rax),%bl
ddb{2}> bt
rt_ifa_del(808a0d00,800100,dead0009deadbeef,0) at rt_ifa_del+0x39
in6_unlink_ifa(808a0d00,804d72a8) at in6_unlink_ifa+0xae
in6_purgeaddr(808a0d00) at in6_purgeaddr+0x127
in6_ifdetach(804d72a8) at in6_ifdetach+0x19e
ifioctl(fd8782bf95b8,801169ac,800022edac90,800022e24fc8) at ifioctl+0xdcc
soo_ioctl(fd877fc2ef00,801169ac,800022edac90,800022e24fc8) at soo_ioctl+0x171
sys_ioctl(800022e24fc8,800022edada0,800022edae00) at sys_ioctl+0x2c4
syscall(800022edae70) at syscall+0x384
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7e1900, count: -9
ddb{2}> ps
   PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
*888907233  11006  0  7 0x3ifconfig



Re: Ldomctl generates defective config after OBSD 6.3 on T1000.

2022-11-12 Thread Stefan Sperling
On Fri, Nov 11, 2022 at 06:46:31PM +, Andrew Grillet wrote:
> I was unable to compile and test the diff - compile times exceeded a week
> on a T1000, and then I took ill.
> 
> I now confirm that this problems is present on 7.2 and appears to affect my
> T5120 as well.
> I can create configs with OBSD 6.3 which are bootable on T1000, T2000 and
> T5120.
> All later releases, including 7.2 corrupt the device tree in a similar way.

I can confirm that ldoms are broken on T1000 at least. OpenBSD 7.1 did run
fine on my T1000, hosting several ldom guests, though I don't remember when
I last updated the firmware's ldom config. Once I regenerated the config, it
came back in a state where ldom guests would no longer start up.

I briefly considered debugging it but in the end I gave up on running this
machine. It swallows almost 150W and yields worse performance than a Raspberry
Pi in return. I've moved my VMs to vmm(4) on amd64 instead.



Re: re(4) drivers not working

2022-09-16 Thread Stefan Sperling
On Fri, Sep 16, 2022 at 09:58:44AM +0200, Adam Szewczyk wrote:
> My mistake in creating a bug for wireless, it should be iwm (I copied the wrong
> file name in a hurry). Is what you wrote about iwx also true for iwm?

At this low level the drivers are equivalent.

> Sorry for the trouble. I just want to figure out which project has the
> bug and create a ticket there. It was my first contact with BSD, so I
> don't know anything about the internals.

It can be difficult when people send you around between each other's
projects, I understand that. And one can never know for sure, there
are so many layers involved that it is impossible to tell what's
wrong without knowing details. PCI-passthrough is complicated.
The best way to make progress is trying to eliminate possible root
causes for the issue. Have you tried PCI passthrough with KVM yet,
just to see if a different hypervisor works fine? And have you tried
on a different hardware configuration (different CPU/BIOS)? Are you
sure you applied all the necessary BIOS tweaks for PCI passthrough
to actually work?

In any case, without solid evidence that something is wrong in the
OpenBSD drivers, I don't expect anyone here will have motivation to hunt
for a bug. Our developers don't always have time to help chase problems
down from the very top (error message) to the very bottom (root cause).
We need more information and clear evidence that other possible
root causes have already been eliminated.
Based on past experience with such bug reports, when it works just fine
on bare metal then there is usually a hypervisor or BIOS problem.



Re: re(4) drivers not working

2022-09-16 Thread Stefan Sperling
On Fri, Sep 16, 2022 at 08:27:59AM +0200, Adam Szewczyk wrote:
> Checked on actual hardware it seems to work fine. So I report it also to
> Qubes and Xen Project, but marmarek (head of qubes development team)
> suggest that it can be a problem with driver itself:
> "I'm not sure if that's relevant, but a common cause is incorrect MSI /
> MSI-X detection/support by the driver. HVM in Qubes do not support MSI-X,
> but do support MSI. MSI-X, even if the device really supports it, is masked
> from PCI capabilities. Driver should fallback to MSI (or INTx) if MSI-X
> fails to setup (or isn't there at all), but many drivers have this path
> buggy, since it isn't exercised in most common (native) scenario."

re(4) only uses MSI, it does not ever even try MSI-X. As can be seen here:
https://github.com/openbsd/src/blob/master/sys/dev/pci/if_re_pci.c#L161

iwx(4) first tries MSI-X, then falls back on MSI:
https://github.com/openbsd/src/blob/master/sys/dev/pci/if_iwx.c#L10573
Your iwx firmware load failure looks like a DMA problem to me.
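The fallback order marmarek describes (MSI-X first, then MSI, then legacy INTx) can be modeled as below. This is a simplified sketch, not the code in if_iwx.c: the struct and the map_*() helpers are hypothetical stand-ins for the kernel's PCI interrupt-mapping routines (roughly pci_intr_map_msix(), pci_intr_map_msi() and pci_intr_map() on OpenBSD). One example device mimics a Qubes HVM guest, where MSI-X is masked but MSI works.

```c
#include <stdbool.h>

/* Interrupt types, in the order a driver would normally prefer them. */
enum intr_kind_example { INTR_EX_NONE, INTR_EX_MSIX, INTR_EX_MSI, INTR_EX_INTX };

/*
 * Hypothetical description of what the (possibly virtual) PCI device
 * exposes. Under a hypervisor that masks MSI-X, has_msix is false even
 * if the real hardware supports it.
 */
struct pcidev_example {
	bool has_msix;
	bool has_msi;
	bool has_intx;
};

/* Stand-ins for the real mapping calls; each returns 0 on success. */
static int map_msix(const struct pcidev_example *d) { return d->has_msix ? 0 : -1; }
static int map_msi(const struct pcidev_example *d) { return d->has_msi ? 0 : -1; }
static int map_intx(const struct pcidev_example *d) { return d->has_intx ? 0 : -1; }

/* Try MSI-X first, then fall back to MSI, then to legacy INTx. */
static enum intr_kind_example
choose_intr(const struct pcidev_example *d)
{
	if (map_msix(d) == 0)
		return INTR_EX_MSIX;
	if (map_msi(d) == 0)
		return INTR_EX_MSI;
	if (map_intx(d) == 0)
		return INTR_EX_INTX;
	return INTR_EX_NONE;
}
```

With MSI-X masked, the sketch settles on MSI, which is exactly the path the quoted explanation says is rarely exercised on bare metal.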

I don't see any actionable information in your bug reports as far
as fixing our drivers goes. Sorry.



Re: rt_ifa_del NULL deref

2022-09-05 Thread Stefan Sperling
On Sun, Sep 04, 2022 at 07:34:04PM +0300, Vitaliy Makkoveev wrote:
> I suspect missing netlock in the config path or within timeout handler,
> but exclusive netlock assertion makes sense.
> 
> Also, we could have "double protection" here, like we have for
> `if_list'. I mean we hold both kernel and net locks while we modify
> `if_list', but while we do read access, we hold only one of them. I
> don't like this, but my diffs for `if_list' were rejected.

My box is running with this patch now. I'll report back when something
shows up. Thanks!
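The "double protection" scheme for `if_list' described in the quoted mail can be sketched with two POSIX mutexes standing in for the kernel lock and the net lock. This is a simplification under stated assumptions: the real kernel locks are not plain mutexes, and all names here are hypothetical. Writers take both locks in a fixed order; a reader may take either one, because no writer can make progress without holding both.

```c
#include <pthread.h>

/* Hypothetical stand-ins: lock_a for the kernel lock, lock_b for the net lock. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

/* Shared state protected by the pair of locks (a counter instead of a list). */
static int nentries;

/* Writers hold BOTH locks, always acquired in the same order to avoid deadlock. */
static void
entry_add(void)
{
	pthread_mutex_lock(&lock_a);
	pthread_mutex_lock(&lock_b);
	nentries++;
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
}

/* A reader holding only lock_a is safe: no writer can run without it. */
static int
entry_count_a(void)
{
	int n;

	pthread_mutex_lock(&lock_a);
	n = nentries;
	pthread_mutex_unlock(&lock_a);
	return n;
}

/* Likewise for a reader that happens to hold only lock_b. */
static int
entry_count_b(void)
{
	int n;

	pthread_mutex_lock(&lock_b);
	n = nentries;
	pthread_mutex_unlock(&lock_b);
	return n;
}
```

The cost of this scheme is that writers always pay for both locks, and it is only correct as long as every single writer really does take both.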



Re: rt_ifa_del NULL deref

2022-09-04 Thread Stefan Sperling
On Sat, Aug 27, 2022 at 11:32:24PM +0300, Vitaliy Makkoveev wrote:
> > On 27 Aug 2022, at 22:03, Alexander Bluhm  wrote:
> > 
> > On Sat, Aug 27, 2022 at 03:14:15AM +0300, Vitaliy Makkoveev wrote:
> >>> On 27 Aug 2022, at 00:04, Alexander Bluhm  wrote:
> >>> 
> >>> Anyone willing to test or ok this?
> >>> 
> >> 
> >> This fixes weird `ifa' refcounting. I like this.
> >> 
> >> Could the ifaref() and ifafree() names use the same notation? Like
> >> ifaref() and ifarele() or ifaget() and ifafree() or something else?
> > 
> > Refcount naming is very inconsistent.
> > 
> > ifget(), ifput(), pf_state_key_ref(), pf_state_key_unref(), tdb_ref(),
> > tdb_unref(), tdb_delete(), tdb_free(), vxlan_take(), vxlan_rele()
> > all work in subtle different ways.
> > 
> > I want to keep ifafree() as the name is established and called from
> > many places.  And giving ifaref() another name makes it different
> > but not better.
> > 
> > It would be easy to change something but hard to make it consistent.
> > So I prefer to leave the diff as it is.
> > 
> > bluhm
> 
> I have no objections to commit this diff. 
 
The diff has been committed but the problem remains:

OpenBSD 7.2-beta (GENERIC.MP) #2: Thu Sep  1 18:54:34 CEST 2022
    s...@bev.stsp.name:/usr/src/sys/arch/amd64/compile/GENERIC.MP

login: kernel: protection fault trap, code=0
Stopped at      rt_ifa_del+0x39:        movb    0x1b6(%rax),%bl
ddb{3}> bt
rt_ifa_del(80496c00,800100,dead0009dead4110,0) at rt_ifa_del+0x39
in6_unlink_ifa(80496c00,800da2a8) at in6_unlink_ifa+0xae
in6_purgeaddr(80496c00) at in6_purgeaddr+0x127
nd6_expire(0) at nd6_expire+0x96
taskq_thread(8002c080) at taskq_thread+0x100
end trace frame: 0x0, count: -5
ddb{3}> show struct ifaddr 0x80496c00
struct ifaddr at 0x80496c00 (64 bytes) {ifa_addr = (struct sockaddr *)0xdead0009dead4110,
 ifa_dstaddr = (struct sockaddr *)0x4002e6f6e3c87f50,
 ifa_netmask = (struct sockaddr *)0xdead4110dead4110,
 ifa_ifp = (struct ifnet *)0xdead4110dead4110,
 ifa_list = {tqe_next = (struct ifaddr *)0xdead4110dead4110, tqe_prev = 0xdead4110dead4110},
 ifa_flags = 0xdead4110, ifa_refcnt = {r_refs = 0xdead4110, r_traceidx = 0xdead4110},
 ifa_metric = 0xdead4110}
ddb{3}> 
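Almost every field of the ifaddr above holds the repeating word 0xdead4110, and the other crash traces in this thread show similar values (0xdeafbead, 0xdeadbeef). Those look like free-memory poison, which fits the use-after-free theory. The sketch below shows how such a pattern ends up in a struct and how one might recognize it when reading a dump. The struct, the helper names and the exact poison constant are assumptions for illustration; the object is poisoned while still live so that the example stays well-defined C.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Example poison word copied from the ddb dump (assumed, not the kernel's constant). */
#define EXAMPLE_POISON 0xdead4110U

/* Hypothetical object with a pointer-sized field and a reference count. */
struct obj_example {
	uint64_t addr;		/* stands in for a pointer such as ifa_addr */
	uint32_t refs;		/* stands in for ifa_refcnt.r_refs */
	uint32_t flags;
};

/* What a poisoning allocator does on free: overwrite the whole object. */
static void
poison_on_free(void *p, size_t len)
{
	uint32_t word = EXAMPLE_POISON;
	size_t i;

	for (i = 0; i + sizeof(word) <= len; i += sizeof(word))
		memcpy((uint8_t *)p + i, &word, sizeof(word));
}

/*
 * Heuristic for reading a crash dump: a 64-bit value made of two poison
 * words was almost certainly read from freed memory.
 */
static bool
looks_poisoned(uint64_t v)
{
	return (uint32_t)v == EXAMPLE_POISON &&
	    (uint32_t)(v >> 32) == EXAMPLE_POISON;
}
```

In the real dump, ifa_refcnt.r_refs reads as 0xdead4110 in exactly the way the refs field does here after poison_on_free().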



Re: rt_ifa_del NULL deref

2022-08-24 Thread Stefan Sperling
On Wed, Aug 24, 2022 at 03:14:35PM +0200, Alexander Bluhm wrote:
> What are you doing with the machine?  Forwarding, IPv6, Stress test?
> Do you have network interfaces with multiple network queues?

It is a build box, which also serves snapshots over http. I am the only
user of the httpd server so there is very little network traffic.
The network interface is vio(4), the host is KVM.

As far as I can tell, the machine was idle at the time. There were
some network connectivity issues which could have triggered something,
but I really don't know.



Re: rt_ifa_del NULL deref

2022-08-23 Thread Stefan Sperling
On Tue, Aug 23, 2022 at 02:16:27PM +0200, Alexander Bluhm wrote:
> On Tue, Aug 23, 2022 at 12:23:05PM +0200, Stefan Sperling wrote:
> > On Tue, Aug 23, 2022 at 11:43:22AM +0200, Alexander Bluhm wrote:
> > > On Tue, Aug 23, 2022 at 10:15:22AM +0200, Stefan Sperling wrote:
> > > > I found one of my amd64 systems running -current, built on 12th of
> > > > August, has crashed as follows.
> > >
> > > Is there any chance that the kernel sources are between these commits?
> > > August 12th does not fit exactly, do you remember when you did the
> > > checkout?  Or is it a snapshot kernel?
> >
> > Do you want me to provide anything else from ddb?
> 
> The usual:
> 
> show panic
> show registers
> trace
> ps
> show struct ifaddr 0x804e9400
> show all routes
> traces from other cpu
> 
> I fear that there will be no really useful info.  It looks like a
> use after free.  When removing the route, the nd6 timer should have
> been deleted.

Makes sense. Anyway, here is the output:

ddb{2}> show panic
the kernel did not panic
ddb{2}> show registers
rdi     0x804e9400
rsi     0x800100            acpi_pdirpa+0x7ebf68
rbp     0x800022d4cea0
rbx     0
rdx     0xdeaf0009deafbead
rcx     0
rax     0xdeafbeaddeafbead
r8      0
r9      0x82125101          arcturus_feature_mask_map+0x2d1
r10     0x8
r11     0xeb6a1cd605457cff
r12     0xdeaf0009deafbead
r13     0
r14     0x804e9400
r15     0x800100            acpi_pdirpa+0x7ebf68
rip     0x813ef4f9          rt_ifa_del+0x39
cs      0x8
rflags  0x10286             __ALIGN_SIZE+0xf286
rsp     0x800022d4cd60
ss      0x10
rt_ifa_del+0x39:        movb    0x1be(%rax),%bl
ddb{2}> trace
rt_ifa_del(804e9400,800100,deaf0009deafbead,0) at rt_ifa_del+0x39
in6_unlink_ifa(804e9400,800da2a8) at in6_unlink_ifa+0xae
in6_purgeaddr(804e9400) at in6_purgeaddr+0x127
nd6_expire(0) at nd6_expire+0x96
taskq_thread(8002c080) at taskq_thread+0x100
end trace frame: 0x0, count: -5
ddb{2}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 85757  212591      1      0  3    0x100083  ttyin         getty
 13844  194231      1      0  3    0x100083  ttyin         getty
 52351  462696      1      0  3    0x100083  ttyin         getty
 11970  274173      1      0  3    0x100083  ttyin         getty
 55817  204799      1      0  3    0x100083  ttyin         getty
 55147  459952      1      0  3    0x100083  ttyin         getty
 97104  507732      1      0  3    0x100098  kqread        cron
 24934  108341      1     99  3   0x1100090  kqread        sndiod
 24851  158785      1    110  3    0x100090  kqread        sndiod
 79580  437931      1      0  3    0x100080  kqread        httpd
 90912  365815      1     67  3   0x1100092  kqread        httpd
 11193   62187      1     67  3   0x1100092  kqread        httpd
 11994   82021      1     67  3   0x1100092  kqread        httpd
  6336  127367      1     67  3   0x1100092  kqread        httpd
 46164  440393      1     67  3   0x1100092  kqread        httpd
  5307   16141      1     67  3   0x1100092  kqread        httpd
  3246  133657      1     67  3   0x1100092  kqread        httpd
 67433  170750      1     67  3   0x1100092  kqread        httpd
 71968  499680      1     67  3   0x1100092  kqread        httpd
 56363   70989      1     67  3   0x1100092  kqread        httpd
 66759  513771      1     67  3   0x1100092  kqread        httpd
 53393   89252      1     67  3   0x1100092  kqread        httpd
 65858  323071      1     67  3   0x1100092  kqread        httpd
 66322  146077      1     67  3   0x1100092  kqread        httpd
 14621  176429      1     67  3   0x1100092  kqread        httpd
 74712  218180      1     67  3   0x1100092  kqread        httpd
 61865  208410      1     67  3   0x1100092  kqread        httpd
 67924   73387      1     67  3   0x1100092  kqread        httpd
 62026  199422      1     67  3   0x1100092  kqread        httpd
 85443  340524      1     67  3   0x1100092  kqread        httpd
 95521   46441      1     67  3   0x1100092  kqread        httpd
   375  324584      1     67  3   0x1100092  kqread        httpd
 18765  359197      1     67  3   0x1100092  kqread        httpd
 60496  158853      1     67  3   0x1100092  kqread        httpd
  70095302      1     67  3   0x1100092  kqread        httpd
 36669   77305      1     67  3   0x1100092  kqread        httpd
  1569  118333      1     67  3   0x1100092  kqread        httpd
 90776  452570      1     67  3   0x1100092  kqread        httpd




Re: rt_ifa_del NULL deref

2022-08-23 Thread Stefan Sperling
On Tue, Aug 23, 2022 at 11:43:22AM +0200, Alexander Bluhm wrote:
> On Tue, Aug 23, 2022 at 10:15:22AM +0200, Stefan Sperling wrote:
> > I found one of my amd64 systems running -current, built on 12th of
> > August, has crashed as follows.
> 
> Is there any chance that the kernel sources are between these commits?
> August 12th does not fit exactly, do you remember when you did the
> checkout?  Or is it a snapshot kernel?

Do you want me to provide anything else from ddb?
I can tell you exactly what sources I built once I reboot the machine.

I would have synced the src repository from github on August 12 just
before building this kernel. Both changes you mentioned should already
be contained in this kernel; the latest was committed on August 9.
So I don't think my system hit that particular issue.

There should be small wifi driver patches in there, as well as my
raid1c boot diffs (since committed). No local changes apart from this.



rt_ifa_del NULL deref

2022-08-23 Thread Stefan Sperling
I found one of my amd64 systems running -current, built on 12th of
August, has crashed as follows.

I am not sure if this is still relevant; please excuse the noise if
this has already been found and fixed.

kernel: protection fault trap, code=0
Stopped at      rt_ifa_del+0x39:        movb    0x1be(%rax),%bl
ddb{2}> bt
rt_ifa_del(804e9400,800100,deaf0009deafbead,0) at rt_ifa_del+0x39
in6_unlink_ifa(804e9400,800da2a8) at in6_unlink_ifa+0xae
in6_purgeaddr(804e9400) at in6_purgeaddr+0x127
nd6_expire(0) at nd6_expire+0x96
taskq_thread(8002c080) at taskq_thread+0x100
end trace frame: 0x0, count: -5



Re: [patch] 802.11 printing akm and cipher suite lists in tcpdump

2022-07-22 Thread Stefan Sperling
On Fri, Jul 22, 2022 at 10:25:47PM +0300, Mikhail wrote:
> Moving to bugs@, hoping to get some attention.

Fixed in-tree now, thank you.
I've also fixed another instance of the same i + j mistake in the PMKID loop.



Re: Possible fix for bwfm driver not working after resume with A/C charger connected

2022-07-19 Thread Stefan Sperling
On Tue, Jul 19, 2022 at 05:01:48PM +0200, Patrick Wildt wrote:
> No, it's not inverted. If bwfm_pci_send_mb_data() fails we want to
> re-init the device, that's what both Linux and OpenBSD are doing.

Right. I came to the same conclusion while taking a brief look.

> Better questions are: why is bwfm_pci_send_mb_data() failing?  Why is
> there a pending mb command?

Well, it looks like bwfm_pci_send_mb_data() does in fact succeed.
The incorrect patch effectively treats success as failure, and thus
demonstrates that our current code does not detect an error there.

The failure is elsewhere: bwfm_fwvar_set_int() in bwfm_init() is failing.
Probably because the device isn't resumed yet even though it should be.

> And: why doesn't cleanup and reactivate work?

As I understand the bug report, by treating bwfm_pci_send_mb_data() success
as a failure case, we end up running bwfm_cleanup() and bwfm_pci_cleanup(),
and now the device resumes just fine.

So the Linux code we've ported does somehow manage to resume the device
from a weird state. But we fail to do that. I don't know why, but is that
not the real question we need to ask?

Could we always do a full device reset as a workaround?
Perhaps that is what we should be doing anyway? Maybe the Linux way of
resuming this device is somehow incompatible with the way our system works?
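The two resume strategies discussed above can be compared with a small model. Everything here is hypothetical: the struct, the flags and the helper names do not exist in bwfm(4). light_resume() stands for the Linux-derived sequence (send mailbox data, then set a firmware variable) and full_reset() for bwfm_cleanup() followed by re-initialization. The model encodes the situation from the bug report, where the mailbox command succeeds but the device does not answer afterwards.

```c
#include <stdbool.h>

/* Hypothetical model of a Wi-Fi device's state at resume time. */
struct dev_model {
	bool mb_ok;		/* does the mailbox command succeed? */
	bool fw_responds;	/* does firmware answer after a light resume? */
	int full_resets;	/* how many times the full reset path ran */
};

/* Light resume: mailbox command, then a firmware variable write. */
static bool
light_resume(const struct dev_model *d)
{
	return d->mb_ok && d->fw_responds;
}

/* Cleanup plus re-init; assumed here to always bring the device back. */
static bool
full_reset(struct dev_model *d)
{
	d->full_resets++;
	d->fw_responds = true;
	return true;
}

/* Current behaviour: reset only if the mailbox command itself fails. */
static bool
resume_reset_on_mb_failure(struct dev_model *d)
{
	if (!d->mb_ok)
		return full_reset(d);
	return light_resume(d);
}

/* Proposed workaround: fall back to a full reset whenever light resume fails. */
static bool
resume_with_fallback(struct dev_model *d)
{
	if (light_resume(d))
		return true;
	return full_reset(d);
}
```

Under these assumptions the current logic never reaches the reset path, while the fallback (or an unconditional reset) recovers the device at the cost of a slower resume.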



Re: Atheros AR9300 Wi-Fi controller not yet supported

2022-07-09 Thread Stefan Sperling
On Sat, Jul 09, 2022 at 10:14:38AM -0400, jdp_ man wrote:
> I recently discovered OpenBSD and eventually tried it in a physical PC.
> The PC is an Optiplex 790 with an add-in WiFi card. i5 2400 w/o GPU.
> The card internally is an Atheros AR9300, and it seems support is gone,
> as of this commit which lists issues with it, from 2014:
> https://github.com/openbsd/src/commit/a8c1d01a39b2bf4debd0358ce66f99d4bc391432
> 
> I am open to learning how to build and test OpenBSD in order to fix,
> but if any of you want to try this yourself I'll try to find a listing
> online for sale of this specific WiFi card.
> 
> specific dmesg line for this device:
> "Atheros AR9300" rev 0x01 at pci2 dev 0 function 0 not configured

This is still work-in-progress.
In 2017 I managed to get Rx working on an AR9380 device, and all the
necessary changes for this are in the tree.

But Tx fails. I have given up on trying to figure out why.
I would welcome patches to fix this.



Re: iwn issues after updating to yesterday's snapshot

2022-06-19 Thread Stefan Sperling
On Sun, Jun 19, 2022 at 02:43:20PM +, Lucas wrote:
> Hello,
> 
> As the subject reads, yesterday I updated my Lenovo T420 and X230.
> During sysupgrade autoupgrade and afterwards I started getting firmware
> errors and wifi became unusable. I reached stsp@ off-list to know how to
> provide the most information possible and he shared [1] with me. I
> applied the patch in [2] and I have 2 "traces" of different AP, and I'll
> get another two during the rest of the day. AP1 is a residential ISP,
> AP2 is a smartphone hotspot. The trace for AP1 includes dmesg.
> 
> Also, to give it a shot, I reverted the last changes to if_iwn.c, Git
> commit 03baa64391cc86175b030d132ae900417665b92f, CVS r1.259. Running
> this kernel, wifi works again on my *X230*. Both laptops report the card
> as 
> 
> iwn0 at pci2 dev 0 function 0 "Intel Centrino Advanced-N 6205" rev 0x34: msi, MIMO 2T2R, MoW
> 
> so I guess it'll also work in the T420, but haven't tried it yet.

I have reverted the change.

After more off-list discussion with Christian Schulte it is clear
that the change wasn't actually helping with the problem anyway.
We still don't know what is happening there, just that the issue
can be avoided by moving APs to a channel other than 13.

Sorry about the temporary breakage.



Re: Ralink RT2790 - connection problems

2022-04-21 Thread Stefan Sperling
On Thu, Apr 21, 2022 at 10:31:46PM +0200, Stefan Sperling wrote:
> On Thu, Apr 21, 2022 at 09:15:57PM +0200, Stefan Sperling wrote:
> > On Thu, Apr 21, 2022 at 08:58:48PM +0200, Sven Wolf wrote:
> > > But when I build a new kernel with the sources from 2022-03-15 everything is
> > > fine.
> > > 
> > > Maybe this commit causes this behaviour/bug
> > > https://marc.info/?l=openbsd-cvs=16472610775=2
> > 
> > Looks like a bug in ral(4) (uninitialized variable) which has been
> > exposed by the above commit. Does the patch below help?
> > 
> > If it does then I will do a sweep of all wifi drivers for similar problems.
> > Sorry for not catching this earlier. This could have been caught had it
> > occurred to me to check for any uninitialized use of this struct when
> > I added a new field...
> 
> I did take a look already just in case, and it's not looking good...
> This fixes the places I've found so far:

Sorry, the previous diff missed the ral(4) fix I sent earlier.
A better version of this fix is included here (thanks to miod for off-list
feedback).
 
diff b3dff8d102fea35ec827f7e82565722dc4185589 d5aaa0e75d191916f6c8da91b842079511d38fd3
blob - 9c70f4dc5e96467bb4100e022ba59e918797e85f
blob + c0c96f2f5614858df29f49680d8349832af8d55a
--- sys/dev/ic/acx.c
+++ sys/dev/ic/acx.c
@@ -1354,7 +1354,7 @@ acx_rxeof(struct acx_softc *sc)
sc->chip_rxbuf_exhdr);
wh = mtod(m, struct ieee80211_frame *);
 
-   rxi.rxi_flags = 0;
+   memset(&rxi, 0, sizeof(rxi));
if ((wh->i_fc[1] & IEEE80211_FC1_WEP) &&
sc->chip_hw_crypt) {
/* Short circuit software WEP */
blob - 6208492c94c2af3d15e158e8329f881565efd232
blob + 28db8e32130a83eeb9e057cfaed6eb9d8d625428
--- sys/dev/ic/an.c
+++ sys/dev/ic/an.c
@@ -462,7 +462,7 @@ an_rxeof(struct an_softc *sc)
 #endif /* NBPFILTER > 0 */
 
wh = mtod(m, struct ieee80211_frame *);
-   rxi.rxi_flags = 0;
+   memset(&rxi, 0, sizeof(rxi));
if (wh->i_fc[1] & IEEE80211_FC1_WEP) {
/*
 * WEP is decrypted by hardware. Clear WEP bit
blob - e9ff551127c7e937b52f99577ade8bc5dde91763
blob + aafd14161600420dfd90ff3f2ecc093e11f5c33f
--- sys/dev/ic/ar5008.c
+++ sys/dev/ic/ar5008.c
@@ -1039,7 +1039,7 @@ ar5008_rx_process(struct athn_softc *sc, struct mbuf_l
m_adj(m, -IEEE80211_CRC_LEN);
 
/* Send the frame to the 802.11 layer. */
-   rxi.rxi_flags = 0;  /* XXX */
+   memset(&rxi, 0, sizeof(rxi));
rxi.rxi_rssi = MS(ds->ds_status4, AR_RXS4_RSSI_COMBINED);
rxi.rxi_rssi += AR_DEFAULT_NOISE_FLOOR;
rxi.rxi_tstamp = ds->ds_status2;
blob - 5aa99be1508ee4b893a50adf89f8c83eaad9e94b
blob + e45a4441a62255fb4ef4a386112a477a2f5b58ed
--- sys/dev/ic/ar9003.c
+++ sys/dev/ic/ar9003.c
@@ -1026,7 +1026,7 @@ ar9003_rx_process(struct athn_softc *sc, int qid, stru
m_adj(m, -IEEE80211_CRC_LEN);
 
/* Send the frame to the 802.11 layer. */
-   rxi.rxi_flags = 0;  /* XXX */
+   memset(&rxi, 0, sizeof(rxi));
rxi.rxi_rssi = MS(ds->ds_status5, AR_RXS5_RSSI_COMBINED);
rxi.rxi_tstamp = ds->ds_status3;
ieee80211_inputm(ifp, m, ni, &rxi, ml);
blob - 9a640630ecdd5e55125bae10cbd31665bf7ce9e6
blob + a43ad3f9706addf3886a8ce74c63adb50c58589a
--- sys/dev/ic/ath.c
+++ sys/dev/ic/ath.c
@@ -1936,7 +1936,7 @@ ath_rx_proc(void *arg, int npending)
 #endif
m_adj(m, -IEEE80211_CRC_LEN);
wh = mtod(m, struct ieee80211_frame *);
-   rxi.rxi_flags = 0;
+   memset(&rxi, 0, sizeof(rxi));
if (!ath_softcrypto && (wh->i_fc[1] & IEEE80211_FC1_WEP)) {
/*
 * WEP is decrypted by hardware. Clear WEP bit
blob - 49b4089c69fe17b0215d514d0b3b1215a87b69b8
blob + 04b4f0d393e7eb142d707c76090668df556bf845
--- sys/dev/ic/atw.c
+++ sys/dev/ic/atw.c
@@ -3175,7 +3175,7 @@ atw_rxintr(struct atw_softc *sc)
 
wh = mtod(m, struct ieee80211_frame *);
ni = ieee80211_find_rxnode(ic, wh);
-   rxi.rxi_flags = 0;
+   memset(&rxi, 0, sizeof(rxi));
 #if 0
if (atw_hw_decrypted(sc, wh)) {
wh->i_fc[1] &= ~IEEE80211_FC1_WEP;
@@ -3183,7 +3183,6 @@ atw_rxintr(struct atw_softc *sc)
}
 #endif
rxi.rxi_rssi = (int)rssi;
-   rxi.rxi_tstamp = 0;
ieee80211_inputm(ifp, m, ni, &rxi, &ml);
/*
 * The frame may have caused the node to be marked for
blob - c245bcdac6953e6e88059fb1fccdc319b412ccfa
blob + d4db3d9384097401f7b48072949c432ead0a4ceb
--- sys/dev/ic/bwfm.c
+++ sys/dev/ic/bwfm.c
@@ -2439,9 +2439,7 

Re: Ralink RT2790 - connection problems

2022-04-21 Thread Stefan Sperling
On Thu, Apr 21, 2022 at 09:15:57PM +0200, Stefan Sperling wrote:
> On Thu, Apr 21, 2022 at 08:58:48PM +0200, Sven Wolf wrote:
> > But when I build a new kernel with the sources from 2022-03-15 everything is
> > fine.
> > 
> > Maybe this commit causes this behaviour/bug
> > https://marc.info/?l=openbsd-cvs=16472610775=2
> 
> Looks like a bug in ral(4) (uninitialized variable) which has been
> exposed by the above commit. Does the patch below help?
> 
> If it does then I will do a sweep of all wifi drivers for similar problems.
> Sorry for not catching this earlier. This could have been caught had it
> occurred to me to check for any uninitialized use of this struct when
> I added a new field...

I did take a look already just in case, and it's not looking good...
This fixes the places I've found so far:

diff 78339108684c9eff64beceb07db3a4411c4e9aee /usr/src
blob - 9c70f4dc5e96467bb4100e022ba59e918797e85f
file + sys/dev/ic/acx.c
--- sys/dev/ic/acx.c
+++ sys/dev/ic/acx.c
@@ -1354,7 +1354,7 @@ acx_rxeof(struct acx_softc *sc)
sc->chip_rxbuf_exhdr);
wh = mtod(m, struct ieee80211_frame *);
 
-   rxi.rxi_flags = 0;
+   memset(&rxi, 0, sizeof(rxi));
if ((wh->i_fc[1] & IEEE80211_FC1_WEP) &&
sc->chip_hw_crypt) {
/* Short circuit software WEP */
blob - 6208492c94c2af3d15e158e8329f881565efd232
file + sys/dev/ic/an.c
--- sys/dev/ic/an.c
+++ sys/dev/ic/an.c
@@ -462,7 +462,7 @@ an_rxeof(struct an_softc *sc)
 #endif /* NBPFILTER > 0 */
 
wh = mtod(m, struct ieee80211_frame *);
-   rxi.rxi_flags = 0;
+   memset(&rxi, 0, sizeof(rxi));
if (wh->i_fc[1] & IEEE80211_FC1_WEP) {
/*
 * WEP is decrypted by hardware. Clear WEP bit
blob - e9ff551127c7e937b52f99577ade8bc5dde91763
file + sys/dev/ic/ar5008.c
--- sys/dev/ic/ar5008.c
+++ sys/dev/ic/ar5008.c
@@ -1039,7 +1039,7 @@ ar5008_rx_process(struct athn_softc *sc, struct mbuf_l
m_adj(m, -IEEE80211_CRC_LEN);
 
/* Send the frame to the 802.11 layer. */
-   rxi.rxi_flags = 0;  /* XXX */
+   memset(&rxi, 0, sizeof(rxi));
rxi.rxi_rssi = MS(ds->ds_status4, AR_RXS4_RSSI_COMBINED);
rxi.rxi_rssi += AR_DEFAULT_NOISE_FLOOR;
rxi.rxi_tstamp = ds->ds_status2;
blob - 5aa99be1508ee4b893a50adf89f8c83eaad9e94b
file + sys/dev/ic/ar9003.c
--- sys/dev/ic/ar9003.c
+++ sys/dev/ic/ar9003.c
@@ -1026,7 +1026,7 @@ ar9003_rx_process(struct athn_softc *sc, int qid, stru
m_adj(m, -IEEE80211_CRC_LEN);
 
/* Send the frame to the 802.11 layer. */
-   rxi.rxi_flags = 0;  /* XXX */
+   memset(&rxi, 0, sizeof(rxi));
rxi.rxi_rssi = MS(ds->ds_status5, AR_RXS5_RSSI_COMBINED);
rxi.rxi_tstamp = ds->ds_status3;
ieee80211_inputm(ifp, m, ni, &rxi, ml);
blob - 9a640630ecdd5e55125bae10cbd31665bf7ce9e6
file + sys/dev/ic/ath.c
--- sys/dev/ic/ath.c
+++ sys/dev/ic/ath.c
@@ -1936,7 +1936,7 @@ ath_rx_proc(void *arg, int npending)
 #endif
m_adj(m, -IEEE80211_CRC_LEN);
wh = mtod(m, struct ieee80211_frame *);
-   rxi.rxi_flags = 0;
+   memset(&rxi, 0, sizeof(rxi));
if (!ath_softcrypto && (wh->i_fc[1] & IEEE80211_FC1_WEP)) {
/*
 * WEP is decrypted by hardware. Clear WEP bit
blob - 49b4089c69fe17b0215d514d0b3b1215a87b69b8
file + sys/dev/ic/atw.c
--- sys/dev/ic/atw.c
+++ sys/dev/ic/atw.c
@@ -3175,7 +3175,7 @@ atw_rxintr(struct atw_softc *sc)
 
wh = mtod(m, struct ieee80211_frame *);
ni = ieee80211_find_rxnode(ic, wh);
-   rxi.rxi_flags = 0;
+   memset(&rxi, 0, sizeof(rxi));
 #if 0
if (atw_hw_decrypted(sc, wh)) {
wh->i_fc[1] &= ~IEEE80211_FC1_WEP;
@@ -3183,7 +3183,6 @@ atw_rxintr(struct atw_softc *sc)
}
 #endif
rxi.rxi_rssi = (int)rssi;
-   rxi.rxi_tstamp = 0;
ieee80211_inputm(ifp, m, ni, &rxi, &ml);
/*
 * The frame may have caused the node to be marked for
blob - c245bcdac6953e6e88059fb1fccdc319b412ccfa
file + sys/dev/ic/bwfm.c
--- sys/dev/ic/bwfm.c
+++ sys/dev/ic/bwfm.c
@@ -2439,9 +2439,7 @@ bwfm_rx_auth_ind(struct bwfm_softc *sc, struct bwfm_ev
 
/* Finalize mbuf. */
m->m_pkthdr.len = m->m_len = pktlen;
-   rxi.rxi_flags = 0;
-   rxi.rxi_rssi = 0;
-   rxi.rxi_tstamp = 0;
+   memset(&rxi, 0, sizeof(rxi));
ieee80211_input(ifp, m, ic->ic_bss, &rxi);
 }
 
@@ -2495,9 +2493,7 @@ bwfm_rx_assoc_ind(struct bwfm_softc *sc, struct bwfm_e
m_freem(m);
return;
}
-   rxi.rxi_flags = 0;
-   rxi.rxi_rssi = 0;
-   rxi.rxi_tstamp = 0;
+   

Re: Ralink RT2790 - connection problems

2022-04-21 Thread Stefan Sperling
On Thu, Apr 21, 2022 at 08:58:48PM +0200, Sven Wolf wrote:
> But when I build a new kernel with the sources from 2022-03-15 everything is
> fine.
> 
> Maybe this commit causes this behaviour/bug
> https://marc.info/?l=openbsd-cvs=16472610775=2

Looks like a bug in ral(4) (uninitialized variable) which has been
exposed by the above commit. Does the patch below help?

If it does then I will do a sweep of all wifi drivers for similar problems.
Sorry for not catching this earlier. This could have been caught had it
occurred to me to check for any uninitialized use of this struct when
I added a new field...

diff a26af1db5d30d7a58f91742886569d0d8891b827 /usr/src
blob - 3178226c0b633534b065088e426e80b5a26853c9
file + sys/dev/ic/rt2860.c
--- sys/dev/ic/rt2860.c
+++ sys/dev/ic/rt2860.c
@@ -1275,6 +1275,8 @@ rt2860_rx_intr(struct rt2860_softc *sc)
uint16_t phy;
 #endif
 
+   memset(&rxi, 0, sizeof(rxi));
+
hw = RAL_READ(sc, RT2860_FS_DRX_IDX) & 0xfff;
while (sc->rxq.cur != hw) {
struct rt2860_rx_data *data = &sc->rxq.data[sc->rxq.cur];
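The class of bug described above can be reproduced in miniature. When a struct gains a field, code that clears only the fields it knew about leaves the new one holding whatever was on the stack. The struct below is a hypothetical stand-in for struct ieee80211_rxinfo (its real layout and the name of the newly added field are not shown in this thread), and scribble() simulates leftover stack contents so the result is deterministic.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an Rx info struct that later grew a field. */
struct rxinfo_example {
	uint32_t flags;
	int32_t rssi;
	uint32_t tstamp;
	uint32_t new_field;	/* added later; old init code never touches it */
};

/* Old pattern: clear only the field that existed when the driver was written. */
static void
rxinfo_init_old(struct rxinfo_example *rxi)
{
	rxi->flags = 0;
}

/* Fixed pattern: clear every field, including ones added in the future. */
static void
rxinfo_init_new(struct rxinfo_example *rxi)
{
	memset(rxi, 0, sizeof(*rxi));
}

/* Simulate leftover stack contents from an earlier use of the same memory. */
static void
scribble(struct rxinfo_example *rxi)
{
	memset(rxi, 0xa5, sizeof(*rxi));
}
```

Zeroing the whole struct with memset(&rxi, 0, sizeof(rxi)) stays correct no matter how many fields are added later, which per-field assignments do not.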



Re: Install bug: could not read firmware iwm-7765D-29 (error 2)

2022-04-05 Thread Stefan Sperling
On Tue, Apr 05, 2022 at 10:29:17AM +, open...@ploum.eu wrote:
> Trying to install OpenBSD 7.0 from USB key on the following laptop :
> 
> Starlabs Star Lite MkII.
> 
> 
> The install refuse to load the wifi driver with the following error :

Theo already answered this part of your question.

> It looks like OpenBSD thinks the network wifi card is an AC 7265D instead of
> an AC 3165.

There is no separate firmware image for 3165 devices.
The 3165 is the same device as 7265, using only one antenna instead of two.



Re: iwx(4): Device timeouts when connected to 802.11ac

2022-04-05 Thread Stefan Sperling
On Mon, Apr 04, 2022 at 09:58:09PM -0400, Ashton Fagg wrote:
> >Synopsis:iwx(4) device timeouts on 802.11ac networks
> >Category:Bug/driver issue
> >Environment:
>   System  : OpenBSD 7.1
>   Details : OpenBSD 7.1 (GENERIC.MP) #458: Sun Apr  3 23:10:53 MDT 2022
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   I have noticed this only in this current snapshot, and the one I
>   was running previously (which was from Friday or Saturday). I
>   have been running other snapshots since 802.11ac support got
>   added with no issues. (Thank you to stsp, it's been working
>   great hitherto).
> 
>   My thinkpad t14s has intel wifi hardware that attaches to
>   iwx(4). I have experienced "stalls", where the wifi hardware
>   appears to just stop passing packets. Eventually connectivity
>   returns (sometimes 30 secs or so later).
> 
>   I flipped the debug bit for iwx(4) and managed to determine that
>   this is actually due to device timeout and reset. Prior to full
>   dmesg below, I have included the relevant output.
> 
>   There have been no changes to my system (aside from new
>   snapshots). Additionally, no changes to my network infra.
> 
> >How-To-Repeat:
>   As best I can tell, connecting to an 802.11ac access point and
>   keeping the wifi busy will surely do it. I've experienced this
>   frequently with having an ssh session open (with a build
>   running, so tonnes of output scrolling past), and having youtube
>   playing music going at the same time.
> >Fix:
>   Waiting a couple of seconds while the card resets itself is
>   enough to get things moving again.
> 
> 
> Please let me know if there's anything else I can provide to assist or
> if you'd like my help testing a potential fix. Thank you.

There is nothing actionable in your report, though it is good to know
that there seems to be an issue which the driver could handle better.

It is unclear to me why the device would suddenly stop generating
interrupts, which is what leads to a "device timeout". Generally, this
implies a problem at the firmware, hardware, or RF level.

It would be good to know if your AP did anything extraordinary at the time.
Hopefully that would provide more context and lead to clues.

Did your AP switch channels, perhaps?

Or did the AP switch its channel width?
You could record beacons with tcpdump and look for differences in vhtop
information where the channel width is encoded:
  tcpdump -n -i iwx0 -y IEEE802_11_RADIO -s 4096 -v -D in type mgt subtype beacon
Channel width info should show up like this:
  vhtop=<80MHz chan,center chan 122,
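The vhtop field that tcpdump prints presumably comes from the VHT Operation element (element ID 192) in the beacon, whose first three bytes are the channel width and the center frequency segments 0 and 1. A minimal decoder, written from the 802.11 specification rather than from tcpdump's own code, might look like this; the enum and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result codes for the decoded channel width. */
enum vht_width_example {
	VHTW_20_40,	/* 20 or 40 MHz; the HT operation element tells which */
	VHTW_80,
	VHTW_160,
	VHTW_80P80,
	VHTW_RESERVED,
	VHTW_INVALID
};

/*
 * Decode the channel width from the body of a VHT Operation element,
 * i.e. the bytes following the element ID and length fields:
 *   body[0] channel width, body[1] center segment 0, body[2] center segment 1.
 */
static enum vht_width_example
vhtop_width(const uint8_t *body, size_t len)
{
	int ccfs0, ccfs1, diff;

	if (len < 3)
		return VHTW_INVALID;
	ccfs0 = body[1];
	ccfs1 = body[2];

	switch (body[0]) {
	case 0:
		return VHTW_20_40;
	case 1:
		/* Newer APs signal 160 and 80+80 MHz via a non-zero segment 1. */
		if (ccfs1 == 0)
			return VHTW_80;
		diff = abs(ccfs1 - ccfs0);
		if (diff == 8)
			return VHTW_160;
		if (diff > 16)
			return VHTW_80P80;
		return VHTW_RESERVED;
	case 2:
		return VHTW_160;	/* deprecated encoding */
	case 3:
		return VHTW_80P80;	/* deprecated encoding */
	default:
		return VHTW_RESERVED;
	}
}
```

The example output above (80MHz, center channel 122) corresponds to element body bytes {1, 122, 0, ...}. A change in these bytes between beacons would indicate that the AP switched its channel width.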

Those are just shots in the dark though, the driver is already expected
to cope when such changes occur.

When the AP simply disappears from the air as a result of switching channels,
a small stall followed by a reconnect as you describe is expected. We do not
yet honor channel switch announcements (they are not authenticated and
therefore could be abused; something to revisit once support for protected
management frames gets implemented). The exact error condition shown in debug
output will depend on what the driver was doing at the moment. If there are
frames on Tx queues, I believe a device timeout is possible, though not pretty.
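In OpenBSD wireless drivers the "device timeout" message typically comes from a transmit watchdog rather than from the device itself. The sketch below is a simplified, hypothetical model of that pattern (names and the 5-second timeout are assumptions, not taken from iwx(4)): the timer is armed whenever a frame is queued, each Tx-completion interrupt restarts or disarms it, and a once-per-second tick fires when no completion arrives in time. It shows why a timeout needs frames sitting on a Tx queue while interrupts have stopped.

```c
#include <stdbool.h>

#define TXWD_TIMEOUT_SEC 5	/* assumed value for illustration */

/* Hypothetical per-device watchdog state. */
struct txwd_example {
	int pending;	/* frames handed to the device, not yet completed */
	int tx_timer;	/* seconds left before declaring a timeout; 0 = disarmed */
	int resets;	/* times the watchdog reset the device */
};

/* Queue a frame and (re)arm the watchdog. */
static void
tx_enqueue(struct txwd_example *w)
{
	w->pending++;
	w->tx_timer = TXWD_TIMEOUT_SEC;
}

/* Called from the Tx-done interrupt; never runs if interrupts stop. */
static void
tx_complete(struct txwd_example *w)
{
	if (w->pending == 0)
		return;
	w->pending--;
	/* Progress was made: disarm if idle, otherwise restart the countdown. */
	w->tx_timer = (w->pending == 0) ? 0 : TXWD_TIMEOUT_SEC;
}

/* Called once per second. Returns true if a "device timeout" fired. */
static bool
watchdog_tick(struct txwd_example *w)
{
	if (w->tx_timer > 0 && --w->tx_timer == 0) {
		/* Frames were queued but no completion arrived in time. */
		w->resets++;
		w->pending = 0;
		return true;
	}
	return false;
}
```

With an empty Tx queue the timer stays disarmed, so an AP vanishing while the interface is idle would not by itself produce this message in the model.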

Maybe the reason is something else entirely and this issue will not have
anything related happening on the AP side.

I have been using iwx on my desktop, always on, on 11ac, without issues,
for weeks. At least for me, it has been quite stable. Certainly no less stable
than this driver ever has been. It runs for weeks without firmware errors
happening, which was not the case a year or two ago.



Re: Kernel page fault in iwm (related to suspend?)

2022-03-21 Thread Stefan Sperling
On Sun, Mar 20, 2022 at 09:57:03PM +0100, Sven M. Hallberg wrote:
> >Synopsis:Kernel page fault in iwm (related to suspend?)
> >Category:kernel
> >Environment:
>   System  : OpenBSD 7.1
>   Details : OpenBSD 7.1-beta (GENERIC.MP) #422: Tue Mar 15 11:28:22 MDT 2022
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   A few minutes into regular use after resume, the kernel panicked with
>   a page fault. The machine is a Thinkpad X250, running a recent snapshot.
>   I was using X with two xterms open (running radare2 and vim). The wifi
>   interface was enabled but not connected to any network, as far as I am
>   aware. I was riding a public bus at the time, so any kind of network
>   could have been passing by. I recall similar crashes in the past and it
>   *seems* that they like to happen after a suspend (close lid) and resume,
>   though suspend usually works fine.
> 
>   Screenshots of ddb output are attached.

The important bit from the screenshots is:

Stopped at      _rb_insert+0x27d:       cmpq    %r12,0x8(%rcx)

We get to _rb_insert via ieee80211_setup_node() when a beacon is received
and we want to create a new entry for this node in the nodes tree.

Reading objdump -S output, to me it looks like the crash happens here:

static inline void
rbe_insert_color(const struct rb_type *t, struct rb_tree *rbt,
    struct rb_entry *rbe)
{
	struct rb_entry *parent, *gparent, *tmp;

	while ((parent = RBE_PARENT(rbe)) != NULL &&
	    RBE_COLOR(parent) == RB_RED) {
		gparent = RBE_PARENT(parent);

		if (parent == RBE_LEFT(gparent)) {   <-- crash

If my assumption is correct, the failing memory access would be
gparent->rbt_left which matches offset 0x8 of the failing instruction.
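A quick way to double-check the offset argument, assuming struct rb_entry is laid out as in sys/sys/tree.h (parent, left, right, color) on an LP64 target such as amd64. The struct is re-declared here only so the sketch compiles on its own:

```c
#include <assert.h>
#include <stddef.h>

/* Re-declared for illustration; assumed to match sys/sys/tree.h. */
struct rb_entry {
	struct rb_entry	*rbt_parent;	/* offset 0x0 on LP64 */
	struct rb_entry	*rbt_left;	/* offset 0x8 on LP64 */
	struct rb_entry	*rbt_right;	/* offset 0x10 on LP64 */
	unsigned int	 rbt_color;
};

/* rbt_left directly follows the parent pointer, with no padding. */
static_assert(offsetof(struct rb_entry, rbt_left) ==
    sizeof(struct rb_entry *), "rbt_left follows rbt_parent");

/* Offset of the field read by RBE_LEFT(gparent). */
static size_t
rbe_left_offset(void)
{
	return offsetof(struct rb_entry, rbt_left);
}
```

On amd64 this yields 8, so a NULL gparent would presumably fault on address 0x8, which is consistent with the faulting instruction.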

It seems the RB tree is in an invalid state. If the parent is red then
there should be a grand-parent. I don't know how this can happen.
Does anyone have an idea?
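The invariant relied upon here can be checked mechanically. A minimal sketch, again re-declaring the assumed tree.h layout and assuming RB_BLACK is 0 and RB_RED is 1; rb_red_nodes_have_parent() is a hypothetical debugging helper, not part of the tree code:

```c
#include <assert.h>
#include <stddef.h>

#define RB_BLACK	0	/* assumed values */
#define RB_RED		1

struct rb_entry {
	struct rb_entry	*rbt_parent;
	struct rb_entry	*rbt_left;
	struct rb_entry	*rbt_right;
	unsigned int	 rbt_color;
};

/*
 * Return 1 if every red node in the subtree has a parent, i.e. no red
 * node is the root. rbe_insert_color() depends on this: a red parent
 * must have a grandparent, otherwise RBE_LEFT(gparent) reads via NULL.
 */
static int
rb_red_nodes_have_parent(const struct rb_entry *rbe)
{
	if (rbe == NULL)
		return 1;
	if (rbe->rbt_color == RB_RED && rbe->rbt_parent == NULL)
		return 0;
	return rb_red_nodes_have_parent(rbe->rbt_left) &&
	    rb_red_nodes_have_parent(rbe->rbt_right);
}
```

Running such a check over the nodes tree (for example from ddb, or temporarily after each insert) might help narrow down when the tree gets corrupted.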



Re: WLAN 802.1x authentication problems

2022-01-21 Thread Stefan Sperling
On Fri, Jan 21, 2022 at 08:26:18AM +0100, Christian Ehrhardt wrote:
> 
> Hi,
> 
> On Fri, Jan 21, 2022 at 12:00:54AM -0700, Theo de Raadt wrote:
> > > In addition, I checked the code of dhclient and while it does listen
> > > to the RTM_80211INFO messages it will only use them to do a RESTART (drop
> > > things it thinks it knows about the interface and re-read all the
> > > information) which is a good thing. It will not start sending
> > > messages early while the link is still down.
> > 
> > dhclient is being used by fewer and fewer people.  Please focus on the
> > other route socket listeners, there are many.
> 
> I did grep all of "src" before and added all the patches in
> "ports", now. I'm grepping for the RTM_80211INFO constant as
> the change only affects when this particular message type is sent.
> Any user of the message would have to use this constant somehow, right?
> 
> Apart from wpa_supplicant the only users are:
> - dhclient which is fine.
> - route prints the message contents in monitor mode but does
>   not use the message otherwise.

Right. I took another look at this and I agree with your analysis now.

With dhclient being phased out, wpa_supplicant is the only active
user of this routing message. Any potential future users can simply
adapt to the new behaviour your patch introduces; the only downside
I see is that we might generate redundant messages. Which could be
handled by userland, or could be eliminated by simply not sending
this message when link goes up.
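Handling the redundant message in userland could follow the wpa_supplicant logic described in this thread: inject an ASSOC event only when the SSID or BSSID differs from the last message. A sketch with a hypothetical struct bss_info that only mirrors the ifie_nwid/ifie_nwid_len fields seen in the patch, plus a 6-byte BSSID:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NWID_MAX	32	/* 802.11 SSIDs are at most 32 bytes */

/* Hypothetical view of one RTM_80211INFO message. */
struct bss_info {
	uint8_t	nwid_len;	/* assumed <= NWID_MAX */
	uint8_t	nwid[NWID_MAX];
	uint8_t	bssid[6];
};

/*
 * Return 1 (and remember the new values) if SSID or BSSID changed
 * since the last message, 0 if the message is a redundant duplicate.
 */
static int
bss_changed(struct bss_info *last, const struct bss_info *cur)
{
	if (last->nwid_len == cur->nwid_len &&
	    memcmp(last->nwid, cur->nwid, cur->nwid_len) == 0 &&
	    memcmp(last->bssid, cur->bssid, sizeof(cur->bssid)) == 0)
		return 0;
	*last = *cur;
	return 1;
}
```

With this kind of check, the second message sent for the same SSID/BSSID after the 802.1x handshake would simply be ignored.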

The original purpose of this message was indeed to help dhclient.
Because dhclient identifies DHCP servers based on IP address, any
two wifi networks using the same IP subnet would collide in the
lease database without an extra key such as the SSID.
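To illustrate the collision, a sketch with a hypothetical struct lease_key; it is not dhclient's actual lease format. Two networks whose DHCP servers share an address only become distinguishable once the SSID is part of the key:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lease-database key. */
struct lease_key {
	uint32_t server_addr;	/* IPv4 address of the DHCP server */
	uint8_t	 ssid_len;
	uint8_t	 ssid[32];
};

/* Compare keys by server address, optionally also by SSID. */
static int
lease_key_equal(const struct lease_key *a, const struct lease_key *b,
    int use_ssid)
{
	if (a->server_addr != b->server_addr)
		return 0;
	if (!use_ssid)
		return 1;
	return a->ssid_len == b->ssid_len &&
	    memcmp(a->ssid, b->ssid, a->ssid_len) == 0;
}
```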

The link-state issues I recalled were in fact a different problem:
There were drivers which allowed link to go UP before a WPA key was
installed in the device. This has been fixed to make dhclient happy.

Our new DHCP daemon does not use this message (yet?); it only
looks for link state going UP. I don't know if this is a problem
that should be fixed or if the existing code works well enough.
In any case, new code to handle the message would still need to be
written for that daemon.

Around the same time dhclient was patched, wpa_supplicant was patched
in ports to handle this message as well. Roaming with wpa_supplicant
was tested at a ccc congress and it was apparently working at the time.
I was not involved in these wpa_supplicant changes; I do not use it,
and I do not remember being involved much, if at all, in the testing
that was being done. I definitely did not run wpa_supplicant myself.
The difference between successful roaming and failed roaming followed by
a new association is just a relatively small dip in the availability of
link. This could easily be missed during casual testing at a conference.

Looking at the kernel and wpa_supplicant code, what you are describing
does not look like a new problem to me. The 80211 info routing message
has always been sent when link state becomes UP, which only happens once
the WPA handshake completes. In case of 802.1x the handshake happens when
wpa_supplicant's ioctl triggers ieee80211_keyrun(). I don't see how this
could ever have worked, except by accident as you described: Userland's
attempt to associate to BSSID zero (or any previous BSSID) might complete
a kernel-side handshake against the correct BSSID successfully, at which
point wpa_supplicant gets notified and now knows the correct BSSID.
Which is of course entirely backwards.



Re: WLAN 802.1x authentication problems

2022-01-20 Thread Stefan Sperling
On Thu, Jan 20, 2022 at 05:13:22PM +0100, Christian Ehrhardt wrote:
> 
> Hi,
> 
> The wpa supplicant (tries to) use routing messages of type
> RTM_80211INFO to detect WLAN access point changes. If the
> SSID or the BSSID changes compared to the last RTM_80211INFO
> message an ASSOC event is injected into the 802.1x state
> machines.
> 
> The problem with this approach is that
> the kernel only generates these events when the WLAN link
> state changes to "UP". In most cases link UP is basically the
> same as the IEEE80211_S_RUN which means that the STA associated
> to an AP and the AP confirmed the association.
> 
> However, in the case of 802.1x the flag IEEE80211_F_RSNON
> is set which delays the link up event until after the 802.1x
> handshake is complete.
> 
> In the roaming case this means that the WPA supplicant
> does not notice that it was DEAUTH'ed from an access point
> that we are roaming away from. As a result it will drop EAPOL
> packets from the new AP and association with the new AP will fail.
> 
> In the case of an initial connection wpa_supplicant will
> try to associate with the BSSID 00:00:00:00:00:00 until this
> attempt times out. Depending on the exact timing a subsequent
> attempt to associate with the correct BSSID may or may not be
> started.
> 
> To fix this send the RTM_80211INFO if we reach the
> IEEE80211_S_RUN state even if this does not result in a link
> up event. In the 802.1x case this will result in two of these
> messages with the same SSID and BSSID. The first event when we
> reach the IEEE80211_S_RUN state (before the 802.1x authentication)
> and a second message once the interface is actually up.
> 
> A quick grep into userland suggests that this additional message
> is not a problem and it fixes 802.1x.

The current behaviour was introduced intentionally in order to
unbreak problems with early link-up events and DHCP clients.

If link goes up before the WPA handshake has completed, DHCP clients try
to send packets too early and can run into DHCP-protocol level timeouts.
A lot of work went into fixing this, so simply undoing all of that work
by sending the routing message before packets can be sent does not seem good.

Can this not be fixed in another way? Would generating routing messages
for both link down+up be enough to fix 802.1x?

Could we generate a routing message to announce that roaming has succeeded
and make wpa_supplicant listen for this?

> diff --git a/sys/net80211/ieee80211_proto.c b/sys/net80211/ieee80211_proto.c
> index accec8d74d..a1a99e5024 100644
> --- a/sys/net80211/ieee80211_proto.c
> +++ b/sys/net80211/ieee80211_proto.c
> @@ -1307,6 +1307,8 @@ justcleanup:
>*/
>   ieee80211_set_link_state(ic, LINK_STATE_UP);
>   ni->ni_assoc_fail = 0;
> + } else {
> + task_add(systq, &ic->ic_rtm_80211info_task);
>   }
>   ic->ic_mgt_timer = 0;
>   ieee80211_set_beacon_miss_threshold(ic);
> @@ -1322,11 +1324,10 @@ void
>  ieee80211_rtm_80211info_task(void *arg)
>  {
>   struct ieee80211com *ic = arg;
> - struct ifnet *ifp = &ic->ic_if;
>   struct if_ieee80211_data ifie;
>   int s = splnet();
>  
> - if (LINK_STATE_IS_UP(ifp->if_link_state)) {
> + if (ic->ic_state == IEEE80211_S_RUN) {
>   memset(&ifie, 0, sizeof(ifie));
>   ifie.ifie_nwid_len = ic->ic_bss->ni_esslen;
>   memcpy(ifie.ifie_nwid, ic->ic_bss->ni_essid,




Re: ThinkPad X1 Carbon gen9 - suspend and hibernate not working

2022-01-16 Thread Stefan Sperling
On Sat, Jan 15, 2022 at 09:00:44PM +, Mikolaj Kucharski wrote:
> I'm wondering what is recommended setup for swap with full disk
> encryption. Because swap is encrypted by default I decided to put
> it outside of softraid crypto - sd0b, where OpenBSD is on sd1a and
> I guess kernel expects swap to be on sd1b:

Yes, the kernel expects to find the default swap partition on the disk
which also contains the root filesystem. I would not be surprised if
the boot loader looks for swap on the disk backed by the softraid crypto
volume and fails to unhibernate if no swap can be found there. This keeps
things simple in the boot loader code.

These are two different layers of encryption and it does not hurt at all
to use both. Performance differences are negligible on modern laptops,
especially if your machine has AES in CPU flags. Laptops usually have
enough ram nowadays to avoid much swapping in the first place. They will
likely only use this swap partition to support edge cases and hibernate.



Re: ThinkPad X1 Carbon gen9 - suspend and hibernate not working

2022-01-15 Thread Stefan Sperling
On Sat, Jan 15, 2022 at 10:03:07AM -0700, Theo de Raadt wrote:
> Additionally, the /boot code must be able to see that the disk contains
> a hibernate signature, and this may not work if your swap partition
> is at such a far into the disk.
> 
> #                size           offset  fstype [fsize bsize   cpg]
>   a:          1875.7G             1024    RAID
>   b:            32.0G       3933688463    swap                    # none
> 

This looks like a CRYPTO softraid volume and a separate swap partition.
There is probably no swap partition configured inside softraid?

Moving swap inside the softraid volume (as the installer would create
by default) should be better. I have just successfully tested hibernate
with such a setup on one of my laptops, running -current and using EFI boot.
Where the outer disklabel looks like this:

#                size           offset  fstype [fsize bsize   cpg]
  c:        500118192                0  unused
  d:        500117105             1024    RAID
  i:              960               64   MSDOS



Re: Acer Swift1 (SF114-34, N6000, Jasper Lake): iwx (ax201), azalia and emmc are not working/detected

2022-01-12 Thread Stefan Sperling
On Thu, Jan 13, 2022 at 07:14:28AM +0100, Sven Wolf wrote:
> 
> 
> On 1/12/22 21:07, Stefan Sperling wrote:
> > On Wed, Jan 12, 2022 at 08:58:28PM +0100, Sven Wolf wrote:
> > > With following patches I got also the wireless nic working.
> > 
> > Nice! I am glad to see support for additional iwx devices.
> > 
> > Could you try this patch by Iraklis Karagkiozoglou please?
> > https://marc.info/?l=openbsd-tech&m=164194046522312&w=2
> > I plan to get to this patch soon. I still want to get some other patches
> > out of my queue before reviewing it. But knowing whether it works for you
> > would be good. See here for reasons why:
> > https://marc.info/?l=openbsd-tech&m=164175499605919&w=2
> 
> The patch by Iraklis Karagkiozoglou doesn't work with my device :(
> Now I get again:
> iwx0 at pci0 dev 20 function 3 "Intel Wi-Fi 6 AX201" rev 0x01, msix
> iwx0: unknown adapter type

OK, good to know. Thank you.
We will figure this out eventually, don't worry.



Re: Acer Swift1 (SF114-34, N6000, Jasper Lake): iwx (ax201), azalia and emmc are not working/detected

2022-01-12 Thread Stefan Sperling
On Wed, Jan 12, 2022 at 08:58:28PM +0100, Sven Wolf wrote:
> With following patches I got also the wireless nic working.

Nice! I am glad to see support for additional iwx devices.

Could you try this patch by Iraklis Karagkiozoglou please?
https://marc.info/?l=openbsd-tech&m=164194046522312&w=2
I plan to get to this patch soon. I still want to get some other patches
out of my queue before reviewing it. But knowing whether it works for you
would be good. See here for reasons why:
https://marc.info/?l=openbsd-tech&m=164175499605919&w=2



Re: wireguard-related mbuf panic (was: Re: panic: ieee80211_has_seq(wh) assertion failed)

2022-01-12 Thread Stefan Sperling
On Wed, Jan 12, 2022 at 10:50:50AM +0100, Christian Ehrhardt wrote:
> diff --git a/sys/kern/uipc_mbuf.c b/sys/kern/uipc_mbuf.c
> index 5e4cb5ba88..d8b9e751c6 100644
> --- a/sys/kern/uipc_mbuf.c
> +++ b/sys/kern/uipc_mbuf.c
> @@ -957,8 +957,6 @@ m_pullup(struct mbuf *m0, int len)
>  
>   head = M_DATABUF(m0);
>   if (m0->m_len == 0) {
> - m0->m_data = head;
> -
>   while (m->m_len == 0) {
>   m = m_free(m);
>   if (m == NULL)
> @@ -972,25 +970,29 @@ m_pullup(struct mbuf *m0, int len)
>   tail = head + M_SIZE(m0);
>   head += adj;
>  
> - if (len <= tail - head) {
> - /* there's enough space in the first mbuf */
> -
> - if (len > tail - mtod(m0, caddr_t)) {
> + if (!M_READONLY(m0) && len <= tail - head) {
> + /* we can copy everything into the first mbuf */
> + if (m0->m_len == 0) {
> + m0->m_data = head;

Before your patch, m_data was effectively set to head - adj in this case,
because head += adj had not happened yet. Does this fix a bug in the old
code, or does it introduce a bug?

The rest makes sense to me, thanks for putting so much effort into getting
this fixed!

OK by me, but in my opinion an mbuf expert (bluhm? claudio? someone else?)
needs to take a close look at this before it goes in.

> + } else if (len > tail - mtod(m0, caddr_t)) {
>   /* need to memmove to make space at the end */
>   memmove(head, mtod(m0, caddr_t), m0->m_len);
>   m0->m_data = head;
> + len -= m0->m_len;
>   }
> -
> - len -= m0->m_len;
>   } else {
> - /* the first mbuf is too small so make a new one */
> + /* the first mbuf is too small or read-only, make a new one */
>   space = adj + len;
>  
>   if (space > MAXMCLBYTES)
>   goto bad;
>  
> - m0->m_next = m;
> - m = m0;
> + if (m0->m_len == 0) {
> + m_free(m0);
> + } else {
> + m0->m_next = m;
> + m = m0;
> + }
>  
>   MGET(m0, M_DONTWAIT, m->m_type);
>   if (m0 == NULL)




Re: Repeated system hiccups when iwn(4) is used

2022-01-07 Thread Stefan Sperling
On Fri, Jan 07, 2022 at 06:10:09PM +0100, Alessandro De Laurenzis wrote:
> Greetings,
> 
> This is one of my hardest reports, since both the symptoms and the overall
> context are rather obscure...
> 
> On a Thinkpad T430 (complete dmesg in attachment) equipped with an Intel
> Centrino Ultimate-N 6300 wireless card I'm experiencing periodical hiccups
> (meaning that the system seems to freeze for a few tens of milliseconds,
> this is particularly evident when moving the mouse, typing or watching a
> video); it seems that the culprit is the wifi interface (i.e. if I switch it
> off or simply force it down using ifconfig, the problem goes away).
> 
> In 7.0 I see a lot of "iwn0: Fatal firmware error" messages, and this is
> what is reported after enabling the debug mode:

> > iwn0: fatal firmware error
> > firmware error log:
> >   error type  = "SYSASSERT" (0x0005)
> >   program counter = 0x2A9C
> >   source line = 0x02C1
> >   error data  = 0x
> >   branch link = 0x2A762A76
> >   interrupt link  = 0x1532
> >   time= 165

These firmware crash reports cannot be made sense of without Intel
proprietary information which we do not have access to.
We will need to figure out which behaviour of the driver triggers
this and then work our way out from there.
So it would help to see more context from debug mode, such as whether
state transitions occurred (SCAN -> AUTH, and such) right before the
crash. These messages also appear in /var/log/messages where they have
timestamps which might help with correlating events.

We might have to add some debugging code to the driver to obtain
more information about this issue.
But please send a complete debug log first.

Cheers,
Stefan



Re: wifi ax 201, iwx not working on thinkpad e15 gen2 intel model

2022-01-07 Thread Stefan Sperling
On Fri, Jan 07, 2022 at 08:57:54AM +, Anant Pande wrote:
> In particular, regarding the wifi, the dmesg produced ‘  "Intel Wi-Fi 6 
> AX201" rev 0x20 at pci0 dev 20 function 3 not configured ‘, which is what I 
> got when I had installed OpenBSD release and snapshot.
> 

>  0:20:3: Intel Wi-Fi 6 AX201
>   0x: Vendor ID: 8086, Product ID: a0f0

Can you build and test a kernel which includes the patch below and see if
this makes your device work?

It seems our driver is a bit too conservative about matching devices with
product ID a0f0. We filter AX201 devices by subsystem ID, but looking again
at what the Linux driver does, a0f0 products with any subsystem ID should
work.

diff 6bbb73995c8e71f4ccedb27558b1e4887fc58c34 /usr/src
blob - 197efe2a82347d27bf30ac1d0a789bbc25820db8
file + sys/dev/pci/if_iwx.c
--- sys/dev/pci/if_iwx.c
+++ sys/dev/pci/if_iwx.c
@@ -9215,8 +9215,9 @@ iwx_match(struct device *parent, iwx_match_t match __u
switch (PCI_PRODUCT(pa->pa_id)) {
case PCI_PRODUCT_INTEL_WL_22500_1: /* AX200 */
return 1; /* match any device */
-   case PCI_PRODUCT_INTEL_WL_22500_2: /* AX201 */
case PCI_PRODUCT_INTEL_WL_22500_3: /* AX201 */
+   return 1; /* match any device */
+   case PCI_PRODUCT_INTEL_WL_22500_2: /* AX201 */
case PCI_PRODUCT_INTEL_WL_22500_4: /* AX201 */
case PCI_PRODUCT_INTEL_WL_22500_5: /* AX201 */
for (i = 0; i < nitems(iwx_subsystem_id_ax201); i++) {
@@ -9418,8 +9419,13 @@ iwx_attach(struct device *parent, struct device *self,
sc->sc_device_family = IWX_DEVICE_FAMILY_22000;
sc->sc_integrated = 1;
sc->sc_ltr_delay = IWX_SOC_FLAGS_LTR_APPLY_DELAY_200;
-   sc->sc_low_latency_xtal = 0;
-   sc->sc_xtal_latency = 500;
+   if (PCI_PRODUCT(pa->pa_id) == PCI_PRODUCT_INTEL_WL_22500_3) {
+   sc->sc_xtal_latency = 12000;
+   sc->sc_low_latency_xtal = 1;
+   } else {
+   sc->sc_low_latency_xtal = 0;
+   sc->sc_xtal_latency = 500;
+   }
sc->sc_tx_with_siso_diversity = 0;
sc->sc_uhb_supported = 0;
break;
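The intended matching rule can be sketched in isolation. Only 0xa0f0 comes from the pcidump output above (presumably PCI_PRODUCT_INTEL_WL_22500_3); the other product ID and the subsystem IDs below are made-up placeholders, not values from if_iwx.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRODUCT_AX201_A0F0	0xa0f0	/* from the pcidump output */
#define PRODUCT_AX201_OTHER	0x1234	/* hypothetical placeholder */

/* Hypothetical stand-in for iwx_subsystem_id_ax201[]. */
static const uint16_t ax201_subsys_ids[] = { 0x0070, 0x0074 };

static int
ax201_match(uint16_t product, uint16_t subsys)
{
	size_t i;

	switch (product) {
	case PRODUCT_AX201_A0F0:
		return 1;	/* match any subsystem ID */
	case PRODUCT_AX201_OTHER:
		/* only match subsystem IDs known to work */
		for (i = 0; i < sizeof(ax201_subsys_ids) /
		    sizeof(ax201_subsys_ids[0]); i++)
			if (ax201_subsys_ids[i] == subsys)
				return 1;
		return 0;
	default:
		return 0;
	}
}
```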



Re: wifi ax 201, iwx not working on thinkpad e15 gen2 intel model

2022-01-06 Thread Stefan Sperling
On Thu, Jan 06, 2022 at 06:20:11PM +0100, Peter Nicolai Mathias Hansteen wrote:
> 
> 
> > 6. jan. 2022 kl. 18:00 skrev Anant Pande :
> > 
> > Post installation after fw_update ( which did not automatically install the 
> > iwx driver, so had to manually do ‘fw_update iwx’), the wifi is not 
> > working, that is no wireless interface shows up in ifconfig command output.
> > I tried this on Openbsd 7.0 release and snapshots both.
> 
> The driver is in the installed kernel, but the firmware that it depends on is 
> not installed unless you connect the machine to a working network device 
> during install. If you install from a mirror over wired ethernet (if 
> available) or one of the wifi things that are supported with distributable 
> firmware, it is much easier for everyone involved
> 

There are a few ax201 devices with device IDs we do not match on yet.
If the firmware is not fetched automatically this indicates that
this particular device shows up as "not configured" in dmesg rather
than an iwx0 device.

> Please use sendbug(8) — if your machine does not have usable networking, then 
> run sendbug -P >some_file.txt and forward some_file.txt suitably edited to 
> include a problem description to bugs@ from somewhere that is able to send 
> mail.
> 
> If you supply sufficient information, the relevant developers will likely 
> offer useful insights.
> 

Yes, in particular pcidump information produced by sendbug would be useful.



Re: 7.0 install from install70.img no firmware for wifi (iwn-5000)

2022-01-05 Thread Stefan Sperling
On Wed, Jan 05, 2022 at 03:11:47PM +0100, Marek Kozlowski wrote:
> :-)
> 
> On 1/5/22 11:42, Crystal Kolipe wrote:
> > On Wed, Jan 05, 2022 at 09:12:48AM +0100, Marek Kozlowski wrote:
> > > After I set up the ESSID and WPA-PSK the following message is displayed:
> > > 
> > > iwn0: could not read firmware iwn-5000 (error 2)
> > 
> > Your NIC requires firmware files which are available from:
> > 
> > http://firmware.openbsd.org/firmware/
> > 
> > It's probably easiest to complete the install without setting it up, then
> > install the firmware and configure the wireless adaptor manually afterwards.
> > 
> > The iwn(4), and fw_update(1) manual pages have more relevant information.
> 
> 1. Is it (missing firmware) intentional or is it a bug to be fixed?

Intel wifi firmware cannot be distributed with the official installation
media due to licensing conflicts. As a workaround, these firmware images
are not distributed by OpenBSD, they are distributed by volunteers.

Use the Ethernet interface (em0) during your install.
Once you boot up the installed system, firmware will be downloaded
over Ethernet from one of the volunteer firmware mirrors.
Once that has been done, wifi on iwn0 will work.



Re: Wifi doesn't work on Samsung NP530XBB-AD2BR notebook

2021-12-19 Thread Stefan Sperling
On Sun, Dec 19, 2021 at 12:20:18PM -0300, João Victor wrote:
> >Synopsis:  
> >Category:  
> >Environment:
> System  : OpenBSD 7.0
> Details : OpenBSD 7.0 (GENERIC.MP) #232: Thu Sep 30 14:25:29
> MDT 2021
>  dera...@amd64.openbsd.org:
> /usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
>  "Intel Gemini Lake CNVi" rev 0x03 at pci0 dev 12 function 0 not
> configured>

This device is not yet supported.

Here is patch. It is a complete shot in the dark, but worth trying.
The device name string can be fixed up later. What is important for
now is that the driver attaches and produces a working interface.


diff 6bbb73995c8e71f4ccedb27558b1e4887fc58c34 /usr/src
blob - ae02ad71669e77dae57e2df7d89346f5296b0f26
file + sys/dev/pci/if_iwm.c
--- sys/dev/pci/if_iwm.c
+++ sys/dev/pci/if_iwm.c
@@ -847,7 +847,7 @@ iwm_read_firmware(struct iwm_softc *sc)
err = EINVAL;
goto parse_out;
}
-   sc->sc_fw_phy_config = le32toh(*(uint32_t *)tlv_data);
+   sc->sc_fw_phy_config |= le32toh(*(uint32_t *)tlv_data);
break;
 
case IWM_UCODE_TLV_API_CHANGES_SET: {
@@ -11177,6 +11177,7 @@ static const struct pci_matchid iwm_devices[] = {
{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_WL_9260_1 },
{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_WL_9560_1 },
{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_WL_9560_2 },
+   { PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_GLK_WL },
 };
 
 int
@@ -11405,6 +11406,7 @@ iwm_attach(struct device *parent, struct device *self,
break;
case PCI_PRODUCT_INTEL_WL_9560_1:
case PCI_PRODUCT_INTEL_WL_9560_2:
+   case PCI_PRODUCT_INTEL_GLK_WL:
sc->sc_fwname = "iwm-9000-46";
sc->host_interrupt_operation_mode = 0;
sc->sc_device_family = IWM_DEVICE_FAMILY_9000;
@@ -11412,7 +11414,11 @@ iwm_attach(struct device *parent, struct device *self,
sc->sc_nvm_max_section_size = 32768;
sc->sc_mqrx_supported = 1;
sc->sc_integrated = 1;
-   sc->sc_xtal_latency = 650;
+   if (PCI_PRODUCT(pa->pa_id) == PCI_PRODUCT_INTEL_GLK_WL) {
+   sc->sc_xtal_latency = 670;
+   sc->sc_fw_phy_config = IWM_FW_PHY_CFG_SHARED_CLK;
+   } else
+   sc->sc_xtal_latency = 650;
break;
default:
printf("%s: unknown adapter type\n", DEVNAME(sc));
blob - 59ae267a763600b0893079abfbf5d13d6c5678d0
file + sys/dev/pci/if_iwmreg.h
--- sys/dev/pci/if_iwmreg.h
+++ sys/dev/pci/if_iwmreg.h
@@ -982,6 +982,7 @@ struct iwm_tlv_calib_ctrl {
 #define IWM_FW_PHY_CFG_TX_CHAIN	(0xf << IWM_FW_PHY_CFG_TX_CHAIN_POS)
 #define IWM_FW_PHY_CFG_RX_CHAIN_POS	20
 #define IWM_FW_PHY_CFG_RX_CHAIN	(0xf << IWM_FW_PHY_CFG_RX_CHAIN_POS)
+#define IWM_FW_PHY_CFG_SHARED_CLK  (1U << 31)
 
 #define IWM_UCODE_MAX_CS   1
 
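Why the |= matters: attach now presets IWM_FW_PHY_CFG_SHARED_CLK, and a plain assignment while parsing the firmware file would clear that bit again. A sketch with a hypothetical one-field softc, assuming the firmware file is parsed after iwm_attach() has run and that the TLV value has already gone through le32toh():

```c
#include <assert.h>
#include <stdint.h>

#define IWM_FW_PHY_CFG_SHARED_CLK	(1U << 31)	/* as added by the patch */

/* Hypothetical minimal softc holding only the field of interest. */
struct softc {
	uint32_t sc_fw_phy_config;
};

/* Attach: preset the shared-clock bit for the Gemini Lake device. */
static void
attach_glk(struct softc *sc)
{
	sc->sc_fw_phy_config = IWM_FW_PHY_CFG_SHARED_CLK;
}

/* Firmware parsing: merge the TLV value instead of overwriting. */
static void
parse_phy_config_tlv(struct softc *sc, uint32_t tlv_value)
{
	sc->sc_fw_phy_config |= tlv_value;
}
```

For devices where attach leaves sc_fw_phy_config at zero, the |= behaves like the old assignment.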




Re: SunBlade 100: X is very yellow with XVR-100 (radeon r100)

2021-12-12 Thread Stefan Sperling
On Sat, Dec 11, 2021 at 06:28:53PM -0700, Ted Bullock wrote:
> On 2021-12-11 5:39 a.m., Mark Kettenis wrote:
> >> Date: Sat, 11 Dec 2021 05:10:41 -0700
> >> From: Ted Bullock 
> > 
> > There are several reasons why that test can fail though.  It can be an
> > endian-ness issue or on sparc64 it could also be an IOMMU issue where
> > the wrong address is programmed into the hardware because CPU
> > addresses aren't properly translated into device virtual addresses.
> > 
> 
> Trying to figure out what the hell is happening here is making my eyes
> bleed a little...  there are lots of preprocessor stuff in this code
> that looks fragile to me. I've not written much in the last few years
> but surely this isn't a normal way of programming or maybe the authors
> are smarter than me. :( Anyway I'm looking for where things could get
> broken.
> 
> >> sys/dev/pci/drm/radeon/r100.c:3651
> >> WREG32(scratch, 0xCAFEDEAD);
> 
> Starting here this is a macro that calls an inline function:
> #define WREG32(reg, v) r100_mm_wreg(rdev, (reg), (v), false)
> 
> fwiw r100_mm_wreg is called only by one other thing, the macro:
> #define WREG32_IDX(reg, v) r100_mm_wreg(rdev, (reg), (v), true)
> 
> I don't know why they wrapped an inline function that is called in only
> 2 different ways behind a macro but they did so ok, then looking at
> r100_mm_wreg:
> 
> static inline void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg,
> 				uint32_t v, bool always_indirect)
> {
> 	if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) &&
> 	    !always_indirect)
> 		writel(v, ((void __iomem *)rdev->rmmio) + reg);
> 	else
> 		r100_mm_wreg_slow(rdev, reg, v);
> }
> 
> This has some pointer math but this doesn't look like it has anything to
> cause endian issues, so I suppose it's fine.
> 
> >> r = radeon_ring_lock(rdev, ring, 2);
> >> if (r) {
> >>DRM_ERROR("radeon: cp failed to lock ring (%d).\n", r);
> >>radeon_scratch_free(rdev, scratch);
> >>return r;
> >> }
> 
> This locking stuff has nothing that looks problematic to me for endian
> issues.
> 
> >> radeon_ring_write(ring, PACKET0(scratch, 0));
> 
> Until we get to this ^, this is kind of a nightmare for me to grasp.
> 
> The driver uses ring, or circular queue data structures as I'm familiar
> with for interacting with the gpu, work items are written to the queue
> and read by the gpu. The queue code doesn't look like it should have
> endian issues and obviously it's working on other platforms so can
> probably ignore it. It's weird to me that linux, bsd et al have circular
> queue data structures pre-rolled so I don't know why they made their
> own, perhaps ignorance or hubris, or they are super smart?
> 
> PACKET0(scratch, 0) this is kind of a monster though.
> 
> #define PACKET0(reg, n)	(CP_PACKET0 |					\
> 			 REG_SET(PACKET0_BASE_INDEX, (reg) >> 2) |	\
> 			 REG_SET(PACKET0_COUNT, (n)))
> and also here:
> 
> #define REG_SET(FIELD, v) (((v) << FIELD##_SHIFT) & FIELD##_MASK)
> 
> Think that maybe this is a big candidate for some sort of endian bug? I
> do. hmmm. This looks insanely fragile but maybe I'm crazy and probably a
> moron or something.
> 
> >> radeon_ring_write(ring, 0xDEADBEEF);

I notice that radeon_ring_write() takes a uint32_t argument.
When writing to memory which is shared with the device, such values need
to be byte-swapped for the device to read them in the expected byte-order.
And swapped back again in case such memory is read by the host.

If no attention was given to this by the original developers then you
will have a lot of fun trying to track down the places where byte swaps
are missing. Any multi-byte read/write access to data structures in memory
shared with the device (i.e. mapped for DMA) needs to do this.
You can look at virtually all the drivers in our tree for examples, and
read the htole32(3) man page for details.
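The distinction between value arithmetic and memory layout can be shown in plain C. Shifts and masks such as those in REG_SET() operate on values and give the same result on any host; byte order only matters once a value is stored to memory the device reads. A portable sketch with hypothetical helper names (store_le32(), load_le32(), reg_set()); the shift and mask used with reg_set() are arbitrary, not real PACKET0 field values:

```c
#include <assert.h>
#include <stdint.h>

/* Store v as little-endian bytes, whatever the host byte order. */
static void
store_le32(uint8_t *p, uint32_t v)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
	p[3] = (v >> 24) & 0xff;
}

/* Read a little-endian value back, as the host would after DMA. */
static uint32_t
load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Value-level field packing, in the style of REG_SET(); host-order neutral. */
static uint32_t
reg_set(uint32_t v, unsigned int shift, uint32_t mask)
{
	return (v << shift) & mask;
}
```

In kernel code htole32() and letoh32() serve the same purpose as the two byte-level helpers here.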

I have not tested the following patch at all (not even compiled it).
And even if this patch is correct it will probably not suffice to make
everything work.
But fixes for missing byte-swaps should all be of this nature, assuming
the device expects little endian and your host is using big endian:

diff 1fa0b3b4477a96dd9841c14c78e338c6ab0abe1d /usr/src
blob - 4674299c6900dc3bfd32b29579f34986df2429b6
file + sys/dev/pci/drm/radeon/radeon.h
--- sys/dev/pci/drm/radeon/radeon.h
+++ sys/dev/pci/drm/radeon/radeon.h
@@ -2737,7 +2737,7 @@ static inline void radeon_ring_write(struct radeon_rin
if (ring->count_dw <= 0)
DRM_ERROR("radeon: writing more dwords to the ring than expected!\n");
 
-   ring->ring[ring->wptr++] = v;
+   ring->ring[ring->wptr++] = htole32(v);
ring->wptr &= ring->ptr_mask;
ring->count_dw--;
ring->ring_free_dw--;


> >> radeon_ring_unlock_commit(rdev, ring, false);
> >> for (i = 0; i < rdev->usec_timeout; i++) {
> >> 

Re: Fix for "panic: ieee80211_encrypt: key unset for sw crypto"

2021-12-07 Thread Stefan Sperling
On Tue, Dec 07, 2021 at 11:38:51AM +, Mikolaj Kucharski wrote:
> More than a week of uptime on both machines:
> 
> pce-0041# uptime
> 11:30AM  up 8 days, 18:48, 1 user, load averages: 0.02, 0.05, 0.01
> 
> pce-0035# uptime
> 11:31AM  up 8 days, 18:39, 1 user, load averages: 0.11, 0.14, 0.08
> 
> From dmesg logs on my debugging kernel I see that, pce-0041 had zero
> codepaths triggered so far of newly introduced code, but on pce-0035 I
> see that new code path was triggered about 4 times.
> 
> I'm planning to keep that kernel version running on those for a month or
> maybe even more, but so far result looks very good. I think at this
> stage pce-0035 would already panic(), based on my historical stats, how
> often machine panicked before. Machine pce-0041 panics once per a quarter,
> so I would need to wait a bit more to have good level of confidence,
> against my stats.

I am confident that we have found the root cause of this issue.
I have committed the fix.
Thank you for your patience and all the help with tracking this down!



Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-05 Thread Stefan Sperling
On Sun, Dec 05, 2021 at 11:05:32AM -0600, Scott Cheloha wrote:
> > diff 0b61c8235787960f0010ef627ea5b2c6309a81f0 
> > de98c050ea709bdb8e26be40ab0cc82ef9afed80
> > blob - 7bb68194dd78417b06c59f81d1ebbff4165203d8
> > blob + 5b9a969258074fde29e21a33ac035cf170ec3b03
> > --- sys/net80211/ieee80211.c
> > +++ sys/net80211/ieee80211.c
> > @@ -193,6 +193,7 @@ ieee80211_ifattach(struct ifnet *ifp)
> > if_addgroup(ifp, "wlan");
> > ifp->if_priority = IF_WIRELESS_DEFAULT_PRIORITY;
> >  
> > +   task_set(&ic->ic_rtm_80211info_task, ieee80211_rtm_80211info_task, ic);
> > ieee80211_set_link_state(ic, LINK_STATE_DOWN);
> >  
> > timeout_set(&ic->ic_bgscan_timeout, ieee80211_bgscan_timeout, ifp);
> > @@ -203,6 +204,7 @@ ieee80211_ifdetach(struct ifnet *ifp)
> >  {
> > struct ieee80211com *ic = (void *)ifp;
> >  
> > +   task_del(systq, &ic->ic_rtm_80211info_task);
> > timeout_del(&ic->ic_bgscan_timeout);
> 
> Suppose the ic_bgscan_timeout timeout is running at the moment we're
> running ieee80211_ifdetach().  Ignore the kernel lock for the moment,
> think about the future.

There are many more places in the wireless stack that do such things.
But I am not interested in doing MP work on wireless myself. That is
simply asking too much on top of everything else I do.

> If we delete the task before we delete the timeout and the timeout
> then adds the task back onto the task queue, what happens?
> 
> My guess is you need to ensure the timeout is no longer running
> *before* you delete the task.  Can you do timeout_del_barrier()
> here?  See the attached patch.

Yes, sure we can. There are dozens of other timeouts in the wireless
stack and drivers, so your patch is a small step on a very long road.
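The race can be illustrated with a deterministic model. This is a hypothetical sketch that only tracks whether the timeout and the task are pending; it does not call timeout(9) or task(9), and it assumes the timeout handler may end up scheduling the task, as in the scenario described above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: only "is it pending?" is tracked. */
struct detach_model {
	bool timeout_pending;
	bool task_pending;
};

/* Modeled timeout handler: when it fires, it schedules the task. */
static void
timeout_fires(struct detach_model *m)
{
	if (m->timeout_pending) {
		m->timeout_pending = false;
		m->task_pending = true;
	}
}

/* task_del() first; the timeout fires before timeout_del() is reached. */
static bool
task_left_pending_task_first(void)
{
	struct detach_model m = { .timeout_pending = true, .task_pending = false };

	m.task_pending = false;		/* task_del() */
	timeout_fires(&m);		/* handler runs in between */
	m.timeout_pending = false;	/* timeout_del() */
	return m.task_pending;
}

/* Timeout stopped first; in the worst case its handler still completes. */
static bool
task_left_pending_timeout_first(void)
{
	struct detach_model m = { .timeout_pending = true, .task_pending = false };

	timeout_fires(&m);		/* finishes before the barrier returns */
	m.timeout_pending = false;	/* timeout_del_barrier() */
	m.task_pending = false;		/* task_del() */
	return m.task_pending;
}
```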

> > /*
> > blob - 447a2676bfb250b7f917206549679d6ae68de1f6
> > blob + 7e10fc1336067542c13d5607602e658ce2b3926b
> > --- sys/net80211/ieee80211_proto.c
> > +++ sys/net80211/ieee80211_proto.c
> > @@ -1288,6 +1288,31 @@ justcleanup:
> >  }
> >  
> >  void
> > +ieee80211_rtm_80211info_task(void *arg)
> > +{
> > +   struct ieee80211com *ic = arg;
> > +   struct ifnet *ifp = &ic->ic_if;
> > +   struct if_ieee80211_data ifie;
> > +   int s = splnet();
> > +
> > +   if (LINK_STATE_IS_UP(ifp->if_link_state)) {
> 
> Does this mean userspace can "miss" state transitions if the task runs
> and the state has already changed back to not-up?
> 
> I don't know whether this would matter in practice, but it would be a
> behavior change.

If this task doesn't find link state up for some reason then it is
running in an unexpected situation. What else should it do?

I think it makes sense for a scheduled task to check that its precondition
still applies. We have had several bugs in iwm(4) where tasks did their
thing even though their reason for being scheduled no longer applied.
In that case we ended up sending firmware commands that appeared out
of order from the device's point of view. I don't know if this really
matters here, it will depend on what userland expects.
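Here is a sketch of that precondition check. It uses a hypothetical struct in place of ieee80211com and a counter in place of rtm_80211info(). It also shows the behaviour change raised above: if the link goes down again before the task runs, no message is generated for that transition.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the interface state the task looks at. */
struct link {
	bool up;		/* LINK_STATE_IS_UP(ifp->if_link_state) */
	bool task_pending;	/* task scheduled but not yet run */
	int msgs_sent;		/* routing messages generated so far */
};

/* Where ieee80211_set_link_state() would schedule the task. */
static void
schedule_info_task(struct link *l)
{
	l->task_pending = true;
}

/* Task body: re-check that the reason for scheduling still applies. */
static void
run_info_task(struct link *l)
{
	if (!l->task_pending)
		return;
	l->task_pending = false;
	if (l->up)		/* state may have changed since scheduling */
		l->msgs_sent++;	/* stands in for rtm_80211info() */
}
```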
 
> Unsure how `route monitor` exercises this path, but I've left it
> running, too.

That was just to see whether the routing message is still being generated.



Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-04 Thread Stefan Sperling
On Sat, Dec 04, 2021 at 09:32:40PM +0300, Vitaliy Makkoveev wrote:
> I think rtm_80211info() could follow the if_link_state_change()
> way and use task for that.

Indeed. I did not realize that if_link_state_change() schedules a task.

This means ieee80211_set_link_state() is already deferring some of its
work to a task. The patch below defers sending the 80211info message
to a task as well.

I am keeping things simple for now and using systq (kernel-locked) instead
of copying the argument-passing approach used by if_link_state_change().
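For readers unfamiliar with task(9): a task that is already pending is not queued a second time, and task_add() returns 0 in that case. Repeated scheduling before systq gets to run therefore results in a single run of the task. The sketch below models only that behaviour in userland. mini_task and its helpers are hypothetical names, not the kernel API.

```c
#include <stddef.h>

/* Hypothetical, single-threaded stand-in for a task queue like systq. */
struct mini_task {
	void (*fn)(void *);
	void *arg;
	int onqueue;		/* already queued? */
	struct mini_task *next;
};

struct mini_taskq {
	struct mini_task *head, **tail;
};

static void
mini_taskq_init(struct mini_taskq *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* Like task_add(9): returns 1 if queued, 0 if it was already pending. */
static int
mini_task_add(struct mini_taskq *q, struct mini_task *t)
{
	if (t->onqueue)
		return 0;
	t->onqueue = 1;
	t->next = NULL;
	*q->tail = t;
	q->tail = &t->next;
	return 1;
}

/* Run all queued tasks; in the kernel a taskq thread does this and may sleep. */
static int
mini_taskq_run(struct mini_taskq *q)
{
	struct mini_task *t;
	int n = 0;

	while ((t = q->head) != NULL) {
		q->head = t->next;
		if (q->head == NULL)
			q->tail = &q->head;
		t->onqueue = 0;
		t->fn(t->arg);
		n++;
	}
	return n;
}

/* Example task body: count how often it ran. */
static void
count_call(void *arg)
{
	(*(int *)arg)++;
}
```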

Tested on iwm(4) 8265 with 'route monitor'.

ok?

diff 0b61c8235787960f0010ef627ea5b2c6309a81f0 
de98c050ea709bdb8e26be40ab0cc82ef9afed80
blob - 7bb68194dd78417b06c59f81d1ebbff4165203d8
blob + 5b9a969258074fde29e21a33ac035cf170ec3b03
--- sys/net80211/ieee80211.c
+++ sys/net80211/ieee80211.c
@@ -193,6 +193,7 @@ ieee80211_ifattach(struct ifnet *ifp)
if_addgroup(ifp, "wlan");
ifp->if_priority = IF_WIRELESS_DEFAULT_PRIORITY;
 
+   task_set(&ic->ic_rtm_80211info_task, ieee80211_rtm_80211info_task, ic);
ieee80211_set_link_state(ic, LINK_STATE_DOWN);
 
timeout_set(&ic->ic_bgscan_timeout, ieee80211_bgscan_timeout, ifp);
@@ -203,6 +204,7 @@ ieee80211_ifdetach(struct ifnet *ifp)
 {
struct ieee80211com *ic = (void *)ifp;
 
+   task_del(systq, &ic->ic_rtm_80211info_task);
timeout_del(&ic->ic_bgscan_timeout);
 
/*
blob - 447a2676bfb250b7f917206549679d6ae68de1f6
blob + 7e10fc1336067542c13d5607602e658ce2b3926b
--- sys/net80211/ieee80211_proto.c
+++ sys/net80211/ieee80211_proto.c
@@ -1288,6 +1288,31 @@ justcleanup:
 }
 
 void
+ieee80211_rtm_80211info_task(void *arg)
+{
+   struct ieee80211com *ic = arg;
+   struct ifnet *ifp = &ic->ic_if;
+   struct if_ieee80211_data ifie;
+   int s = splnet();
+
+   if (LINK_STATE_IS_UP(ifp->if_link_state)) {
+   memset(&ifie, 0, sizeof(ifie));
+   ifie.ifie_nwid_len = ic->ic_bss->ni_esslen;
+   memcpy(ifie.ifie_nwid, ic->ic_bss->ni_essid,
+   sizeof(ifie.ifie_nwid));
+   memcpy(ifie.ifie_addr, ic->ic_bss->ni_bssid,
+   sizeof(ifie.ifie_addr));
+   ifie.ifie_channel = ieee80211_chan2ieee(ic,
+   ic->ic_bss->ni_chan);
+   ifie.ifie_flags = ic->ic_flags;
+   ifie.ifie_xflags = ic->ic_xflags;
+   rtm_80211info(&ic->ic_if, &ifie);
+   }
+
+   splx(s);
+}
+
+void
 ieee80211_set_link_state(struct ieee80211com *ic, int nstate)
 {
struct ifnet *ifp = &ic->ic_if;
@@ -1307,20 +1332,8 @@ ieee80211_set_link_state(struct ieee80211com *ic, int 
}
if (nstate != ifp->if_link_state) {
ifp->if_link_state = nstate;
-   if (LINK_STATE_IS_UP(nstate)) {
-   struct if_ieee80211_data ifie;
-   memset(&ifie, 0, sizeof(ifie));
-   ifie.ifie_nwid_len = ic->ic_bss->ni_esslen;
-   memcpy(ifie.ifie_nwid, ic->ic_bss->ni_essid,
-   sizeof(ifie.ifie_nwid));
-   memcpy(ifie.ifie_addr, ic->ic_bss->ni_bssid,
-   sizeof(ifie.ifie_addr));
-   ifie.ifie_channel = ieee80211_chan2ieee(ic,
-   ic->ic_bss->ni_chan);
-   ifie.ifie_flags = ic->ic_flags;
-   ifie.ifie_xflags = ic->ic_xflags;
-   rtm_80211info(&ic->ic_if, &ifie);
-   }
+   if (LINK_STATE_IS_UP(nstate))
+   task_add(systq, &ic->ic_rtm_80211info_task);
if_link_state_change(ifp);
}
 }
blob - 7208e5dc0be1983a31d7f74142e87581bea95d13
blob + ce6d1ad91f391cb16c7f0bbfa79210401d5dc7eb
--- sys/net80211/ieee80211_proto.h
+++ sys/net80211/ieee80211_proto.h
@@ -63,6 +63,7 @@ extern void ieee80211_proto_detach(struct ifnet *);
 struct ieee80211_node;
 struct ieee80211_rxinfo;
 struct ieee80211_rsnparams;
+extern void ieee80211_rtm_80211info_task(void *);
 extern void ieee80211_set_link_state(struct ieee80211com *, int);
 extern u_int ieee80211_get_hdrlen(const struct ieee80211_frame *);
 extern int ieee80211_classify(struct ieee80211com *, struct mbuf *);
blob - dd17ed76031db17bd86cd75a5c1eec659dbd3f30
blob + 641419331ed3f65fbc9272c055948438b28f1025
--- sys/net80211/ieee80211_var.h
+++ sys/net80211/ieee80211_var.h
@@ -306,6 +306,7 @@ struct ieee80211com {
struct timeout  ic_inact_timeout; /* node inactivity timeout */
struct timeout  ic_node_cache_timeout;
 #endif
+   struct task ic_rtm_80211info_task;
int ic_des_esslen;
u_int8_t ic_des_essid[IEEE80211_NWID_LEN];
struct ieee80211_channel *ic_des_chan;  /* desired channel */




Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-04 Thread Stefan Sperling
On Sat, Dec 04, 2021 at 07:19:23PM +0100, Stefan Sperling wrote:
> For this particular case, yes.
> But that won't solve ieee80211_set_link_state() being called from
> interrupt context, would it?

Below is a breakdown of all callers by context.
I grepped for direct callers. My search did not find indirect callers
such as the one from timeout context which we already know about.

I am worried about callers in interrupt context.
It will be hard to "fix" some of these. I would prefer if the routing
message layer could be made to work from interrupt context if possible.

Is there a function we could call to figure out which context we are
running in? If so, ieee80211_set_link_state() could skip sending the
routing message as a workaround. This means some drivers would lose
link state messages unless the relevant driver code is migrated to a task,
which could be done over time, on demand, on a per-driver basis.
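Here is a sketch of that workaround. It assumes some way to classify the current context exists. The enum and both helpers below are hypothetical and do not correspond to an existing kernel interface. Callers in interrupt context, or in a timeout that does not run in process context, would simply skip the routing message.

```c
#include <stdbool.h>

/* Hypothetical execution contexts; no real kernel interface is assumed. */
enum exec_ctx { CTX_INTERRUPT, CTX_TIMEOUT, CTX_TASK };

/* Only task context may take sleeping locks such as the socket lock. */
static bool
ctx_can_sleep(enum exec_ctx c)
{
	return c == CTX_TASK;
}

/* Returns true if the 80211info message was sent, false if skipped. */
static bool
maybe_send_80211info(enum exec_ctx c, int *sent)
{
	if (!ctx_can_sleep(c))
		return false;	/* workaround: this message is lost */
	(*sent)++;		/* stands in for rtm_80211info() */
	return true;
}
```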

autoconf context (no userland yet so doesn't matter):
ieee80211_ifattach()

interrupt context (bad):
bwfm_newstate()
pgt_newstate()
ipw_newstate()
iwi_newstate()
iwn_newstate()
wpi_newstate()
atu_newstate()
ieee80211_recv_4way_msg3()
ieee80211_recv_rsn_group_msg1()
ieee80211_recv_wpa_group_msg1()
ieee80211_newstate() <<-- ALL DRIVERS go through here

task context (ok):
iwm_scan()
iwx_scan()
iwx_add_sta_key()
athn_usb_set_key_cb()
otus_set_key_cb()
rsu_set_key_cb()
run_set_key_cb()
urtwn_set_key_cb()



Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-04 Thread Stefan Sperling
On Sat, Dec 04, 2021 at 07:06:35PM +0100, Tobias Heider wrote:
> On Sat, Dec 04, 2021 at 06:50:54PM +0100, Stefan Sperling wrote:
> > On Sat, Dec 04, 2021 at 10:37:53AM -0600, Scott Cheloha wrote:
> > > Hit a witness panic during boot yesterday.  Can't repro, have never
> > > seen it before.  The photo is a mess (ask if you want it) but the
> > > backtrace is:
> > > 
> > > panic
> > > witness_checkorder
> > > rw_enter_write
> > > solock
> > > route_input
> > > ieee80211_set_link_state
> > > ieee80211_recv_4way_msg3
> > > ieee80211_eapol_key_input
> > > ieee80211_decap
> > > ieee80211_input_ba_flush
> > > ieee80211_input_ba_gap_timeout
> > > timeout_run
> > > softclock_process_tick_timeout
> > > softclock
> > > 
> > > We're not allowed to take sleeping locks during the execution of a
> > > timeout, hence the witness panic.
> > 
> > I was under the impression that timeouts and tasks are equivalent in this
> > respect. Do timeouts not use a process context which can use rw locks?
> > Was this never the case? Or did this change recently?
> > 
> > We could schedule a task from the BA gap timeout handler, there is
> > no problem with doing that.
> > 
> > However, there are many callers of ieee80211_set_link_state(), including
> > drivers. And in particular the WPA handshake will usually be triggered
> > from interrupt context as frames are received, and call into this function.
> > 
> > If ieee80211_set_link_state() now requires a context which can sleep
> > we should really be checking all its callers for similar issues...
> > 
> > Or we stop using a routing message and invent another mechanism that
> > will work within the current requirements of the wireless stack?
> > 
> 
> I think timeout_set_proc() might be what you are looking for.

For this particular case, yes.
But that won't solve ieee80211_set_link_state() being called from
interrupt context, would it?



Re: panic: ieee80211_set_link_state() calls rtm_80211info() from timeout context

2021-12-04 Thread Stefan Sperling
On Sat, Dec 04, 2021 at 10:37:53AM -0600, Scott Cheloha wrote:
> Hit a witness panic during boot yesterday.  Can't repro, have never
> seen it before.  The photo is a mess (ask if you want it) but the
> backtrace is:
> 
> panic
> witness_checkorder
> rw_enter_write
> solock
> route_input
> ieee80211_set_link_state
> ieee80211_recv_4way_msg3
> ieee80211_eapol_key_input
> ieee80211_decap
> ieee80211_input_ba_flush
> ieee80211_input_ba_gap_timeout
> timeout_run
> softclock_process_tick_timeout
> softclock
> 
> We're not allowed to take sleeping locks during the execution of a
> timeout, hence the witness panic.

I was under the impression that timeouts and tasks are equivalent in this
respect. Do timeouts not use a process context which can use rw locks?
Was this never the case? Or did this change recently?

We could schedule a task from the BA gap timeout handler, there is
no problem with doing that.

However, there are many callers of ieee80211_set_link_state(), including
drivers. And in particular the WPA handshake will usually be triggered
from interrupt context as frames are received, and call into this function.

If ieee80211_set_link_state() now requires a context which can sleep
we should really be checking all its callers for similar issues...

Or we stop using a routing message and invent another mechanism that
will work within the current requirements of the wireless stack?



Fix for "panic: ieee80211_encrypt: key unset for sw crypto"

2021-12-03 Thread Stefan Sperling
On Wed, Nov 17, 2021 at 07:53:44PM +0100, Stefan Sperling wrote:
> On Wed, Nov 17, 2021 at 03:25:36PM +, Mikolaj Kucharski wrote:
> > On Mon, Nov 15, 2021 at 12:33:09PM +, Mikolaj Kucharski wrote:
> > > I have more panics, with couple of minutes gap between
> > > ieee80211_setkeysdone() from ieee80211_proto.c and panic
> > > in ieee80211_encrypt() from ieee80211_crypto.c.
> > > 
> > > Sorry for wrapped lines, but I added Time::HiRes in a rush,
> > > the console grabbing script.
> > > 
> > 
> > Improved console grabbing script, and also added address where the key
> > is stored via printf()'s %p. This is full boot up with more debugging.
> > I've added printf()'s all but one for memset()'s in net80211/. Not sure
> > is full diff interesting here.
> > 
> > What is strange, I would imagine, that address of they key is one
> > between those two:
> 
> I still cannot make conclusive sense of this.

More off-list investigation was successful.
We can explain the problem as follows:

The frame which triggers this panic seems to be a unicast frame
which was moved from ni_savedq to ic_pwrsaveq during DTIM (see
function ieee80211_notify_dtim() in ieee80211_node.c).

The if_start() routine of athn(4) pulls frames off ic_pwrsaveq and sends
them out. When a node disappears we clear its ni_savedq, but any frames
for this node queued on ic_pwrsaveq are not removed! The next time
athn's if_start runs a frame buffered on ic_pwrsaveq will be sent,
with the node already in COLLECT state and its WPA key having been
cleared. If the "swcrypto key check" doesn't trigger we may end up
with a use-after-free crash instead (the node pointer is stored in
the mbuf's packet header and becomes invalid once the node is freed).

To fix this we can teach the net80211 stack to remove frames from
ic_pwrsaveq when a client node has decided to leave the AP.
Patch below, which also fixes missing node refcount adjustments when
frames get purged.  ok?

Mikolaj has been running this for a while, and has confirmed that the
DTIM queue removal case in ieee80211_node_leave_pwrsave() does trigger.
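The purge loop in ieee80211_node_leave_pwrsave() can be illustrated in userland. In this sketch, struct node, struct frame and the fq_* helpers are hypothetical stand-ins for ieee80211_node, mbufs carrying the node in ph_cookie, and mbuf_queue. Each queued frame is assumed to hold one node reference, which is dropped when the frame is purged. Frames for other nodes stay on the queue in their original order.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for ieee80211_node, mbuf and mbuf_queue. */
struct node { int refcnt; };
struct frame { struct node *cookie; struct frame *next; };
struct fqueue { struct frame *head, **tail; };

static void
fq_init(struct fqueue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

static void
fq_enqueue(struct fqueue *q, struct frame *f)
{
	f->next = NULL;
	*q->tail = f;
	q->tail = &f->next;
}

static struct frame *
fq_dequeue(struct fqueue *q)
{
	struct frame *f = q->head;

	if (f != NULL) {
		q->head = f->next;
		if (q->head == NULL)
			q->tail = &q->head;
	}
	return f;
}

/* Allocate a frame for 'ni'; the queued frame holds a node reference. */
static struct frame *
frame_new(struct node *ni)
{
	struct frame *f = calloc(1, sizeof(*f));

	if (f == NULL)
		abort();
	f->cookie = ni;
	ni->refcnt++;
	return f;
}

/* Drop frames for 'ni' from the shared queue, keep all others in order. */
static int
purge_node_frames(struct fqueue *pwrsaveq, struct node *ni)
{
	struct fqueue keep;
	struct frame *f;
	int purged = 0;

	fq_init(&keep);
	while ((f = fq_dequeue(pwrsaveq)) != NULL) {
		if (f->cookie == ni) {
			if (ni->refcnt > 0)
				ni->refcnt--;	/* release the frame's reference */
			free(f);
			purged++;
		} else
			fq_enqueue(&keep, f);
	}
	while ((f = fq_dequeue(&keep)) != NULL)
		fq_enqueue(pwrsaveq, f);
	return purged;
}
```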

diff 114b285775411c80e68d9031cf2b80c83c75d07f /usr/src
blob - 116d71f94d678686994b0de328dcf0735bac024d
file + sys/net80211/ieee80211_node.c
--- sys/net80211/ieee80211_node.c
+++ sys/net80211/ieee80211_node.c
@@ -87,6 +87,8 @@ void ieee80211_node_join_11g(struct ieee80211com *, st
 void ieee80211_node_leave_ht(struct ieee80211com *, struct ieee80211_node *);
 void ieee80211_node_leave_rsn(struct ieee80211com *, struct ieee80211_node *);
 void ieee80211_node_leave_11g(struct ieee80211com *, struct ieee80211_node *);
+void ieee80211_node_leave_pwrsave(struct ieee80211com *,
+struct ieee80211_node *);
 void ieee80211_inact_timeout(void *);
 void ieee80211_node_cache_timeout(void *);
 #endif
@@ -2790,6 +2792,39 @@ ieee80211_node_leave_11g(struct ieee80211com *ic, stru
}
 }
 
+void
+ieee80211_node_leave_pwrsave(struct ieee80211com *ic,
+struct ieee80211_node *ni)
+{
+   struct mbuf_queue keep = MBUF_QUEUE_INITIALIZER(IFQ_MAXLEN, IPL_NET);
+   struct mbuf *m;
+
+   if (ni->ni_pwrsave == IEEE80211_PS_DOZE)
+   ni->ni_pwrsave = IEEE80211_PS_AWAKE;
+
+   if (mq_len(&ni->ni_savedq) > 0) {
+   if (ic->ic_set_tim != NULL)
+   (*ic->ic_set_tim)(ic, ni->ni_associd, 0);
+   }
+   while ((m = mq_dequeue(&ni->ni_savedq)) != NULL) {
+   if (ni->ni_refcnt > 0)
+   ieee80211_node_decref(ni);
+   m_freem(m);
+   }
+
+   /* Purge frames queued for transmission during DTIM. */
+   while ((m = mq_dequeue(&ic->ic_pwrsaveq)) != NULL) {
+   if (m->m_pkthdr.ph_cookie == ni) {
+   if (ni->ni_refcnt > 0)
+   ieee80211_node_decref(ni);
+   m_freem(m);
+   } else
+   mq_enqueue(&keep, m);
+   }
+   while ((m = mq_dequeue(&keep)) != NULL)
+   mq_enqueue(&ic->ic_pwrsaveq, m);
+}
+
 /*
  * Handle bookkeeping for station deauthentication/disassociation
  * when operating as an ap.
@@ -2811,14 +2846,8 @@ ieee80211_node_leave(struct ieee80211com *ic, struct i
return;
}
 
-   if (ni->ni_pwrsave == IEEE80211_PS_DOZE)
-   ni->ni_pwrsave = IEEE80211_PS_AWAKE;
+   ieee80211_node_leave_pwrsave(ic, ni);
 
-   if (mq_purge(&ni->ni_savedq) > 0) {
-   if (ic->ic_set_tim != NULL)
-   (*ic->ic_set_tim)(ic, ni->ni_associd, 0);
-   }
-
if (ic->ic_flags & IEEE80211_F_RSNON)
ieee80211_node_leave_rsn(ic, ni);
 



Re: ThinkPad E14 Wi-Fi Intel 6 AX201 not working.

2021-11-27 Thread Stefan Sperling
On Fri, Nov 26, 2021 at 07:14:36PM -0300, Salvador Sabaini wrote:
> Tried booting a newer Linux ISO, its loading QuZ-a0-jf-b0-63.ucode
> version 63.c04f3485.0
> 
> It still detects it as being an Intel(R) Wireless-AC 9560

I have added QuZ-a0-jf-b0 iwx-firmware to fw_update in -current.
But I don't know when I will find time to add driver support (and the
lack of such a device on my side makes this harder).

The QuZ-*-jf devices contain iwx-style MAC chips and iwm-style RF chips.
The linux driver calls them 9560 because of the RF chip.
We will probably just keep referring to them as AX201 to keep things less
confusing. Devices with 9560 MAC+RF are already supported by iwm(4).



Re: ThinkPad E14 Wi-Fi Intel 6 AX201 not working.

2021-11-23 Thread Stefan Sperling
On Tue, Nov 23, 2021 at 04:05:20PM -0300, s...@rtfm.com.ar wrote:
> >Synopsis:Intel Wi-Fi 6 AX201 driver not getting attached by the kernel.
> >Category:kernel
> >Environment:
>   System  : OpenBSD 7.0
>   Details : OpenBSD 7.0-current (GENERIC.MP) #111: Sun Nov 21 
> 17:35:37 MST 2021
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   The iwx driver is not getting attached to the kernel. Tried with 7.0 
> release and with
>   7.0-current as of 2021-11-21; tried downloading the iwx firmware with 
> 'fw_update iwx' 
>   (iwx-firmware-20210512 was downloaded) to no avail. It might be missing 
> the device ID.
>   

Please boot Linux and check its logs to see which firmware file is
being loaded for this device. Once we know this, perhaps it can be
made to work on OpenBSD.



Re: iwx now stops more often?

2021-11-22 Thread Stefan Sperling
On Mon, Nov 22, 2021 at 10:58:08AM +, Laurence Tratt wrote:
> On Sun, Nov 21, 2021 at 06:06:33PM +0100, Stefan Sperling wrote:
> 
> Hello Stefan,
> 
> >> As of a kernel from a couple of days ago, iwx semi-regularly stops
> >> associating with my wireless AP. An easy way to trigger this is "pkg_add
> >> -u": at some point, downloading stops mid-package, and I need to "sh
> >> /etc/netstart" to bring the interface back up.
> > Please try this patch. I cannot promise that it will help, but it might.
> 
> I don't understand why, but I have since been unable to replicate the
> problem, which I feel I need to do in order to meaningfully test your patch.
> 
> I'm clutching at straws to think of possible reasons. The best I can think of
> is that `pkg_add -u`'s tendency to cut downloads off mid-stream might have
> caused the odd behaviour. That could be a very long way off the mark.
> 
> If/when I can work out how to more reliably trigger the problem, I will post
> back, but until then I can only apologise for the noise.

This has nothing to do with pkg_add in particular.

My recommendation would be to keep running with 'ifconfig iwx0 debug' enabled.
The logs generated by debug mode might shed some light on the context of this
problem when it occurs again.

The IWX_DEBUG driver-internal flag does not really help here. It is more useful
when zooming in on a driver bug, once the circumstances of the problem have
been understood.



Re: iwx now stops more often?

2021-11-21 Thread Stefan Sperling
On Sun, Nov 21, 2021 at 09:05:29AM +, Laurence Tratt wrote:
> As of a kernel from a couple of days ago, iwx semi-regularly stops
> associating with my wireless AP. An easy way to trigger this is "pkg_add
> -u": at some point, downloading stops mid-package, and I need to "sh
> /etc/netstart" to bring the interface back up.
> 
> My previous kernel was about a week old. I had noticed with that kernel that
> sometimes iwx stopped soon after boot, but one kick of /etc/netstart seemed
> to make it good for the whole day, whereas now it seems to stop multiple
> times (but I hadn't done pkg_add very often in that week!). The AP is a
> Ruckus R510 and none of the other clients connected to it seems to have this
> issue.
> 
> I'm attaching my dmesg + IWX_DEBUG set to 1 in case it helps anyone.

Please try this patch. I cannot promise that it will help, but it might.

diff f6006ae72dd91e94a3c4244318ea54107ae8eedc /usr/src
blob - 38768d23f5005d3cc3d2fc6295ef3a3085484a7e
file + sys/dev/pci/if_iwx.c
--- sys/dev/pci/if_iwx.c
+++ sys/dev/pci/if_iwx.c
@@ -5911,9 +5911,11 @@ iwx_umac_scan_fill_channels(struct iwx_softc *sc,
 * Firmware may become unresponsive when asked to send
 * a directed probe request on a passive channel.
 */
+#if 0
if (n_ssids != 0 && !bgscan &&
(c->ic_flags & IEEE80211_CHAN_PASSIVE) == 0)
chan->flags = htole32(1 << 0); /* select SSID 0 */
+#endif
chan++;
nchan++;
}
@@ -6160,7 +6162,9 @@ iwx_scan_umac_fill_ch_p_v6(struct iwx_softc *sc,
 int
 iwx_umac_scan_v14(struct iwx_softc *sc, int bgscan)
 {
+#if 0
struct ieee80211com *ic = &sc->sc_ic;
+#endif
struct iwx_host_cmd hcmd = {
.id = iwx_cmd_id(IWX_SCAN_REQ_UMAC, IWX_LONG_GROUP, 0),
.len = { 0, },
@@ -6196,6 +6200,7 @@ iwx_umac_scan_v14(struct iwx_softc *sc, int bgscan)
return err;
}
 
+#if 0
if (ic->ic_des_esslen != 0) {
scan_p->probe_params.direct_scan[0].id = IEEE80211_ELEMID_SSID;
scan_p->probe_params.direct_scan[0].len = ic->ic_des_esslen;
@@ -6204,6 +6209,7 @@ iwx_umac_scan_v14(struct iwx_softc *sc, int bgscan)
bitmap_ssid |= (1 << 0);
n_ssid = 1;
}
+#endif
 
iwx_scan_umac_fill_ch_p_v6(sc, &scan_p->channel_params, bitmap_ssid,
n_ssid, bgscan);



Re: panic: ieee80211_encrypt: key unset for sw crypto: 0 on 6.8-beta

2021-11-17 Thread Stefan Sperling
On Wed, Nov 17, 2021 at 07:53:44PM +0100, Stefan Sperling wrote:
> I don't see where and how this could happen, but this seems to be where
> this bug is hiding. Multicast frames are also never encrypted, so they
> would never even trigger any attempt to use a key.

Sorry, I was not making much sense here because I confused
management with broadcast/multicast frames in my mind.

We do not encrypt management frames, but multicast frames will be
encrypted with a group key. So the use of encryption in this interrupt
handler is legit. The group key should be from one of the two addresses
you've identified, and the bogus key address you've seen is something else. 

I wonder if this bogus address corresponds to &ic->ic_bss->ni_pairwise_key
or &ni->ni_pairwise_key of some associated client?
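To illustrate which key a frame should end up with, here is a hypothetical, simplified key selection. The types are made up for this example. Only the group-bit test matches what IEEE80211_IS_MULTICAST() checks. A unicast frame picks up the receiving node's pairwise key. If that node has left and its key was cleared, the selected key would look unset (len 0, cipher 0), which matches the panic message.

```c
#include <stdint.h>

/* Hypothetical, simplified types; not the net80211 structures. */
struct key { int k_len; int k_cipher; };
struct sta { struct key pairwise_key; };
struct bss { struct key group_key; };

/* Group bit of the receiver address, as tested by IEEE80211_IS_MULTICAST(). */
static int
addr_is_multicast(const uint8_t addr1[6])
{
	return addr1[0] & 0x01;
}

/* Multicast frames use the group key; unicast frames use the pairwise
 * key of the node the frame is addressed to. */
static const struct key *
select_tx_key(const struct bss *bss, const struct sta *ni,
    const uint8_t addr1[6])
{
	if (addr_is_multicast(addr1))
		return &bss->group_key;
	return &ni->pairwise_key;
}
```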



Re: panic: ieee80211_encrypt: key unset for sw crypto: 0 on 6.8-beta

2021-11-17 Thread Stefan Sperling
On Wed, Nov 17, 2021 at 03:25:36PM +, Mikolaj Kucharski wrote:
> On Mon, Nov 15, 2021 at 12:33:09PM +, Mikolaj Kucharski wrote:
> > I have more panics, with couple of minutes gap between
> > ieee80211_setkeysdone() from ieee80211_proto.c and panic
> > in ieee80211_encrypt() from ieee80211_crypto.c.
> > 
> > Sorry for wrapped lines, but I added Time::HiRes in a rush,
> > the console grabbing script.
> > 
> 
> Improved console grabbing script, and also added address where the key
> is stored via printf()'s %p. This is full boot up with more debugging.
> I've added printf()'s all but one for memset()'s in net80211/. Not sure
> is full diff interesting here.
> 
> What is strange, I would imagine, that address of they key is one
> between those two:

I still cannot make conclusive sense of this.

Your debug log lacks some information:

The node state everywhere an ieee80211_node *ni is involved.
Add this to printf calls to show this: "%d", ni->ni_state

The MAC address everywhere an ieee80211_node *ni is involved.
Add this to printf calls to show the MAC: "%s", ether_sprintf(ni->ni_macaddr)
(Note that you can only use one ether_sprintf() call per printf() call
because ether_sprintf() uses a static buffer.)
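The static buffer pitfall is easy to reproduce in userland. mac_sprintf() below is a hypothetical formatter that behaves like ether_sprintf() in this respect. With two calls in one snprintf(), both %s conversions print the same string, because both arguments point at the same buffer. Copying the first result out before the second call also works. In the kernel, though, the simplest fix is the one above: one ether_sprintf() per printf().

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter returning a pointer into one static buffer. */
static char *
mac_sprintf(const uint8_t *ap)
{
	static char buf[18];	/* "xx:xx:xx:xx:xx:xx" plus NUL */

	snprintf(buf, sizeof(buf), "%02x:%02x:%02x:%02x:%02x:%02x",
	    ap[0], ap[1], ap[2], ap[3], ap[4], ap[5]);
	return buf;
}

/* Broken: both arguments alias buf, so both halves print the same MAC
 * (whichever call the compiler happened to evaluate last). */
static void
broken_pair(char *out, size_t len, const uint8_t *a, const uint8_t *b)
{
	snprintf(out, len, "%s %s", mac_sprintf(a), mac_sprintf(b));
}

/* Works: copy the first result before formatting the second address. */
static void
fixed_pair(char *out, size_t len, const uint8_t *a, const uint8_t *b)
{
	char first[18];

	memcpy(first, mac_sprintf(a), sizeof(first));
	snprintf(out, len, "%s %s", first, mac_sprintf(b));
}
```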

Everywhere you see a 'wh', please print whether 
IEEE80211_IS_MULTICAST(wh->i_addr1) is true or false.
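For reference, this check only looks at the group bit (the lowest bit of the first octet) of addr1. In the header, addr1 follows the 2-byte frame control and 2-byte duration fields. The struct below is a trimmed, hypothetical copy of the start of struct ieee80211_frame, and IS_MULTICAST() mirrors IEEE80211_IS_MULTICAST().

```c
#include <stdint.h>
#include <stdio.h>

/* Trimmed, hypothetical copy of the start of an 802.11 MAC header. */
struct wifi_hdr_start {
	uint8_t i_fc[2];	/* frame control */
	uint8_t i_dur[2];	/* duration */
	uint8_t i_addr1[6];	/* receiver address */
};

/* Group bit set means multicast or broadcast. */
#define IS_MULTICAST(a)	((a)[0] & 0x01)

static int
addr1_is_multicast(const struct wifi_hdr_start *wh)
{
	return IS_MULTICAST(wh->i_addr1) != 0;
}

/* Debug helper: print the result next to the other per-frame output. */
static void
print_addr1_kind(const struct wifi_hdr_start *wh)
{
	printf("addr1 multicast: %s\n",
	    addr1_is_multicast(wh) ? "true" : "false");
}
```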

> 2027.123448.498681Z: MMM: ieee80211_clear_htcaps() 
> [ieee80211_node.c|2328] memset() v25

This probably implies a node has left the network.
But it is unclear which node, since we don't see its MAC address.
This might be related to the crash, or it might not.

If this is happening in the call path I believe it is, then we are clearing
ni_pairwise_key at this point, which could explain why you see a key with all
zeros in your debug log.

However, this does not really make sense for another reason, because the
crash happens in a code path which should only be processing multicast
frames, not unicast (more below).

> 2027.123448.503936Z: MMM: ar5008_tx() [ar5008.c|1527]
> 2027.123448.506663Z: MMM: ar5008_tx() [ar5008.c|1530]: key unset for sw 
> crypto: 0
> 2027.123448.513071Z: MMM: ieee80211_encrypt() [ieee80211_crypto.c|262]: 
> k: 0x8000225a6708
> 2027.123448.518301Z: MMM: ieee80211_encrypt() [ieee80211_crypto.c|263]: 
> k_id: 0x0
> 2027.123448.523580Z: MMM: ieee80211_encrypt() [ieee80211_crypto.c|264]: 
> k_flags: 0x0
> 2027.123448.530034Z: MMM: ieee80211_encrypt() [ieee80211_crypto.c|265]: 
> k_len: 0x0
> 2027.123448.535279Z: MMM: ieee80211_encrypt() [ieee80211_crypto.c|266]: 
> k_cipher: 0x0
> 2027.123448.540526Z: MMM: ieee80211_encrypt() [ieee80211_crypto.c|267]: 
> k_key: 0x
> 2027.123448.552314Z: panic: ieee80211_encrypt: key unset for sw crypto: 
> id=0 cipher=0 flags=0x0
> 2027.123448.561466Z: Stopped at  db_enter+0x10:  popq%rbp
> 
> What I did not expect is k=0x8009cfc8 and k=0x8009cf00
> between kid=2 and kid=1 and during panic k=0x8000225a6708. I don't
> know from where 0x8000225a6708 came from.

This probably means the key being used is not the group key. Does the
unexpected address correspond to the memory address of any ni->ni_pairwise_key?
And what is the MAC address of this node?
And what is ni->ni_state of this node?

Now taking look at the back trace:

> 2027.123448.728657Z: *cpu0: ieee80211_encrypt: key unset for sw crypto: 
> id=0 cipher=0 flags=0x0
> 2027.123448.735068Z: ddb{0}> 
> 2027.123448.735652Z: 
> 2027.123448.735864Z: trace
> 2027.123448.742778Z: db_enter() at db_enter+0x10
> 2027.123448.747944Z: panic(81e99dc5) at panic+0xbf
> 2027.123448.753069Z: 
> ieee80211_encrypt(8009c048,fd80ca4f4d00,80d2d1c0) at 
> ieee80211_encrypt+0x21d
> 2027.123448.763317Z: 
> ar5008_tx(8009c000,fd80ca4f4d00,80d2d000,0) at 
> ar5008_tx+0x1e7
> 2027.123448.769802Z: athn_start(8009c048) at athn_start+0x108
> 2027.123448.777576Z: ar5008_intr(8009c000) at ar5008_intr+0x216

Here we are in the interrupt handler of athn(4).

The queue of power-saved frames being processed here should only contain
multicast frames, not unicast.
If these are multicast frames then a group key should be used, and you
would see one of the two expected key addresses.

If that is not true then we have a bug where a unicast frame is enqueued on
ic->ic_bss->ni_savedq (the AP) instead of an ni->ni_savedq which represents
an associated station.
I don't see where and how this could happen, but this seems to be where
this bug is hiding. Multicast frames are also never encrypted, so they
would never even trigger any attempt to use a key. Only unicast frames
could trigger encryption, but they should not appear when the interrupt
handler sends frames buffered in ic->ic_bss->ni_savedq because this
queue is supposed to contain multicast frames only!!!

Another possibility is that 

Re: panic: ieee80211_has_seq(wh) assertion failed

2021-11-08 Thread Stefan Sperling
On Mon, Nov 08, 2021 at 01:57:53PM +0100, Paul de Weerd wrote:
> Hi all,
> 
> After upgrading my laptop to a newer snapshot this weekend, I started
> getting panics.   I was running OpenBSD 7.0-current (GENERIC.MP) #60:
> Sun Oct 31 13:27:05 MDT 2021 before the upgrade.  Hand-typed from a
> picture I took:
> 
> panic: kernel diagnostic assertion "ieee80211_has_seq(hw)" failed: file 
> "/usr/src/sys/net80211/ieee80211_input.c", line 145
> Stopped at  db_enter+0x10:  popq%rbp
> TIDPIDUID PRFLAGS PFLAGS   CPU   COMMAND
>  503471  517651070x100010  0x400 3   vmd
>  333711  517651070x100010  0x400 1   vmd
>  148566  72734  0 0x14000  0x200 2   drmwq
> db_enter() at db_enter+0x10
> panic(81e52e8a) at panic+0xbf
> __assert(81ec2db8,81ef7c3b,91,81ecda4c) at 
> __assert+0x25
> ieee80211_get_hdrlen(fd805ce2967a) at ieee80211_get_hdrlen+0x8f
> iwx_ccmp_decap(...) at iwx_ccmp_decap+0x40
> iwx_rx_frame(...) at iwx_rx_frame+0xe1
> iwx_release_frames(...) at iwx_release_frames+0x17f
> iwx_rx_reorder(...) at iwx_rx_reorder+0x595
> iwx_rx_mpdu_mq(...) at iwx_rx_mpdu_mq+0x2e0
> iwx_rx_pkt(...) at iwx_rx_pkt+0x650
> iwx_notif_intr(...) at iwx_notif_intr+0x9c
> iwx_intr_msix(...) at iwx_intr_msix+0xb4
> intr_handler(...) at intr_handler+0x6e
> Xintr_ioapic_edge31_untramp() at Xintr_ioapic_edge31_untramp+0x18f
> end trace frame: 0x80001d98f4e0, count: 0
> 

Unfortunately, I cannot make any sense of this.

There were no new changes in iwx(4) since Oct 15. If the Oct 31 snap
was working fine then your problem is not related to driver-side changes.

Also in the regular code flow there is no way that non-QoS data frames
end up on the list processed in iwx_release_frames(). And QoS data frames
always have a sequence number so this assertion simply should not trigger.

I guess there could be some kind of memory corruption, where either an
mbuf or the mbuf list maintained by the driver is garbage?
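To make the assertion concrete: among 802.11 frame types, only control frames lack a Sequence Control field. The sketch below uses the standard frame control encodings (type in bits 2-3, QoS data is data subtype 8). has_seq() is modeled on net80211's ieee80211_has_seq(), not copied from it. A QoS data frame (0x88) always passes. For the assertion to fire, the header byte must decode as a control frame, which would fit the garbage-mbuf theory.

```c
#include <stdint.h>

/* Frame control byte 0 encodings from the 802.11 standard. */
#define FC0_TYPE_MASK		0x0c
#define FC0_TYPE_MGT		0x00
#define FC0_TYPE_CTL		0x04
#define FC0_TYPE_DATA		0x08
#define FC0_SUBTYPE_QOS		0x80	/* QoS bit of data subtypes */

/* Every frame type except control frames carries a sequence number. */
static int
has_seq(uint8_t fc0)
{
	return (fc0 & FC0_TYPE_MASK) != FC0_TYPE_CTL;
}

/* QoS data: data type with the QoS subtype bit set. */
static int
is_qos_data(uint8_t fc0)
{
	return (fc0 & FC0_TYPE_MASK) == FC0_TYPE_DATA &&
	    (fc0 & FC0_SUBTYPE_QOS) != 0;
}
```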

Please try to bisect kernels from source and see whether you can find
a commit which started this. I don't have a better answer for now.



Re: Kernel panics in OpenBSD 7.0 with veb rule filters

2021-11-07 Thread Stefan Sperling
On Mon, Nov 08, 2021 at 06:37:17AM +, Renato Aguiar wrote:
> OpenBSD 7.0 kernel panics when veb is used with a rule filter to tag
> packets coming from a vlan interface, for example:
> 
>   ifconfig veb0 rule pass in on vlan10 tag FOO

> panic: kernel diagnostic assertion "curcpu()->ci_schedstate.spc_smrdepth > 0" 
> f

This has just been fixed in -current:
https://marc.info/?l=openbsd-cvs&m=163634496007053&w=2



Re: raspberry pi 4 model b: xhci0: host system error

2021-11-06 Thread Stefan Sperling
On Sat, Nov 06, 2021 at 07:53:57PM +0100, Paul de Weerd wrote:
> It's not so much throughput as it is latency.  From the rpi to its
> gateway I see solid RTTs (note that I'm SSH'd into the raspberry to
> run this command):
> 
> 64 bytes from 192.168.34.1: icmp_seq=0 ttl=255 time=1.870 ms
> 64 bytes from 192.168.34.1: icmp_seq=1 ttl=255 time=2.738 ms
> 64 bytes from 192.168.34.1: icmp_seq=2 ttl=255 time=2.995 ms
> 64 bytes from 192.168.34.1: icmp_seq=3 ttl=255 time=2.617 ms
> 64 bytes from 192.168.34.1: icmp_seq=4 ttl=255 time=2.735 ms
> 64 bytes from 192.168.34.1: icmp_seq=5 ttl=255 time=2.774 ms
> 64 bytes from 192.168.34.1: icmp_seq=6 ttl=255 time=2.717 ms
> 64 bytes from 192.168.34.1: icmp_seq=7 ttl=255 time=2.655 ms
> 64 bytes from 192.168.34.1: icmp_seq=8 ttl=255 time=2.726 ms
> 64 bytes from 192.168.34.1: icmp_seq=9 ttl=255 time=2.788 ms
> 
> --- 192.168.34.1 ping statistics ---
> 10 packets transmitted, 10 packets received, 0.0% packet loss
> round-trip min/avg/max/std-dev = 1.870/2.661/2.995/0.281 ms
> 
> The other direction is a lot more erratic:
> 
> 64 bytes from 192.168.34.129: icmp_seq=0 ttl=255 time=52.829 ms
> 64 bytes from 192.168.34.129: icmp_seq=1 ttl=255 time=67.423 ms
> 64 bytes from 192.168.34.129: icmp_seq=2 ttl=255 time=296.138 ms
> 64 bytes from 192.168.34.129: icmp_seq=3 ttl=255 time=115.754 ms
> 64 bytes from 192.168.34.129: icmp_seq=4 ttl=255 time=37.241 ms
> 64 bytes from 192.168.34.129: icmp_seq=5 ttl=255 time=61.002 ms
> 64 bytes from 192.168.34.129: icmp_seq=6 ttl=255 time=290.358 ms
> 64 bytes from 192.168.34.129: icmp_seq=7 ttl=255 time=108.851 ms
> 64 bytes from 192.168.34.129: icmp_seq=8 ttl=255 time=30.519 ms
> 64 bytes from 192.168.34.129: icmp_seq=9 ttl=255 time=55.587 ms
> 
> --- 192.168.34.129 ping statistics ---
> 10 packets transmitted, 10 packets received, 0.0% packet loss
> round-trip min/avg/max/std-dev = 30.519/111.570/296.138/94.467 ms
> 
> When the raspberry is doing traffic (i.e. pinging its gateway or
> downloading a file), replies come a lot faster (similar to RTT's when
> pinging from the raspberry)

This could be wifi-client power management, in which case there is
nothing to worry about. You are not losing any packets. I would assume
frames get buffered in the AP while the wifi chip on the rpi is taking
a short power-saving nap. The beacon period of an AP is usually about
100ms and it might take a beacon or two for the client to wake up and
see the message from the AP that tells it about frames which have been
buffered for it. These RTTs of <= 300 ms fit into that picture.
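Here is the rough arithmetic behind that estimate. It assumes a beacon interval of 100 TU, a common AP default; 1 TU is 1024 microseconds, so one interval is 102.4 ms. The worst RTT in the report, 296 ms, stays below three beacon intervals.

```c
/* Assumption: beacon interval of 100 TU; 1 TU = 1024 microseconds. */
#define TU_USEC			1024
#define BEACON_INTERVAL_TU	100

/* Time, in microseconds, spanned by n beacon intervals: roughly how long
 * a dozing client may take to learn about frames buffered for it. */
static long
ps_delay_usec(int n_beacons)
{
	return (long)n_beacons * BEACON_INTERVAL_TU * TU_USEC;
}
```

For example, ps_delay_usec(2) is 204800, about 205 ms.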



Re: iwm sporadically disconnects since OpenBSD 7.0

2021-10-18 Thread Stefan Sperling
On Mon, Oct 18, 2021 at 07:14:57PM +0200, Richard Ulmer wrote:
> > Something else you could try is downgrading the firmware image.
> > This can be done without patching the kernel by swapping out the
> > firmware file:
> > 
> >   mv /etc/firmware/iwm-7265D-29 /etc/firmware/iwm-7265D-29.orig
> >   cp /etc/firmware/iwm-7265-17 /etc/firmware/iwm-7265D-29
> > 
> > The 7265-17 image was used in OpenBSD 6.9.
> > The 7265D-29 image is used in OpenBSD 7.0.
> > 
> > -current now defaults to the 7265-17 again, because we found out (a bit
> > too late for the release) that the 7265D-29 firmware was causing issues.
> 
> I did this and my WI-FI seems to be stable again. Thanks a lot!

OK, good to know.

7265D works with the linux iwlwifi driver and it worked fine when I tested
it with 7265 iwm(4) in an x250 thinkpad with a pepwave access point. Later,
some people reported various failures that all looked slightly different.
It took a while until Landry bisected the problem down to the change in
firmware version. The original problem reports we received looked like
badly connected antennas, hardware issues, or 802.11 protocol interop
problems, and all this led us down several wrong paths in our investigation,
which is why it took some time until this workaround was found.

> > Please try a -current kernel. There have been several improvements since
> > the 7.0 release tag. Perhaps your problem has already been fixed?
> > 
> > If the problem still occurs, please enable 'ifconfig iwm0 debug' and
> > wait for it to trigger again. The driver should then print additional
> > messages to /var/log/messages providing more context about the error.
> 
> I have never set up a -current OpenBSD, so I'm not comfortable with
> this. Sorry. I could however try to catch the error with 'ifconfig iwm0
> debug' enabled. Would that help you, even if I'm not on -current?

Not really. There is nothing the driver does differently, so any failure
messages we can see in dmesg will just be symptoms of the underlying issue.
Somehow the 7265D firmware is unhappy with our driver in some cases.
It's not high on my list of things to fix, given that development time is
always limited, a workaround is known, and there are more interesting
problems to solve. But it would be good to fix this eventually, since the
7265D firmware looks better on paper: it has received more maintenance
from Intel than the older images, which might include security fixes for
all we can tell (Intel doesn't publish any wifi firmware changelogs).



Re: iwm sporadically disconnects since OpenBSD 7.0

2021-10-17 Thread Stefan Sperling
On Mon, Oct 18, 2021 at 12:07:43AM +0200, Stefan Sperling wrote:
> On Sun, Oct 17, 2021 at 08:46:47PM +0200, Richard Ulmer wrote:
> > >Synopsis:  iwm sporadically disconnects WI-FI connection
> > >Category:  system
> > >Environment:
> > System  : OpenBSD 7.0
> > Details : OpenBSD 7.0 (GENERIC.MP) #232: Thu Sep 30 14:25:29 MDT 2021
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > Since the upgrade from OpenBSD 6.9 to 7.0 a few hours ago, my WI-FI
> > connection dies every few minutes. It feels like it dies when there is a
> > lot of network traffic (`pkg_add -u` or when watching video streams).
> > The connection automatically resumes after ~1min. My dmesg shows this:
> > iwm0: fatal firmware error
> > iwm0: could not remove MAC context (error 35)
> > iwm0: device timeout
> > iwm0: device timeout
> > iwm0: device timeout
> > iwm0: device timeout
> 
> Please try a -current kernel. There have been several improvements since
> the 7.0 release tag. Perhaps your problem has already been fixed?
> 
> If the problem still occurs, please enable 'ifconfig iwm0 debug' and
> wait for it to trigger again. The driver should then print additional
> messages to /var/log/messages providing more context about the error.

Something else you could try is downgrading the firmware image.
This can be done without patching the kernel by swapping out the
firmware file:

  mv /etc/firmware/iwm-7265D-29 /etc/firmware/iwm-7265D-29.orig
  cp /etc/firmware/iwm-7265-17 /etc/firmware/iwm-7265D-29

The 7265-17 image was used in OpenBSD 6.9.
The 7265D-29 image is used in OpenBSD 7.0.

-current now defaults to the 7265-17 again, because we found out (a bit
too late for the release) that the 7265D-29 firmware was causing issues.



Re: iwm sporadically disconnects since OpenBSD 7.0

2021-10-17 Thread Stefan Sperling
On Sun, Oct 17, 2021 at 08:46:47PM +0200, Richard Ulmer wrote:
> >Synopsis:iwm sporadically disconnects WI-FI connection
> >Category:system
> >Environment:
>   System  : OpenBSD 7.0
>   Details : OpenBSD 7.0 (GENERIC.MP) #232: Thu Sep 30 14:25:29 MDT 2021
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   Since the upgrade from OpenBSD 6.9 to 7.0 a few hours ago, my WI-FI
>   connection dies every few minutes. It feels like it dies when there is a
>   lot of network traffic (`pkg_add -u` or when watching video streams).
>   The connection automatically resumes after ~1min. My dmesg shows this:
>   iwm0: fatal firmware error
>   iwm0: could not remove MAC context (error 35)
>   iwm0: device timeout
>   iwm0: device timeout
>   iwm0: device timeout
>   iwm0: device timeout

Please try a -current kernel. There have been several improvements since
the 7.0 release tag. Perhaps your problem has already been fixed?

If the problem still occurs, please enable 'ifconfig iwm0 debug' and
wait for it to trigger again. The driver should then print additional
messages to /var/log/messages providing more context about the error.

Thanks!



Re: revert to pre-fragattack iwm firmware to fix hw rev 0x210 / AC 7265 on X1 gen3

2021-10-03 Thread Stefan Sperling
On Sun, Oct 03, 2021 at 09:35:37AM +0200, Landry Breuil wrote:
> Hi,
> 
> i've been having iwm(4) instability on a X1 Gen3 issues since july
> (symptom/reproducer: unable to fetch base70.tgz, stalls after 100Mb and link
> goes down, ifconfig up/down/reassoc 'resolves' it - pkg_add -u also fails after
> a while, erratic ping, etc..) - tried forcing 2Ghz/5Ghz modes but that doesnt
> help on this laptop (forcing 5Ghz 'solves' a similar issue on a T470s but
> i dont remember the hw rev/model right now).
> 
> iwm0 at pci2 dev 0 function 0 "Intel AC 7265" rev 0x59, msi
> iwm0: hw rev 0x210, fw ver 17.3216344376.0

Reverting to old firmware for this device is fine with me.
I don't have spare time to try to figure out why this isn't working as it
should. Seeing that this firmware does work with the Linux driver, there
clearly is something wrong in our driver. But I can't fix everything.



Re: panic: ieee80211_encrypt: key unset for sw crypto: 0 on 6.8-beta

2021-09-25 Thread Stefan Sperling
On Sat, Sep 25, 2021 at 02:03:40AM +, Mikolaj Kucharski wrote:
> I've added more info, probably mainly for myself. I'm not sure where to
> go with this information yet.

We need to figure out what makes this code use a group key which
has been cleared.

Please add printfs for lines of code which modify ic->ic_def_txkey.
There's one in ieee80211_setkeysdone() which might be particularly
relevant. It is called from the group key renewal timeout which
triggers once per hour. Can you reproduce the issue more quickly if
you change the 3600s timeout in ieee80211_gtk_rekey_timeout() to
a smaller amount of time, say every 60s?



Re: loosing wifi connecton

2021-09-24 Thread Stefan Sperling
On Fri, Sep 24, 2021 at 06:34:42PM +0200, BESSOT Jean-Michel wrote:
> Hello
> 
> I didi test your patch. I run a bkg_add -ui without timeout error now. I
> continue to test to see if it wasn't luck but your patch probably fixed the
> bug.
> 
> Thank you
> 
> bye

Thanks, I have committed the patch. In any case, this patch cannot hurt.



Re: loosing wifi connecton

2021-09-24 Thread Stefan Sperling
On Sun, Sep 19, 2021 at 03:10:01PM +0200, BESSOT Jean-Michel wrote:
> Hello
> 
> Here is the problem, I loose the wifi connection after some time on my
> computer . Mostly when I use a cvs -q up -Pd or a  pkg_add -ui.. I get the
> error message:
> 
> iwm0: device timeout
> 
> It gets back after some time.
> 
> bye

Hi Jean-Michel,

Please try this patch:

diff bc5b770876412094a0961d7fcb9f635427388600 /usr/src
blob - 58d7c6dcc1ffaef2c89f818eb46d08eebfdcbc89
file + sys/dev/pci/if_iwm.c
--- sys/dev/pci/if_iwm.c
+++ sys/dev/pci/if_iwm.c
@@ -5720,6 +5720,8 @@ iwm_rx_compressed_ba(struct iwm_softc *sc, struct iwm_
if (qid != IWM_FIRST_AGG_TX_QUEUE + ban->tid)
return;
 
+   sc->sc_tx_timer = 0;
+
ba = &ni->ni_tx_ba[ban->tid];
if (ba->ba_state != IEEE80211_BA_AGREED)
return;



Re: Firefox 88.0.1 needs codecs, impossible to watch videos/movies.

2021-09-12 Thread Stefan Sperling
On Sun, Sep 12, 2021 at 03:03:30PM +, macondo wrote:
> http://www.fmovies.to error code 102630
> http://www.aljazeera.com error code: media_err_src_not_supported
> http://www.bbc.com same
> From Firefox: To play video you may need to install the required video codecs.

This should fix it: pkg_add ffmpeg



Re: panic: config_detach: forced detach of iwm0 failed (45)

2021-08-08 Thread Stefan Sperling
On Sat, Aug 07, 2021 at 08:59:33PM -0700, Bryan Linton wrote:
> On 2021-08-07 16:05:34, Bryan Linton  wrote:
> > >Synopsis:  The system randomly panics with the message: panic: config_detach: forced detach of iwm0 failed (45)
> > >Category:  system
> > >Environment:
> > System  : OpenBSD 6.9
> > Details : OpenBSD 6.9-current (GENERIC.MP) #69: Fri Jun 11 08:31:52 MDT 2021
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > I upgraded from a 2021-06-11 snapshot to a 2021-07-22
> > snapshot and occasionally had the following crash.  I
> > upgraded to several later snapshots hoping that a fix
> > might have already been committed, but still found the
> > same crashes up until an August 2nd (or so) snapshot.
> > 
> > panic: config_detach: forced detach of iwm0 failed (45)
> > 
> > [...]
> > 
> > >Fix:
> > Unknown.  Reverting to the June 11th 2021 snapshot fixed
> > the issue for me.
> > 
> 
> Apperently I have to retract this statement.  I had believed the
> problem to have started somewhere after a 2021-06-11 snapshot,
> since I had not experienced the crash on that until I upgraded to
> the 2021-07-22 snapshot,
> 
> However, a scant few hours after sending this bug report, the
> 2021-06-11 snapshot I reverted to crashed with the same error
> after a week of running with no problems.
> 
> Again, the laptop was incredibly warm when it happened, so my gut
> instinct is that it may be heat-related.  Unfortunately, the room
> I work in is without air conditioning, so ambient temperatures in
> the mid to high 30s degC are unavoidable at the moment.
> 
> I suppose it's good news in that the problem was likely not
> introduced in the recent iwm fixes.  Nevertheless, my willingness
> to test patches and/or provide more information still stands.

This could indeed be related to inadequate cooling. It looks like the
device is falling off the PCI bus suddenly, so the wifi card is probably
losing power. Unfortunately I have no idea how that could be prevented,
apart from bringing the temperature down.
Given that the wifi card seems to be losing power, I wouldn't rule out the
possibility that your hardware is starting to fail, with worse consequences
down the line.

The driver doesn't support detaching because these wifi cards are not
intended to be hot-plugged.

Firmware supports a "thermal threshold exceeded" notification, but it
sounds like it isn't triggering in your case. When this happens a message
such as this should appear in dmesg and the interface should go down:
  iwm0: device at critical temperature (X degC), stopping device
But I have never seen this trigger, nor has anyone else as far as I know.



Re:

2021-07-08 Thread Stefan Sperling
On Thu, Jul 08, 2021 at 04:00:15PM +0200, land...@asia.com wrote:
> >Synopsis:
> >Category:
> >Environment:
>   System  : OpenBSD 6.9
>   Details : OpenBSD 6.9 (GENERIC) #4: Mon Jun  7 08:20:14 MDT 2021
>
> r...@syspatch-69-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   <"Realtek 8192SE" doesn't work. the nice people in #openbsd said i 
> could use sendbug since it's probably just a small change.>
> >How-To-Repeat:
>kill me!>
> >Fix:
>   

Sorry, the *SE variants aren't yet supported.
Adding support for these would certainly not be a "small change".
It would involve figuring out what realtek's driver developers did
in Linux rtlwifi's subdirectory for this chip relative to the already
supported variants, and then writing corresponding code for OpenBSD's
rtwn driver from scratch (rtlwifi is GPL).



Re: Netgear AC1200 WiFi USB adapter comes up as ugen "Realtek A6150" rev 2.10/2.10 addr 9

2021-05-31 Thread Stefan Sperling
On Mon, May 31, 2021 at 07:04:57PM +0200, Peter N. M. Hansteen wrote:
> yes, the only reason I do not use that at the moment is that for some reason
> that interface is unable to load the firmware correctly (possibly needing a
> variant we have not encountered earlier). What I get is
> 
> wx0: firmware parse error 22, section type 0
> iwx0: failed to load init firmware

Type 0 is not a valid section type.
Are you sure you had the right firmware image installed when this happened?

SHA1 (/etc/firmware/iwx-QuZ-a0-hr-b0-48) = 3e1b284bcfc28e4040dd50a5dc776411cd5f9580

> iwx0: firmware parse error 22, section type 48
> iwx0: failed to load init firmware

This time around it breaks on section type 48 which is valid.
I guess the previous invalid firmware image was replaced?

This hits a bug which breaks re-loading firmware after a previous attempt
to load a firmware image failed. The patch below should fix this bug.
Note that rebooting will also fix this, so you'd have to trigger the
above 'section type 0' loading failure again to reproduce this bug.

diff 385a08f3e862586df8f1803dfa09fc765a5c3610 /usr/src
blob - a37ba57c9055896b168e694bbcf4408c2b6372a6
file + sys/dev/pci/if_iwx.c
--- sys/dev/pci/if_iwx.c
+++ sys/dev/pci/if_iwx.c
@@ -987,6 +987,8 @@ iwx_read_firmware(struct iwx_softc *sc)
sc->sc_capaflags = 0;
sc->sc_capa_n_scan_channels = IWX_DEFAULT_SCAN_CHANNELS;
memset(sc->sc_enabled_capa, 0, sizeof(sc->sc_enabled_capa));
+   memset(sc->sc_ucode_api, 0, sizeof(sc->sc_ucode_api));
+   sc->n_cmd_versions = 0;
 
uhdr = (void *)fw->fw_rawdata;
if (*(uint32_t *)fw->fw_rawdata != 0



Re: Netgear AC1200 WiFi USB adapter comes up as ugen "Realtek A6150" rev 2.10/2.10 addr 9

2021-05-31 Thread Stefan Sperling
On Mon, May 31, 2021 at 06:10:22PM +0200, Peter N. M. Hansteen wrote:
> I was hoping one of the existing realtek wifi drivers might do the trick
> but apparently not, as of right now.
> 
> Sendbug, pcidump and usbdevs output attached.

None of the 11ac realtek adapters are supported yet.
They need a new driver to be written.
Linux and FreeBSD may have working drivers for this chip.

On the particular machine which produced this dmesg you'd be better off
using the built-in iwx(4) interface anyway, but I guess you're aware of
that :)



Re: iwm: Fatal firmware error (could not add sta (error 35))

2021-05-14 Thread Stefan Sperling
On Fri, May 14, 2021 at 02:39:15PM +0200, Matthias Schmidt wrote:
> I am now running 
> 
> OpenBSD 6.9-current (GENERIC.MP) #17: Wed May 12 11:14:50 MDT 2021
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> which contains your described fix.  Since then I occasionally see a new
> firmare error I haven't seen before.  Most of the time the interface
> recovers but sometimes I have to bring it down and up again.

This looks like your AP is disappearing for some reason.
The AP may be switching channels or you may have moved out of range,
and the driver doesn't handle the resulting state transitions correctly.

Thanks for reporting this! I will look into it.

> May 14 13:59:59 sigma /bsd: iwm0: received msg 1/2 of the group key handshake from cc:ce:1e:8b:cf:d1
> May 14 13:59:59 sigma /bsd: iwm0: sending msg 2/2 of the group key handshake to cc:ce:1e:8b:cf:d1
> May 14 14:01:20 sigma /bsd: iwm0: RUN -> ASSOC
> May 14 14:01:20 sigma /bsd: iwm0: sending action to cc:ce:1e:8b:cf:d1 on channel 100 mode 11n
> May 14 14:01:20 sigma /bsd: iwm0: sending assoc_req to cc:ce:1e:8b:cf:d1 on channel 100 mode 11n
> May 14 14:01:24 sigma /bsd: iwm0: association timed out for cc:ce:1e:8b:cf:d1
> May 14 14:01:24 sigma /bsd: iwm0: dumping device error log
> May 14 14:01:24 sigma /bsd: iwm0: Start Error Log Dump:
> May 14 14:01:24 sigma /bsd: iwm0: Status: 0x9, count: 6
> May 14 14:01:24 sigma /bsd: iwm0: 0x3421 | ADVANCED_SYSASSERT  
> May 14 14:01:24 sigma /bsd: iwm0: 0220 | trm_hw_status0
> May 14 14:01:24 sigma /bsd: iwm0:  | trm_hw_status1
> May 14 14:01:24 sigma /bsd: iwm0: 00023FDC | branchlink2
> May 14 14:01:24 sigma /bsd: iwm0: 0003915A | interruptlink1
> May 14 14:01:24 sigma /bsd: iwm0:  | interruptlink2
> May 14 14:01:24 sigma /bsd: iwm0:  | data1
> May 14 14:01:24 sigma /bsd: iwm0: 0001 | data2
> May 14 14:01:24 sigma /bsd: iwm0: DEADBEEF | data3
> May 14 14:01:24 sigma /bsd: iwm0:  | beacon time
> May 14 14:01:24 sigma /bsd: iwm0: E8F0FA81 | tsf low
> May 14 14:01:24 sigma /bsd: iwm0: 0024 | tsf hi
> May 14 14:01:24 sigma /bsd: iwm0:  | time gp1
> May 14 14:01:24 sigma /bsd: iwm0: 20010BB2 | time gp2
> May 14 14:01:24 sigma /bsd: iwm0: 0001 | uCode revision type
> May 14 14:01:24 sigma /bsd: iwm0: 0022 | uCode version major
> May 14 14:01:24 sigma /bsd: iwm0:  | uCode version minor
> May 14 14:01:24 sigma /bsd: iwm0: 0230 | hw version
> May 14 14:01:24 sigma /bsd: iwm0: 18089000 | board version
> May 14 14:01:24 sigma /bsd: iwm0: 007C0028 | hcmd
> May 14 14:01:24 sigma /bsd: iwm0: 24022082 | isr0
> May 14 14:01:24 sigma /bsd: iwm0: 0100 | isr1
> May 14 14:01:24 sigma /bsd: iwm0: 08201802 | isr2
> May 14 14:01:24 sigma /bsd: iwm0: 004140C0 | isr3
> May 14 14:01:24 sigma /bsd: iwm0:  | isr4
> May 14 14:01:24 sigma /bsd: iwm0: 007B002B | last cmd Id
> May 14 14:01:24 sigma /bsd: iwm0:  | wait_event
> May 14 14:01:24 sigma /bsd: iwm0: 0080 | l2p_control
> May 14 14:01:24 sigma /bsd: iwm0: 00018010 | l2p_duration
> May 14 14:01:24 sigma /bsd: iwm0: 003F | l2p_mhvalid
> May 14 14:01:24 sigma /bsd: iwm0:  | l2p_addr_match
> May 14 14:01:24 sigma /bsd: iwm0: 000D | lmpm_pmg_sel
> May 14 14:01:24 sigma /bsd: iwm0: 30101345 | timestamp
> May 14 14:01:24 sigma /bsd: iwm0: A8B8 | flow_handler
> May 14 14:01:24 sigma /bsd: iwm0: Start UMAC Error Log Dump:
> May 14 14:01:24 sigma /bsd: iwm0: Status: 0x9, count: 7
> May 14 14:01:24 sigma /bsd: iwm0: 0x0070 | NMI_INTERRUPT_LMAC_FATAL
> May 14 14:01:24 sigma /bsd: iwm0: 0x | umac branchlink1
> May 14 14:01:24 sigma /bsd: iwm0: 0xC0086964 | umac branchlink2
> May 14 14:01:24 sigma /bsd: iwm0: 0xC0083A94 | umac interruptlink1
> May 14 14:01:24 sigma /bsd: iwm0: 0xC0083A94 | umac interruptlink2
> May 14 14:01:24 sigma /bsd: iwm0: 0x0800 | umac data1
> May 14 14:01:24 sigma /bsd: iwm0: 0xC0083A94 | umac data2
> May 14 14:01:24 sigma /bsd: iwm0: 0xDEADBEEF | umac data3
> May 14 14:01:24 sigma /bsd: iwm0: 0x0022 | umac major
> May 14 14:01:24 sigma /bsd: iwm0: 0x | umac minor
> May 14 14:01:24 sigma /bsd: iwm0: 0xC088628C | frame pointer
> May 14 14:01:24 sigma /bsd: iwm0: 0xC088628C | stack pointer
> May 14 14:01:24 sigma /bsd: iwm0: 0x007C0028 | last host cmd
> May 14 14:01:24 sigma /bsd: iwm0: 0x | isr status reg
> May 14 14:01:24 sigma /bsd: driver status:
> May 14 14:01:24 sigma /bsd:   tx ring  0: qid=0  cur=125 queued=1  
> May 14 14:01:24 sigma /bsd:   tx ring  1: qid=1  cur=0   queued=0  
> May 14 14:01:24 sigma /bsd:   tx ring  2: qid=2  cur=0   queued=0  
> May 14 14:01:24 sigma /bsd:   tx ring  3: qid=3  cur=0   queued=0  
> May 14 14:01:24 sigma /bsd:   tx ring  4: qid=4  cur=0   queued=0  
> May 14 14:01:24 sigma /bsd:   tx ring  5: qid=5  cur=119 queued=2  
> May 14 14:01:24 sigma /bsd:   tx ring  6: qid=6  cur=0   queued=0  
> May 14 14:01:24 sigma /bsd:   tx ring  7: 

Re: Strange behavior of 'sh /etc/netstart tun1' which assigns IPv4 to tun0 instead of tun1

2021-05-13 Thread Stefan Sperling
On Thu, May 13, 2021 at 10:43:59AM +, Martin wrote:
> Hi list,
> 
> 'sh /etc/netstart tun1' assign IPv4 address from '/etc/hostname.tun1' to 
> 'tun0' interface which must be IPv6 only instead of 'tun1' as argument of 
> netstart.
> 
> # ifconfig tun0
> tun0: flags=8051< mtu 1500
>  description: IPv6 interface
>  index priority 0 llprio 3
>  groups: tun
>  status: active
>  inet6: fe80::%tun0 --> prefixlen 64 scopeid 0x1f
>  inet6: fd90:... --> prefixlen 48
>  inet 192.168.55.1 --> 192.168.55.255 netmask 0xff00
> 
> I have two tun0 and tun1 interfaces defined by /etc/hostname.tun0 and 
> /etc/hostname.tun1
> 
> 1. tun0 contains
> up
> !/bin/command ...
> which run software and this software assign IPv6 address only to 'tun0' 
> interface once started. There is no IPv4 for 'tun0' just IPv6.
> 
> 2. $ cat /etc/hostname.tun1
> up
> inet 192.168.55.1 255.255.255.0 192.168.55.255
> !/usr/bin/env LD_LIBRARY_PATH=/usr/lib:/usr/local/lib /usr/local/sbin/openvpn 
> --config /etc/openvpn/server.conf
> 
> Why 'sh /etc/netstart tun1' assign 192.168.55.1 to 'tun0' instead of 'tun1' 
> as argument? Can it be a bug or misconfiguration?
> 
> Martin
> 
> 

The bug is probably in your secret /bin/command, for all I can tell.



Re: iwm: Fatal firmware error (could not add sta (error 35))

2021-05-11 Thread Stefan Sperling
On Tue, May 11, 2021 at 11:44:17AM +0200, Stefan Sperling wrote:
> Can you please run with this and let me know if it changes anything?

I have finally managed to reproduce the problem locally by playing around
with forced background scans and roaming. This patch is a superset of the
previous patch. It should fix the 'add sta' problem and also fixes a couple
of small bugs I found along the way.

diff refs/heads/master refs/heads/iwm-txaggfixes
blob - f3380617f2f5a80854e72b5a44cceff28d42aafc
blob + f430116aaabda132d2b9bf9acd7eca260b25373a
--- sys/dev/pci/if_iwm.c
+++ sys/dev/pci/if_iwm.c
@@ -3312,9 +3312,9 @@ iwm_sta_tx_agg(struct iwm_softc *sc, struct ieee80211_
if (start) {
/* Enable Tx aggregation for this queue. */
in->tid_disable_ampdu &= ~(1 << tid);
-   in->tfd_queue_msk |= htole32(1 << qid);
+   in->tfd_queue_msk |= (1 << qid);
} else {
-   in->tid_disable_ampdu |= ~(1 << tid);
+   in->tid_disable_ampdu |= (1 << tid);
/* Queue remains enabled in the TFD queue mask. */
err = iwm_flush_sta(sc, in);
if (err)
@@ -6677,7 +6677,7 @@ iwm_add_sta_cmd(struct iwm_softc *sc, struct iwm_node 
qid = IWM_DQA_INJECT_MONITOR_QUEUE;
else
qid = IWM_AUX_QUEUE;
-   in->tfd_queue_msk |= htole32(1 << qid);
+   in->tfd_queue_msk |= (1 << qid);
} else {
int ac;
for (ac = 0; ac < EDCA_NUM_AC; ac++) {
@@ -6685,7 +6685,7 @@ iwm_add_sta_cmd(struct iwm_softc *sc, struct iwm_node 
if (isset(sc->sc_enabled_capa,
IWM_UCODE_TLV_CAPA_DQA_SUPPORT))
qid += IWM_DQA_MIN_MGMT_QUEUE;
-   in->tfd_queue_msk |= htole32(1 << qid);
+   in->tfd_queue_msk |= (1 << qid);
}
}
if (!update) {
@@ -8015,7 +8015,6 @@ iwm_auth(struct iwm_softc *sc)
sc->sc_flags |= IWM_FLAG_BINDING_ACTIVE;
 
in->tid_disable_ampdu = 0x;
-   in->tfd_queue_msk = 0;
err = iwm_add_sta_cmd(sc, in, 0);
if (err) {
printf("%s: could not add sta (error %d)\n",
@@ -8074,11 +8073,11 @@ iwm_deauth(struct iwm_softc *sc)
return err;
}
in->tid_disable_ampdu = 0x;
-   in->tfd_queue_msk = 0;
sc->sc_flags &= ~IWM_FLAG_STA_ACTIVE;
sc->sc_rx_ba_sessions = 0;
sc->ba_rx.start_tidmask = 0;
sc->ba_rx.stop_tidmask = 0;
+   sc->tx_ba_queue_mask = 0;
sc->ba_tx.start_tidmask = 0;
sc->ba_tx.stop_tidmask = 0;
}
@@ -8116,10 +8115,8 @@ iwm_assoc(struct iwm_softc *sc)
 
splassert(IPL_NET);
 
-   if (!update_sta) {
+   if (!update_sta)
in->tid_disable_ampdu = 0x;
-   in->tfd_queue_msk = 0;
-   }
err = iwm_add_sta_cmd(sc, in, update_sta);
if (err) {
printf("%s: could not %s STA (error %d)\n",
@@ -8150,11 +8147,11 @@ iwm_disassoc(struct iwm_softc *sc)
return err;
}
in->tid_disable_ampdu = 0x;
-   in->tfd_queue_msk = 0;
sc->sc_flags &= ~IWM_FLAG_STA_ACTIVE;
sc->sc_rx_ba_sessions = 0;
sc->ba_rx.start_tidmask = 0;
sc->ba_rx.stop_tidmask = 0;
+   sc->tx_ba_queue_mask = 0;
sc->ba_tx.start_tidmask = 0;
sc->ba_tx.stop_tidmask = 0;
}
@@ -9520,6 +9517,8 @@ iwm_stop(struct ifnet *ifp)
ifq_clr_oactive(&ifp->if_snd);
 
in->in_phyctxt = NULL;
+   in->tid_disable_ampdu = 0x;
+   in->tfd_queue_msk = 0;
 
sc->sc_flags &= ~(IWM_FLAG_SCANNING | IWM_FLAG_BGSCAN);
sc->sc_flags &= ~IWM_FLAG_MAC_ACTIVE;


