Re: Trouble with re driver

2023-11-19 Thread Martin Husemann
On Sun, Nov 19, 2023 at 12:48:47PM +0100, BERTRAND Joël wrote:
>   I made a mistake this morning. I replaced the /netbsd kernel and
> copied /netbsd into /netbsd.old (thus deleting my -10-beta
> kernel), but I'm pretty sure my -10-beta was built less than three months
> ago.

I reviewed the CHANGES-10 entries from that period again and still found
nothing plausible. All pcidevs changes were only cosmetic (name changes) or
additions, and no files relevant to PCI resource mapping have been changed.

Martin


Re: Trouble with re driver

2023-11-19 Thread BERTRAND Joël
Martin Husemann wrote:
> On Sat, Nov 18, 2023 at 08:14:32PM +0100, BERTRAND Joël wrote:
>>  If I restart this server with a -10.0-Beta kernel, faulty ethernet
>> adapter is autoconfigured without trouble.
> 
> Can you give more details of that kernel? Ideally source update time,
> or kernel build time? That way we can narrow down the range of pullups
> in between the broken and the non-broken kernel.

I made a mistake this morning. I replaced the /netbsd kernel and
copied /netbsd into /netbsd.old (thus deleting my -10-beta
kernel), but I'm pretty sure my -10-beta was built less than three months
ago.

> There are no obvious changes at first glance, so this is a bit of a riddle.
> It would be best if you could bisect the breakage to an individual pullup
> (as we have no other reports of broken re(4) so far and as you noticed it
> seems to be pretty hardware dependent).

Hardware configuration: Asus motherboard with 16 GB and i7 4470
(chipset Z97).

legendre# lspci
00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM
Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core
Processor PCI Express x16 Controller (rev 06)
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core
Processor PCI Express x8 Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th
Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core
Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB
xHCI Controller
00:16.0 Communication controller: Intel Corporation 9 Series Chipset
Family ME Interface #1
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2)
I218-V
00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB
EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio
Controller
00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI
Express Root Port 1 (rev d0)
00:1c.3 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d0)
00:1c.4 PCI bridge: Intel Corporation 9 Series Chipset Family PCI
Express Root Port 5 (rev d0)
00:1c.5 PCI bridge: Intel Corporation 9 Series Chipset Family PCI
Express Root Port 6 (rev d0)
00:1c.7 PCI bridge: Intel Corporation 9 Series Chipset Family PCI
Express Root Port 8 (rev d0)
00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB
EHCI Controller #1
00:1f.0 ISA bridge: Intel Corporation Z97 Chipset LPC Controller
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA
Controller [AHCI Mode]
00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network
Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network
Connection (rev 01)
04:00.0 PCI bridge: ASMedia Technology Inc. ASM1083/1085 PCIe to PCI
Bridge (rev 04)
05:00.0 Ethernet controller: D-Link System Inc DGE-528T Gigabit Ethernet
Adapter (rev 10)
06:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network
Connection
07:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network
Connection
08:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9120 SATA
6Gb/s Controller (rev 12)
08:00.1 IDE interface: Marvell Technology Group Ltd. 88SE912x IDE
Controller (rev 12)

I can revert some patches against -10 kernel, but I cannot do extensive
tests. When this server is down, all workstations on LAN side must be
down too...

Best regards,

JB





Re: Trouble with re driver

2023-11-19 Thread Martin Husemann
On Sat, Nov 18, 2023 at 10:59:14PM +0100, Tobias Nygren wrote:
> I suspect the regression originates with an acpica update and some
> firmware bug might be a prerequisite to trigger it.

There have been no acpica updates on the -10 branch.

Martin


Re: Trouble with re driver

2023-11-19 Thread Martin Husemann
On Sat, Nov 18, 2023 at 08:14:32PM +0100, BERTRAND Joël wrote:
>   If I restart this server with a -10.0-Beta kernel, faulty ethernet
> adapter is autoconfigured without trouble.

Can you give more details of that kernel? Ideally source update time,
or kernel build time? That way we can narrow down the range of pullups
in between the broken and the non-broken kernel.

There are no obvious changes at first glance, so this is a bit of a riddle.
It would be best if you could bisect the breakage to an individual pullup
(as we have no other reports of broken re(4) so far and as you noticed it
seems to be pretty hardware dependent).

Martin
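
For illustration, the bisection suggested above can be sketched as a binary
search over the pullups applied between the working and the broken kernel.
This is only a sketch under stated assumptions: the pullup labels and the
is_broken callback are hypothetical (in practice each call means building and
booting a kernel at that point), the list is in chronological order, the last
entry is known to be broken, and the breakage persists once introduced.

```python
from typing import Callable, Sequence


def first_bad(pullups: Sequence[str], is_broken: Callable[[str], bool]) -> str:
    """Return the earliest pullup for which is_broken() is true.

    Assumes pullups[-1] is broken and that every pullup after the
    first broken one is broken as well.
    """
    lo, hi = 0, len(pullups) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_broken(pullups[mid]):
            hi = mid          # culprit is at mid or earlier
        else:
            lo = mid + 1      # culprit is after mid
    return pullups[lo]


if __name__ == "__main__":
    # Hypothetical labels; real entries would come from CHANGES-10.
    pullups = [f"pullup-{n}" for n in range(1, 9)]
    broken_from = 5  # pretend index 5 introduced the regression
    print(first_bad(pullups, lambda p: pullups.index(p) >= broken_from))
```

With N candidate pullups this needs about log2(N) test boots, which matters
when every test means taking the server down.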


Re: Trouble with re driver

2023-11-19 Thread BERTRAND Joël
Tobias Nygren wrote:
> On Sat, 18 Nov 2023 20:14:32 +0100
> BERTRAND Joël  wrote:
> 
>>  Maybe something is broken in recent changes in
>> src/sys/dev/pci/pci_resource.c, pcidevs_data.h or pcidevs.h.
> 
> Since the attach function runs it does not seem to be a pcidevs problem.
> pci_resource.c is only used on ARM platforms. You can find the equivalent
> x86 code in pci_map.c. It has not been changed recently from what I can tell.
> 
>> pci_mem_find: expected mem type 0004, found 
> 
> This error means we expected a 64-bit mem range assigned to the card
> by the ACPI firmware but found a 32-bit range. It is a strange error
> to get in this context because the card according to the config space
> dump clearly only has a 32-bit BAR so it has to use 32-bit bus space.
> We can reach that situation if the re driver incorrectly tried to map
> the BAR with PCI_MAPREG_MEM_TYPE_64BIT.
> 
> I suspect the regression originates with an acpica update and some
> firmware bug might be a prerequisite to trigger it.
> 
> You'll need to figure out why the expected mem type check fires.

Unfortunately, I cannot quickly test on this server and I don't have
another server with the same configuration. I will test as soon as possible,
but I have to power off all diskless workstations on the LAN side (which run
complex simulations).

> A good place to start digging is to dissect this code and find
> out what the value returned by pci_mapreg_type is:
> https://github.com/NetBSD/src/blob/d7465f61f231e4328d26a5628c5ccb266f168f3a/sys/dev/pci/if_re_pci.c#L210
> 
> Since you mentioned you have lots of other network adapters it might
> also be that the system has run out of 32-bit bus space and
> is attempting to use 64-bit as a last resort.

Strange. -10-beta ran fine in the same configuration: wm[0-4], re0,
bridge0, lagg0, npflog0.

Best regards,

JB





Re: Trouble with re driver

2023-11-18 Thread Tobias Nygren
On Sat, 18 Nov 2023 20:14:32 +0100
BERTRAND Joël  wrote:

>   Maybe something is broken in recent changes in
> src/sys/dev/pci/pci_resource.c, pcidevs_data.h or pcidevs.h.

Since the attach function runs it does not seem to be a pcidevs problem.
pci_resource.c is only used on ARM platforms. You can find the equivalent
x86 code in pci_map.c. It has not been changed recently from what I can tell.

> pci_mem_find: expected mem type 0004, found 

This error means we expected a 64-bit mem range assigned to the card
by the ACPI firmware but found a 32-bit range. It is a strange error
to get in this context because the card according to the config space
dump clearly only has a 32-bit BAR so it has to use 32-bit bus space.
We can reach that situation if the re driver incorrectly tried to map
the BAR with PCI_MAPREG_MEM_TYPE_64BIT.

I suspect the regression originates with an acpica update and some
firmware bug might be a prerequisite to trigger it.

You'll need to figure out why the expected mem type check fires.
A good place to start digging is to dissect this code and find
out what the value returned by pci_mapreg_type is:
https://github.com/NetBSD/src/blob/d7465f61f231e4328d26a5628c5ccb266f168f3a/sys/dev/pci/if_re_pci.c#L210

Since you mentioned you have lots of other network adapters it might
also be that the system has run out of 32-bit bus space and
is attempting to use 64-bit as a last resort.
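
To make the 32-bit versus 64-bit distinction above concrete, here is a small
sketch of how the low bits of a PCI memory BAR encode its type, and how a
mismatch between the requested and the found type can be detected. The bit
layout follows the PCI Local Bus specification; the constant and function
names are illustrative and are not NetBSD's own identifiers, and the BAR value
is made up, not taken from the report.

```python
# BAR bit layout from the PCI Local Bus specification.
# Constant names are illustrative, not NetBSD's own identifiers.
BAR_SPACE_IO = 0x1         # bit 0: 1 = I/O space, 0 = memory space
BAR_MEM_TYPE_MASK = 0x6    # bits 2:1: width of the memory decoder
BAR_MEM_TYPE_32BIT = 0x0   # 00 = mappable anywhere in 32-bit space
BAR_MEM_TYPE_64BIT = 0x4   # 10 = mappable anywhere in 64-bit space


def is_io(bar: int) -> bool:
    """True if the BAR describes I/O space rather than memory space."""
    return bool(bar & BAR_SPACE_IO)


def mem_type(bar: int) -> int:
    """Return the memory type field (bits 2:1) of a BAR value."""
    return bar & BAR_MEM_TYPE_MASK


def type_mismatch(bar: int, expected_type: int) -> bool:
    """True if the caller asked for a memory type the BAR does not have."""
    return mem_type(bar) != (expected_type & BAR_MEM_TYPE_MASK)


if __name__ == "__main__":
    bar = 0xF7D00000  # made-up 32-bit memory BAR value
    print(hex(mem_type(bar)))
    print(type_mismatch(bar, BAR_MEM_TYPE_64BIT))
    print(type_mismatch(bar, BAR_MEM_TYPE_32BIT))
```

A request for type 0x4 (64-bit) against a BAR whose type field reads 0x0
(32-bit) is reported as a mismatch, which is the situation the error message
describes; a 32-bit request against the same BAR is not.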


Trouble with re driver

2023-11-18 Thread BERTRAND Joël
Hello,

I have a server with a lot of wm interfaces and one re.

With the latest -10.0-RC1 kernel, this Ethernet adapter is not configured and
dmesg contains:

Nov 18 16:28:08 legendre /netbsd: [   1.0066881] pci5 at ppb4 bus 5
Nov 18 16:28:08 legendre /netbsd: [   1.0066881] pci5: i/o space, memory
space enabled, rd/line, wr/inv ok
Nov 18 16:28:08 legendre /netbsd: [   1.0066881] re0 at pci5 dev 0
function 0pci_mem_find: expected mem type 0004, found 
Nov 18 16:28:08 legendre /netbsd: [   1.0066881] autoconfiguration
error: : can't map registers

Of course, the adapter doesn't work as expected. It is a Netgear
PCI device with an RTL8169S-32. NetBSD-10.0-Beta ran fine with this
adapter. I have just replaced this adapter with another one with the same
chipset and it works as expected:

Nov 18 19:53:32 legendre /netbsd: [   1.0019962] pci5 at ppb4 bus 5
Nov 18 19:53:32 legendre /netbsd: [   1.0019962] pci5: i/o space, memory
space enabled, rd/line, wr/inv ok
Nov 18 19:53:32 legendre /netbsd: [   1.0019962] re0 at pci5 dev 0
function 0: D-Link DGE-528T Gigabit Ethernet (rev. 0x10)
Nov 18 19:53:32 legendre /netbsd: [   1.0019962] re0: interrupting at
ioapic0 pin 19
Nov 18 19:53:32 legendre /netbsd: [   1.0019962] re0: RTL8169S (0x0400)
Nov 18 19:53:32 legendre /netbsd: [   1.0019962] re0: Ethernet address
00:13:46:3a:b3:0a
Nov 18 19:53:32 legendre /netbsd: [   1.0019962] re0: using 256 tx
descriptors
Nov 18 19:53:32 legendre /netbsd: [   1.0019962] rgephy0 at re0 phy 7:
RTL8211 1000BASE-T media interface
Nov 18 19:53:32 legendre /netbsd: [   1.0019962] rgephy0: 10baseT,
10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
Nov 18 19:53:32 legendre /netbsd: [   1.0019962] ppb5 at pci0 dev 28
function 4: Intel 9 Series PCIe (rev. 0xd0)

If I restart this server with a -10.0-Beta kernel, the faulty Ethernet
adapter is autoconfigured without trouble.

lspci -vvv output for the adapter that doesn't work:
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8169 PCI
Gigabit Ethernet Controller (rev 10)
Subsystem: Netgear GA311
Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
SERR-  [disabled]
Expansion ROM at fffe [disabled]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
PME(D0-,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-


Same command on the adapter that works:

05:00.0 Ethernet controller: D-Link System Inc DGE-528T Gigabit Ethernet
Adapter (rev 10)
Subsystem: D-Link System Inc DGE-528T PCI Gigabit Ethernet Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
SERR- 
