Re: [PATCH RFC] pci: Blacklist vpd access for buggy devices
On 1/21/2016 9:47 AM, jordan_hargr...@dell.com wrote:
>> From: Babu Moger [babu.mo...@oracle.com]
>> Sent: Tuesday, January 19, 2016 2:39 PM
>> To: Hargrave, Jordan; bhelg...@google.com
>> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
>> alexander.du...@gmail.com; h...@suse.de; mkube...@suse.com;
>> shane.seym...@hpe.com; myron.st...@gmail.com
>> Subject: Re: [PATCH RFC] pci: Blacklist vpd access for buggy devices
>>
>> Hi Jordan,
>>
>> On 1/19/2016 9:22 AM, jordan_hargr...@dell.com wrote:
>>> From: Babu Moger [babu.mo...@oracle.com]
>>> Sent: Monday, January 11, 2016 4:49 PM
>>> To: bhelg...@google.com
>>> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
>>> alexander.du...@gmail.com; h...@suse.de; mkube...@suse.com;
>>> shane.seym...@hpe.com; myron.st...@gmail.com;
>>> venkatkumar.duvv...@avago.com; Hargrave, Jordan
>>> Subject: Re: [PATCH RFC] pci: Blacklist vpd access for buggy devices
>>>
>>> Sorry. Missed Jordan.
>>>
>>> On 1/11/2016 3:13 PM, Babu Moger wrote:
>>>> Reading or writing PCI VPD data causes a system panic.
>>>> We first saw this problem when running "lspci -vvv".
>>>> However, it can be easily reproduced by running
>>>> cat /sys/bus/devices/XX../vpd
>>>>
>>>> The VPD length is set to 32768 by default. Accessing vpd
>>>> will trigger a read/write of 32k. This causes problems, as we
>>>> could read data beyond the VPD end tag. Behaviour is
>>>> unpredictable when this happens. I see some other adapters using
>>>> similar quirks (commit bffadffd43d4 ("PCI: fix VPD limit quirk
>>>> for Broadcom 5708S")).
>>>>
>>>> I see there is an attempt to fix this the right way:
>>>> https://patchwork.ozlabs.org/patch/534843/ or
>>>> https://lkml.org/lkml/2015/10/23/97
>>>>
>>>> I tried to fix it this way, but the problem is that I don't see
>>>> proper start/end tags (at least for this adapter) at all. The data
>>>> is mostly junk or zeros. This patch fixes the issue by setting the
>>>> vpd length to 0x80.
>>>>
>>>> Also look at the threads
>>>>
>>>> https://lkml.org/lkml/2015/11/10/557
>>>> https://lkml.org/lkml/2015/12/29/315
>>>>
>>>> Signed-off-by: Babu Moger <babu.mo...@oracle.com>
>>>> ---
>>>>
>>>> NOTE:
>>>> Jordan, are you sure all the devices in PCI_VENDOR_ID_ATHEROS and
>>>> PCI_VENDOR_ID_ATTANSIC have this problem? You have used PCI_ANY_ID.
>>>> I felt it is too broad. Can you please check?
>>>>
>>>
>>> I don't actually have that hardware; it was a bugfix for biosdevname for
>>> RedHat. We were getting
>>> 'BUG: soft lockup - CPU#0 stuck for 23s!' when attempting to read the vpd
>>> area.
>>>
>>> Certainly 0x1969:0x1026 experienced this.
>>
>> Ok. Thanks. I will update the patch 4/4.
>>
>
> Thanks! I also found 1969:2062. Maybe best to just block everything in
> drivers/net/ethernet/atheros/

Ok. I will update the patch.

>
> atl1c:
> static const struct pci_device_id atl1c_pci_tbl[] = {
>         {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATTANSIC_L1C)},
>         {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATTANSIC_L2C)},
>         {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L2C_B)},
>         {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L2C_B2)},
>         {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L1D)},
>         {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L1D_2_0)},
>         /* required last entry */
>         { 0 }
> };
>
> atl1e:
> static const struct pci_device_id atl1e_pci_tbl[] = {
>         {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATTANSIC_L1E)},
>         {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, 0x1066)},
>         /* required last entry */
>         { 0 }
> };
>
>>>
>>> 09:00.0 Ethernet controller: Atheros Communications AR8121/AR8113/AR8114
>>> Gigabit or Fast Ethernet (rev b0)
>>> Subsystem: Atheros Communications AR8121/AR8113/AR8114 Gigabit or
>>> Fast Ethernet
>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>>> Stepping- SERR- FastB2B- DisINTx+
>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>> SERR-
>>> Latency: 0, Cache Line Size: 64 bytes
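The "right way" fix referenced in the quoted patch description derives the VPD length from the resource tags instead of assuming 32768 bytes. The following is a minimal user-space sketch of that idea, assuming the standard PCI VPD tag layout (large resources carrying a 16-bit little-endian length, terminated by the small-resource End Tag). The function name `vpd_size_from_tags` and all constant names are hypothetical and are not taken from the patches discussed in this thread. It returns 0 when no well-formed tag chain is found, which matches the situation reported for this adapter (junk or zeros) and is why a fixed 0x80 length was proposed instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; values follow the standard PCI VPD resource tag layout. */
#define VPD_LARGE_FLAG    0x80u /* bit 7 set: large resource tag        */
#define VPD_LRG_ID_STRING 0x02u /* large resource: identifier string    */
#define VPD_LRG_RO_DATA   0x10u /* large resource: VPD-R (read-only)    */
#define VPD_LRG_RW_DATA   0x11u /* large resource: VPD-W (read-write)   */
#define VPD_SML_END_TAG   0x0Fu /* small resource: End Tag (byte 0x78)  */

/*
 * Walk the tag chain in buf[0..avail) and return the number of bytes up to
 * and including the End Tag. Returns 0 if an unknown tag is seen, a length
 * runs past the buffer, or no End Tag is found.
 */
static size_t vpd_size_from_tags(const uint8_t *buf, size_t avail)
{
    size_t off = 0;

    assert(buf != NULL);

    while (off < avail) {
        uint8_t hdr = buf[off];

        if (hdr & VPD_LARGE_FLAG) {
            uint8_t name = hdr & 0x7Fu;
            size_t len;

            if (name != VPD_LRG_ID_STRING &&
                name != VPD_LRG_RO_DATA &&
                name != VPD_LRG_RW_DATA)
                return 0;           /* unknown large tag: treat as junk */
            if (avail - off < 3)
                return 0;           /* header itself is truncated */
            len = (size_t)buf[off + 1] | ((size_t)buf[off + 2] << 8);
            off += 3;
            if (len > avail - off)
                return 0;           /* payload runs past the buffer */
            off += len;
        } else {
            uint8_t name = (hdr >> 3) & 0x0Fu;

            if (name != VPD_SML_END_TAG)
                return 0;           /* only the End Tag is accepted here */
            return off + 1;         /* include the End Tag byte */
        }
    }
    return 0;                       /* ran out of data without an End Tag */
}
```

A caller could fall back to a small fixed length, or refuse VPD access entirely, whenever this returns 0; which of the two is appropriate for a given device is exactly the question the thread is debating.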
RE: [PATCH RFC] pci: Blacklist vpd access for buggy devices
>From: Babu Moger [babu.mo...@oracle.com]
>Sent: Tuesday, January 19, 2016 2:39 PM
>To: Hargrave, Jordan; bhelg...@google.com
>Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
>alexander.du...@gmail.com; h...@suse.de; mkube...@suse.com;
>shane.seym...@hpe.com; myron.st...@gmail.com
>Subject: Re: [PATCH RFC] pci: Blacklist vpd access for buggy devices
>
>Hi Jordan,
>
>On 1/19/2016 9:22 AM, jordan_hargr...@dell.com wrote:
>> From: Babu Moger [babu.mo...@oracle.com]
>> Sent: Monday, January 11, 2016 4:49 PM
>> To: bhelg...@google.com
>> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
>> alexander.du...@gmail.com; h...@suse.de; mkube...@suse.com;
>> shane.seym...@hpe.com; myron.st...@gmail.com; venkatkumar.duvv...@avago.com;
>> Hargrave, Jordan
>> Subject: Re: [PATCH RFC] pci: Blacklist vpd access for buggy devices
>>
>> Sorry. Missed Jordan.
>>
>> On 1/11/2016 3:13 PM, Babu Moger wrote:
>>> Reading or writing PCI VPD data causes a system panic.
>>> We first saw this problem when running "lspci -vvv".
>>> However, it can be easily reproduced by running
>>> cat /sys/bus/devices/XX../vpd
>>>
>>> The VPD length is set to 32768 by default. Accessing vpd
>>> will trigger a read/write of 32k. This causes problems, as we
>>> could read data beyond the VPD end tag. Behaviour is
>>> unpredictable when this happens. I see some other adapters using
>>> similar quirks (commit bffadffd43d4 ("PCI: fix VPD limit quirk
>>> for Broadcom 5708S")).
>>>
>>> I see there is an attempt to fix this the right way:
>>> https://patchwork.ozlabs.org/patch/534843/ or
>>> https://lkml.org/lkml/2015/10/23/97
>>>
>>> I tried to fix it this way, but the problem is that I don't see
>>> proper start/end tags (at least for this adapter) at all. The data
>>> is mostly junk or zeros. This patch fixes the issue by setting the
>>> vpd length to 0x80.
>>>
>>> Also look at the threads
>>>
>>> https://lkml.org/lkml/2015/11/10/557
>>> https://lkml.org/lkml/2015/12/29/315
>>>
>>> Signed-off-by: Babu Moger <babu.mo...@oracle.com>
>>> ---
>>>
>>> NOTE:
>>> Jordan, are you sure all the devices in PCI_VENDOR_ID_ATHEROS and
>>> PCI_VENDOR_ID_ATTANSIC have this problem? You have used PCI_ANY_ID.
>>> I felt it is too broad. Can you please check?
>>>
>>
>> I don't actually have that hardware; it was a bugfix for biosdevname for
>> RedHat. We were getting
>> 'BUG: soft lockup - CPU#0 stuck for 23s!' when attempting to read the vpd
>> area.
>>
>> Certainly 0x1969:0x1026 experienced this.
>
>Ok. Thanks. I will update the patch 4/4.
>

Thanks! I also found 1969:2062. Maybe best to just block everything in
drivers/net/ethernet/atheros/

atl1c:
static const struct pci_device_id atl1c_pci_tbl[] = {
        {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATTANSIC_L1C)},
        {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATTANSIC_L2C)},
        {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L2C_B)},
        {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L2C_B2)},
        {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L1D)},
        {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L1D_2_0)},
        /* required last entry */
        { 0 }
};

atl1e:
static const struct pci_device_id atl1e_pci_tbl[] = {
        {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATTANSIC_L1E)},
        {PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, 0x1066)},
        /* required last entry */
        { 0 }
};

>>
>> 09:00.0 Ethernet controller: Atheros Communications AR8121/AR8113/AR8114
>> Gigabit or Fast Ethernet (rev b0)
>> Subsystem: Atheros Communications AR8121/AR8113/AR8114 Gigabit or
>> Fast Ethernet
>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
>> Stepping- SERR- FastB2B- DisINTx+
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>> SERR-
>> Latency: 0, Cache Line Size: 64 bytes
>> Interrupt: pin A routed to IRQ 46
>> Region 0: Memory at c030 (64-bit, non-prefetchable) [size=256K]
>> Region 2: I/O ports at 3000 [size=128]
>> Capabilities: [40] Power Management version 2
>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
>> PME(D0-,D1-,D2-,D3hot+,D3cold+)
>> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>> Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>
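The trade-off raised above, listing individual device IDs versus matching a whole vendor with a wildcard, can be illustrated with a small user-space sketch. This is not the kernel's quirk mechanism: `struct vpd_block_entry`, `vpd_blocked`, and `ANY_ID` are hypothetical names, with `ANY_ID` only mirroring the idea of the kernel's PCI_ANY_ID. The two table entries are the vendor:device pairs reported in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xFFFFFFFFu /* hypothetical wildcard, in the spirit of PCI_ANY_ID */

/* Hypothetical table entry: one vendor:device pair whose VPD must not be read. */
struct vpd_block_entry {
    uint32_t vendor;
    uint32_t device;
};

/* The two pairs mentioned in this thread as affected. */
static const struct vpd_block_entry vpd_block_list[] = {
    { 0x1969, 0x1026 },
    { 0x1969, 0x2062 },
};

/* Return true if vendor:device matches any entry, honouring ANY_ID wildcards. */
static bool vpd_blocked(const struct vpd_block_entry *tbl, size_t n,
                        uint32_t vendor, uint32_t device)
{
    assert(tbl != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        bool vendor_ok = tbl[i].vendor == ANY_ID || tbl[i].vendor == vendor;
        bool device_ok = tbl[i].device == ANY_ID || tbl[i].device == device;

        if (vendor_ok && device_ok)
            return true;
    }
    return false;
}
```

With the explicit list, a device such as 1969:1066 is left alone until someone reports it; a single `{ 0x1969, ANY_ID }` entry would block it as well, which is the breadth concern expressed in the NOTE of the quoted patch.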