Re: [PATCH RFC] pci: Blacklist vpd access for buggy devices

2016-01-21 Thread Babu Moger


On 1/21/2016 9:47 AM, jordan_hargr...@dell.com wrote:
>> From: Babu Moger [babu.mo...@oracle.com]
>> Sent: Tuesday, January 19, 2016 2:39 PM
>> To: Hargrave, Jordan; bhelg...@google.com
>> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; 
>> alexander.du...@gmail.com; h...@suse.de; mkube...@suse.com; 
>> shane.seym...@hpe.com; myron.st...@gmail.com
>> Subject: Re: [PATCH RFC] pci: Blacklist vpd access for buggy devices
>>
>> Hi Jordan,
>>
>> On 1/19/2016 9:22 AM, jordan_hargr...@dell.com wrote:
>>> From: Babu Moger [babu.mo...@oracle.com]
>>> Sent: Monday, January 11, 2016 4:49 PM
>>> To: bhelg...@google.com
>>> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; 
>>> alexander.du...@gmail.com; h...@suse.de; mkube...@suse.com; 
>>> shane.seym...@hpe.com; myron.st...@gmail.com; 
>>> venkatkumar.duvv...@avago.com; Hargrave, Jordan
>>> Subject: Re: [PATCH RFC] pci: Blacklist vpd access for buggy devices
>>>
>>> Sorry. Missed Jordan.
>>>
>>> On 1/11/2016 3:13 PM, Babu Moger wrote:
>>>> Reading or writing PCI VPD data causes a system panic.
>>>> We first saw this problem when running "lspci -vvv".
>>>> It can also be easily reproduced by running
>>>>  cat /sys/bus/devices/XX../vpd
>>>>
>>>> The VPD length is set to 32768 by default, so accessing vpd
>>>> triggers a read/write of 32k. This causes problems because we
>>>> can read data beyond the VPD end tag, and behaviour is
>>>> unpredictable when this happens. Other adapters already use
>>>> similar quirks (commit bffadffd43d4 ("PCI: fix VPD limit quirk
>>>> for Broadcom 5708S")).
>>>>
>>>> There is an attempt to fix this the right way:
>>>> https://patchwork.ozlabs.org/patch/534843/ or
>>>> https://lkml.org/lkml/2015/10/23/97
>>>>
>>>> I tried to fix it this way, but the problem is that I don't see
>>>> proper start/end tags (at least for this adapter) at all. The data
>>>> is mostly junk or zeros. This patch fixes the issue by setting the
>>>> vpd length to 0x80.
>>>>
>>>> Also look at these threads:
>>>>
>>>> https://lkml.org/lkml/2015/11/10/557
>>>> https://lkml.org/lkml/2015/12/29/315
>>>>
>>>> Signed-off-by: Babu Moger <babu.mo...@oracle.com>
>>>> ---
>>>>
>>>> NOTE:
>>>> Jordan, are you sure all the devices in PCI_VENDOR_ID_ATHEROS and
>>>> PCI_VENDOR_ID_ATTANSIC have this problem? You have used PCI_ANY_ID,
>>>> which I feel is too broad. Can you please check?
>>>>
>>>
>>> I don't actually have that hardware; it was a bugfix for biosdevname
>>> for RedHat. We were getting
>>> 'BUG: soft lockup - CPU#0 stuck for 23s!' when attempting to read the
>>> vpd area.
>>>
>>> Certainly 0x1969:0x1026 experienced this.
>>
>> Ok. Thanks. I will update the patch 4/4.
>>
> 
> Thanks! I also found 1969:2062. Maybe best to just block everything in 
> drivers/net/ethernet/atheros/

OK. I will update the patch.
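
For anyone following along, here is a rough userspace sketch of the two ideas from the patch description: walking the VPD resource tags to find the end tag, and falling back to a fixed 0x80 cap when no valid tags are found (as on this adapter, where the data is mostly junk or zeros). This is not the patch itself. The function names, the macro names, and the choice to combine both approaches in one helper are my assumptions for illustration; the tag layout follows the usual PCI VPD small/large resource encoding.

```c
#include <stddef.h>
#include <stdint.h>

#define VPD_LARGE_FLAG   0x80u  /* bit 7 set: large resource tag */
#define VPD_TAG_ID       0x02u  /* large tag: identifier string */
#define VPD_TAG_RO       0x10u  /* large tag: read-only data (VPD-R) */
#define VPD_TAG_RW       0x11u  /* large tag: read-write data (VPD-W) */
#define VPD_TAG_END      0x0fu  /* small tag: end of VPD */
#define VPD_FALLBACK_LEN 0x80u  /* cap mentioned in the patch description */

/*
 * Hypothetical helper: return the number of bytes up to and including
 * the end tag, or 0 if the buffer does not contain a well-formed tag chain.
 */
size_t vpd_scan_size(const uint8_t *buf, size_t len)
{
	size_t off = 0;

	while (off < len) {
		uint8_t hdr = buf[off];

		if (hdr & VPD_LARGE_FLAG) {
			uint8_t tag = hdr & 0x7fu;
			size_t body;

			/* Unknown large tags are treated as junk data. */
			if (tag != VPD_TAG_ID && tag != VPD_TAG_RO &&
			    tag != VPD_TAG_RW)
				return 0;
			/* Need the tag byte plus a 2-byte little-endian length. */
			if (len - off < 3)
				return 0;
			body = (size_t)buf[off + 1] | ((size_t)buf[off + 2] << 8);
			off += 3;
			if (body > len - off)
				return 0;
			off += body;
		} else {
			uint8_t tag = (hdr >> 3) & 0x0fu;
			size_t body = hdr & 0x07u;

			off += 1;
			if (body > len - off)
				return 0;
			off += body;
			if (tag == VPD_TAG_END)
				return off;
		}
	}
	return 0;	/* ran off the buffer without seeing an end tag */
}

/* Hypothetical helper: use the scanned size if valid, else cap the length. */
size_t vpd_effective_len(const uint8_t *buf, size_t len)
{
	size_t n = vpd_scan_size(buf, len);

	if (n)
		return n;
	return len < VPD_FALLBACK_LEN ? len : VPD_FALLBACK_LEN;
}
```

With a buffer of all zeros the scan finds no end tag, so the effective length drops to the 0x80 cap instead of the 32k default.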


> 
> atl1c:
> static const struct pci_device_id atl1c_pci_tbl[] = {
> 	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATTANSIC_L1C)},
> 	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATTANSIC_L2C)},
> 	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L2C_B)},
> 	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L2C_B2)},
> 	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L1D)},
> 	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L1D_2_0)},
> 	/* required last entry */
> 	{ 0 }
> };
> 
> atl1e:
> static const struct pci_device_id atl1e_pci_tbl[] = {
> 	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATTANSIC_L1E)},
> 	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, 0x1066)},
> 	/* required last entry */
> 	{ 0 }
> };
> 
>>>
>>> 09:00.0 Ethernet controller: Atheros Communications AR8121/AR8113/AR8114 
>>> Gigabit or Fast Ethernet (rev b0)
>>> Subsystem: Atheros Communications AR8121/AR8113/AR8114 Gigabit or 
>>> Fast Ethernet
>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
>>> Stepping- SERR- FastB2B- DisINTx+
>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
>>> SERR-
>>> Latency: 0, Cache Line Size: 64 bytes
>

RE: [PATCH RFC] pci: Blacklist vpd access for buggy devices

2016-01-21 Thread Jordan_Hargrave
>From: Babu Moger [babu.mo...@oracle.com]
>Sent: Tuesday, January 19, 2016 2:39 PM
>To: Hargrave, Jordan; bhelg...@google.com
>Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; 
>alexander.du...@gmail.com; h...@suse.de; mkube...@suse.com; 
>shane.seym...@hpe.com; myron.st...@gmail.com
>Subject: Re: [PATCH RFC] pci: Blacklist vpd access for buggy devices
>
>Hi Jordan,
>
>On 1/19/2016 9:22 AM, jordan_hargr...@dell.com wrote:
>> From: Babu Moger [babu.mo...@oracle.com]
>> Sent: Monday, January 11, 2016 4:49 PM
>> To: bhelg...@google.com
>> Cc: linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; 
>> alexander.du...@gmail.com; h...@suse.de; mkube...@suse.com; 
>> shane.seym...@hpe.com; myron.st...@gmail.com; venkatkumar.duvv...@avago.com; 
>> Hargrave, Jordan
>> Subject: Re: [PATCH RFC] pci: Blacklist vpd access for buggy devices
>>
>> Sorry. Missed Jordan.
>>
>> On 1/11/2016 3:13 PM, Babu Moger wrote:
>>> Reading or writing PCI VPD data causes a system panic.
>>> We first saw this problem when running "lspci -vvv".
>>> It can also be easily reproduced by running
>>>  cat /sys/bus/devices/XX../vpd
>>>
>>> The VPD length is set to 32768 by default, so accessing vpd
>>> triggers a read/write of 32k. This causes problems because we
>>> can read data beyond the VPD end tag, and behaviour is
>>> unpredictable when this happens. Other adapters already use
>>> similar quirks (commit bffadffd43d4 ("PCI: fix VPD limit quirk
>>> for Broadcom 5708S")).
>>>
>>> There is an attempt to fix this the right way:
>>> https://patchwork.ozlabs.org/patch/534843/ or
>>> https://lkml.org/lkml/2015/10/23/97
>>>
>>> I tried to fix it this way, but the problem is that I don't see
>>> proper start/end tags (at least for this adapter) at all. The data
>>> is mostly junk or zeros. This patch fixes the issue by setting the
>>> vpd length to 0x80.
>>>
>>> Also look at these threads:
>>>
>>> https://lkml.org/lkml/2015/11/10/557
>>> https://lkml.org/lkml/2015/12/29/315
>>>
>>> Signed-off-by: Babu Moger <babu.mo...@oracle.com>
>>> ---
>>>
>>> NOTE:
>>> Jordan, are you sure all the devices in PCI_VENDOR_ID_ATHEROS and
>>> PCI_VENDOR_ID_ATTANSIC have this problem? You have used PCI_ANY_ID,
>>> which I feel is too broad. Can you please check?
>>>
>>
>> I don't actually have that hardware; it was a bugfix for biosdevname
>> for RedHat. We were getting
>> 'BUG: soft lockup - CPU#0 stuck for 23s!' when attempting to read the
>> vpd area.
>>
>> Certainly 0x1969:0x1026 experienced this.
>
>Ok. Thanks. I will update the patch 4/4.
>

Thanks! I also found 1969:2062. Maybe best to just block everything in 
drivers/net/ethernet/atheros/

atl1c:
static const struct pci_device_id atl1c_pci_tbl[] = {
	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATTANSIC_L1C)},
	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATTANSIC_L2C)},
	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L2C_B)},
	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L2C_B2)},
	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L1D)},
	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATHEROS_L1D_2_0)},
	/* required last entry */
	{ 0 }
};

atl1e:
static const struct pci_device_id atl1e_pci_tbl[] = {
	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, PCI_DEVICE_ID_ATTANSIC_L1E)},
	{PCI_DEVICE(PCI_VENDOR_ID_ATTANSIC, 0x1066)},
	/* required last entry */
	{ 0 }
};
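
As a rough illustration of the narrower alternative to PCI_ANY_ID, a per-device blacklist lookup could look something like the sketch below. It is a standalone userspace model, not kernel quirk code. The struct, table, and function names and the ANY_ID wildcard are hypothetical, and the table holds only the two vendor:device pairs reported in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in wildcard for this model; plays the role PCI_ANY_ID has in the kernel. */
#define ANY_ID 0xffffu

struct vpd_blacklist_entry {
	uint16_t vendor;
	uint16_t device;
};

/* Hypothetical table: only the IDs reported as locking up in this thread. */
static const struct vpd_blacklist_entry vpd_blacklist[] = {
	{ 0x1969, 0x1026 },
	{ 0x1969, 0x2062 },
};

/* Return true if VPD access should be blocked for this vendor:device pair. */
bool vpd_blacklisted(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(vpd_blacklist) / sizeof(vpd_blacklist[0]);

	for (size_t i = 0; i < n; i++) {
		const struct vpd_blacklist_entry *e = &vpd_blacklist[i];
		bool vendor_ok = e->vendor == ANY_ID || e->vendor == vendor;
		bool device_ok = e->device == ANY_ID || e->device == device;

		if (vendor_ok && device_ok)
			return true;
	}
	return false;
}
```

Blocking everything under drivers/net/ethernet/atheros/ would then amount to either adding one entry per device ID from the tables above, or a single { 0x1969, ANY_ID } entry if the whole vendor really is affected.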

>>
>> 09:00.0 Ethernet controller: Atheros Communications AR8121/AR8113/AR8114 
>> Gigabit or Fast Ethernet (rev b0)
>> Subsystem: Atheros Communications AR8121/AR8113/AR8114 Gigabit or 
>> Fast Ethernet
>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- 
>> Stepping- SERR- FastB2B- DisINTx+
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- 
>> SERR-
>> Latency: 0, Cache Line Size: 64 bytes
>> Interrupt: pin A routed to IRQ 46
>> Region 0: Memory at c030 (64-bit, non-prefetchable) [size=256K]
>> Region 2: I/O ports at 3000 [size=128]
>> Capabilities: [40] Power Management version 2
>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA 
>> PME(D0-,D1-,D2-,D3hot+,D3cold+)
>> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>> Capabilities: [48] MSI: Enable+ Count=1/1 Maskable- 64bit+
>> 
