Re: ibm_newemac tx problem with jumbo frame enabled
On Thu, Dec 8, 2011 at 3:33 AM, Benjamin Herrenschmidt <b...@kernel.crashing.org> wrote:
> On Wed, 2011-12-07 at 13:35 +0530, Prashant Bhole wrote:
>> Still couldn't find anything like a FIFO overflow... I noticed one more thing: this problem happens only when the MTU on the initiator (the other end) is set to 4088, regardless of the MTU set for the EMAC.
>
> Did you check all the registers that may carry errors? Nothing showed up? Did you check that things like pause frames were properly negotiated on both sides? Tried playing with the pause and FIFO thresholds?
>
> Other than using the tx timeout to perform resets, I don't see a good way to fix that problem.
>
> Cheers,
> Ben.

I checked the RX descriptor status, the TX descriptor status and the ethtool output. However, I don't know about pause packets/frames: how do I check whether pause frames are properly negotiated on both sides? I need to try changing the pause and FIFO thresholds.

ethtool output after the disconnection is as follows:

# ethtool -S eth0
NIC statistics:
     rx_packets: 330939
     rx_bytes: 804963241
     tx_packets: 248554
     tx_bytes: 798853638
     rx_packets_csum: 330716
     tx_packets_csum: 179526
     tx_undo: 0
     rx_dropped_stack: 0
     rx_dropped_oom: 0
     rx_dropped_error: 0
     rx_dropped_resize: 0
     rx_dropped_mtu: 0
     rx_stopped: 0
     rx_bd_errors: 0
     rx_bd_overrun: 0
     rx_bd_bad_packet: 0
     rx_bd_runt_packet: 0
     rx_bd_short_event: 0
     rx_bd_alignment_error: 0
     rx_bd_bad_fcs: 0
     rx_bd_packet_too_long: 0
     rx_bd_out_of_range: 0
     rx_bd_in_range: 0
     rx_parity: 0
     rx_fifo_overrun: 0
     rx_overrun: 0
     rx_bad_packet: 0
     rx_runt_packet: 0
     rx_short_event: 0
     rx_alignment_error: 0
     rx_bad_fcs: 0
     rx_packet_too_long: 0
     rx_out_of_range: 0
     rx_in_range: 0
     tx_dropped: 0
     tx_bd_errors: 0
     tx_bd_bad_fcs: 0
     tx_bd_carrier_loss: 0
     tx_bd_excessive_deferral: 0
     tx_bd_excessive_collisions: 0
     tx_bd_late_collision: 0
     tx_bd_multple_collisions: 0
     tx_bd_single_collision: 0
     tx_bd_underrun: 0
     tx_bd_sqe: 0
     tx_parity: 0
     tx_underrun: 0
     tx_sqe: 0
     tx_errors: 0

Thanks,
Prashant
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: ibm_newemac tx problem with jumbo frame enabled
On Thu, 2011-12-08 at 18:31 +0530, Prashant Bhole wrote:
> I checked the RX descriptor status, the TX descriptor status and the ethtool output. However, I don't know about pause packets/frames: how do I check whether pause frames are properly negotiated on both sides? I need to try changing the pause and FIFO thresholds.
>
> ethtool output after the disconnection is as follows:
>
> # ethtool -S eth0
> NIC statistics:
>      rx_packets: 330939
>      rx_bytes: 804963241
>      tx_packets: 248554
>      tx_bytes: 798853638
>      rx_packets_csum: 330716
>      tx_packets_csum: 179526
>      tx_undo: 0
.../...

OK, so none of the error counters seem to trip, odd. No idea what's up; you may want to ask the folks at APM (CCed Tirumala).

I also wonder whether we are properly enabling the reporting of error interrupts... if we got that wrong, we may never detect FIFO overruns. What you describe really looks like a FIFO overrun to me.

Additionally, look at emac_configure() and see how it configures the pause packet thresholds; maybe you can tweak the watermark to be more aggressive.

Also check that pause is actually enabled (with ethtool) and that the PHY negotiated it properly (that the link partner supports pause frames).

Cheers,
Ben.
RE: ibm_newemac tx problem with jumbo frame enabled
Hi Ben,

> -----Original Message-----
> From: Benjamin Herrenschmidt [mailto:b...@kernel.crashing.org]
> Sent: Thursday, December 08, 2011 2:59 PM
> To: Prashant Bhole
> Cc: linuxppc-...@ozlabs.org; Tirumala Marri
> Subject: Re: ibm_newemac tx problem with jumbo frame enabled
>
> On Thu, 2011-12-08 at 18:31 +0530, Prashant Bhole wrote:
> > I checked the RX descriptor status, the TX descriptor status and the ethtool output. However, I don't know about pause packets/frames: how do I check whether pause frames are properly negotiated on both sides? I need to try changing the pause and FIFO thresholds.
> >
> > ethtool output after the disconnection is as follows:
> >
> > # ethtool -S eth0
> > NIC statistics:
> >      rx_packets: 330939
> >      rx_bytes: 804963241
> >      tx_packets: 248554
> >      tx_bytes: 798853638
> >      rx_packets_csum: 330716
> >      tx_packets_csum: 179526
> >      tx_undo: 0
> .../...
>
> OK, so none of the error counters seem to trip, odd. No idea what's up; you may want to ask the folks at APM (CCed Tirumala).
>
> I also wonder whether we are properly enabling the reporting of error interrupts... if we got that wrong, we may never detect FIFO overruns. What you describe really looks like a FIFO overrun to me.
>
> Additionally, look at emac_configure() and see how it configures the pause packet thresholds; maybe you can tweak the watermark to be more aggressive.
>
> Also check that pause is actually enabled (with ethtool) and that the PHY negotiated it properly (that the link partner supports pause frames).

I will take a look.

Thx,
Marri
Re: ibm_newemac tx problem with jumbo frame enabled
On Fri, Nov 25, 2011 at 10:55 AM, Benjamin Herrenschmidt <b...@kernel.crashing.org> wrote:
> On Fri, 2011-11-18 at 10:33 +0530, Prashant Bhole wrote:
>> Hi,
>> I have been facing a problem with the ibm_newemac driver (v3.54). The board gets disconnected and cannot be pinged in the middle of heavy network traffic. In my case I am running IOmeter All-in-One (8 threads) on the iSCSI target. The MTU is 4088. I found that after executing emac_full_tx_reset(), the board can be pinged again. After another 5-6 seconds of heavy traffic, traffic stops again. This can be repeated after every full TX reset. Is this a known issue? What could cause this? Any pointers would be greatly appreciated.
>
> Not that I know of. Can you check whether any of the error reporting registers trip anything? Could it just be a FIFO overflow that we may not be handling properly in the driver?
>
> Cheers,
> Ben.

Still couldn't find anything like a FIFO overflow... I noticed one more thing: this problem happens only when the MTU on the initiator (the other end) is set to 4088, regardless of the MTU set for the EMAC.

- Prashant
Re: ibm_newemac tx problem with jumbo frame enabled
On Wed, 2011-12-07 at 13:35 +0530, Prashant Bhole wrote:
> Still couldn't find anything like a FIFO overflow... I noticed one more thing: this problem happens only when the MTU on the initiator (the other end) is set to 4088, regardless of the MTU set for the EMAC.

Did you check all the registers that may carry errors? Nothing showed up? Did you check that things like pause frames were properly negotiated on both sides? Tried playing with the pause and FIFO thresholds?

Other than using the tx timeout to perform resets, I don't see a good way to fix that problem.

Cheers,
Ben.
Re: ibm_newemac tx problem with jumbo frame enabled
On Fri, 2011-11-18 at 10:33 +0530, Prashant Bhole wrote:
> Hi,
> I have been facing a problem with the ibm_newemac driver (v3.54). The board gets disconnected and cannot be pinged in the middle of heavy network traffic. In my case I am running IOmeter All-in-One (8 threads) on the iSCSI target. The MTU is 4088. I found that after executing emac_full_tx_reset(), the board can be pinged again. After another 5-6 seconds of heavy traffic, traffic stops again. This can be repeated after every full TX reset. Is this a known issue? What could cause this? Any pointers would be greatly appreciated.

Not that I know of. Can you check whether any of the error reporting registers trip anything? Could it just be a FIFO overflow that we may not be handling properly in the driver?

Cheers,
Ben.
ibm_newemac tx problem with jumbo frame enabled
Hi,

I have been facing a problem with the ibm_newemac driver (v3.54). The board gets disconnected and cannot be pinged in the middle of heavy network traffic. In my case I am running IOmeter All-in-One (8 threads) on the iSCSI target. The MTU is 4088.

I found that after executing emac_full_tx_reset(), the board can be pinged again. After another 5-6 seconds of heavy traffic, traffic stops again. This can be repeated after every full TX reset.

Is this a known issue? What could cause this? Any pointers would be greatly appreciated.

- Prashant