On 31.10.2018 at 22:05, Saeed Mahameed wrote:
On Tue, 2018-10-30 at 10:32 -0700, Cong Wang wrote:
On Tue, Oct 30, 2018 at 7:16 AM Eric Dumazet <eric.duma...@gmail.com>
wrote:


On 10/30/2018 01:09 AM, Paweł Staszewski wrote:

On 30.10.2018 at 08:29, Eric Dumazet wrote:
On 10/29/2018 11:09 PM, Dimitris Michailidis wrote:

Indeed this is a bug. I would expect it to produce frequent errors
though, as many odd-length packets would trigger it. Do you have RXFCS?
Regardless, how frequently do you see the problem?

Old kernels (before 88078d98d1bb) were simply resetting
ip_summed to CHECKSUM_NONE

And before your fix (commit d55bef5059dd057bd), mlx5 bug was
canceling the bug you fixed.

So we now need to also fix mlx5.

And of course use skb_header_pointer() in mlx5e_get_fcs() as I
mentioned earlier,
plus __get_unaligned_cpu32() as you hinted.
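
For readers following along, here is a minimal sketch of why odd lengths matter when trailing bytes are removed from a full-packet ones-complement checksum. It assumes the bug under discussion concerns subtracting the checksum of trimmed trailing bytes from a CHECKSUM_COMPLETE style value. It is plain userspace Python, not the kernel or mlx5 code, and every function name in it is made up for illustration.

```python
def fold16(total: int) -> int:
    """Fold carries back in until the value fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_partial(data: bytes) -> int:
    """Ones-complement sum of big-endian 16-bit words (not inverted).

    An odd trailing byte is padded with a zero byte on the right.
    """
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return fold16(total)


def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | (value >> 8)


def csum_after_trim(full_sum: int, tail: bytes, offset: int,
                    fix_parity: bool = True) -> int:
    """Remove `tail` (which started at byte `offset`) from `full_sum`.

    When `offset` is odd, the tail bytes sat in the opposite byte lanes
    from the ones csum_partial(tail) puts them in, so the tail sum must
    be byte-swapped before it is subtracted.
    """
    tail_sum = csum_partial(tail)
    if fix_parity and offset % 2:
        tail_sum = swap16(tail_sum)
    # Ones-complement subtraction: add the bitwise complement, then fold.
    return fold16(full_sum + (~tail_sum & 0xFFFF))


if __name__ == "__main__":
    frame = bytes(range(1, 20))   # 19 bytes of made-up data
    keep = 13                     # odd trim offset
    full = csum_partial(frame)
    print(hex(csum_partial(frame[:keep])))                            # reference
    print(hex(csum_after_trim(full, frame[keep:], keep)))             # with swap
    print(hex(csum_after_trim(full, frame[keep:], keep, False)))      # without
```

With an even offset the swap is a no-op, which is consistent with the observation that only odd-length packets would hit this.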




No RXFCS

Same with Pawel, RXFCS is disabled by default.


And this trace is really frequent, roughly once per 3/4 seconds,
like below:
[28965.776864] vlan1490: hw csum failure
Might be vlan related.
Hi Pawel, is the vlan stripping offload disabled or enabled in your
case?

To verify:
ethtool -k <interface> | grep rx-vlan-offload
rx-vlan-offload: on
To set:
ethtool -K <interface> rxvlan on/off
Enabled:
ethtool -k enp175s0f0
Features for enp175s0f0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: on
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: on [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
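
A feature dump like the one above can also be checked programmatically. The following is a small sketch that parses `ethtool -k` text into a dictionary. It assumes the "name: on|off [fixed]" line layout shown in this thread; the function name and the embedded sample are illustrative only, and it does not run ethtool itself.

```python
def parse_ethtool_features(text: str) -> dict:
    """Map feature name -> (enabled, fixed) from `ethtool -k` output.

    Assumes lines of the form "name: on" or "name: off [fixed]".
    Lines without a value (such as the "Features for <dev>:" header)
    are skipped.
    """
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        if not sep or not value:
            continue
        state = value.split()[0]
        if state not in ("on", "off"):
            continue
        features[name] = (state == "on", "[fixed]" in value)
    return features


# A few lines copied from the output in this thread.
SAMPLE = """Features for enp175s0f0:
rx-checksumming: on
        tx-checksum-ip-generic: off [fixed]
rx-vlan-offload: on
rx-fcs: off
"""

if __name__ == "__main__":
    feats = parse_ethtool_features(SAMPLE)
    for key in ("rx-vlan-offload", "rx-fcs", "rx-checksumming"):
        enabled, fixed = feats[key]
        print(key, "on" if enabled else "off", "(fixed)" if fixed else "")
```

To use it on a live system, the output of `ethtool -k <interface>` could be captured (for example with `subprocess.run`) and passed in as `text`.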



If the vlan offload is off, it will trigger the mlx5e vlan csum
adjustment code pointed out by Eric.

Anyhow, it should work in both cases, but I am trying to narrow down
the possibilities.

Also, could it be a double-tagged packet?
No double-tagged packets there.




Unlike Pawel's case, we don't use vlan at all, maybe this is why we see
it much less frequently than Pawel.

Also, it is probably not specific to mlx5, as there is another report
which probably involves a non-mlx5 driver.

Cong, how often does this happen? Can you somehow verify whether the
problematic packet has extra end padding after the IP payload?
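
One possible way to check a captured frame for such padding is sketched below. It assumes the raw Ethernet frame is available as bytes without the FCS (for example, extracted from a pcap), and it handles only IPv4 and IPv6 with optional 802.1Q/802.1ad tags. The function name and constants are illustrative, not taken from any existing tool.

```python
import struct

ETH_HLEN = 14                    # dst MAC + src MAC + EtherType
VLAN_TPIDS = {0x8100, 0x88A8}    # 802.1Q and 802.1ad tag identifiers
ETH_P_IP = 0x0800
ETH_P_IPV6 = 0x86DD


def trailing_padding(frame: bytes) -> int:
    """Return how many bytes follow the end of the IP datagram.

    A positive result means padding after the IP payload; a negative
    result means the capture is shorter than the IP length claims.
    """
    if len(frame) < ETH_HLEN:
        raise ValueError("frame shorter than an Ethernet header")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset = ETH_HLEN
    # Skip any VLAN tags: 2 bytes of TCI, then the inner EtherType.
    while ethertype in VLAN_TPIDS:
        if len(frame) < offset + 4:
            raise ValueError("truncated VLAN tag")
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if ethertype == ETH_P_IP:
        if len(frame) < offset + 20:
            raise ValueError("truncated IPv4 header")
        # IPv4 total length covers header and payload.
        (ip_len,) = struct.unpack_from("!H", frame, offset + 2)
    elif ethertype == ETH_P_IPV6:
        if len(frame) < offset + 40:
            raise ValueError("truncated IPv6 header")
        # IPv6 payload length excludes the fixed 40-byte header.
        (payload_len,) = struct.unpack_from("!H", frame, offset + 4)
        ip_len = 40 + payload_len
    else:
        raise ValueError("not an IPv4/IPv6 frame")
    return len(frame) - offset - ip_len


if __name__ == "__main__":
    # Made-up IPv4 frame: 20-byte header + 9-byte payload, padded to 60 bytes.
    ip = struct.pack("!BBH", 0x45, 0, 29) + bytes(16) + bytes(9)
    frame = (bytes(12) + struct.pack("!H", ETH_P_IP) + ip).ljust(60, b"\x00")
    print(trailing_padding(frame))
```

Short frames padded up to the 60-byte Ethernet minimum are the usual source of a non-zero result, so an odd-length IP datagram below that size would be a natural first candidate to look for.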

It would be cool if we had a feature in the kernel to store such an SKB
in memory when such an issue occurs, and let the user dump it later (via
tcpdump) and send the dump to the vendor for debug so we could just
replay it and see what happens.

Thanks.
