Re: [PATCH] can: fix loss of frames due to wrong assumption in raw_rcv
Hello Manfred, On 22.06.2015 11:48, Manfred Schlaegl wrote: Can you tell me the output of /proc/sys/net/core/netdev_tstamp_prequeue on your machine? /proc/sys/net/core/netdev_tstamp_prequeue is set to 1 (unmodified, default) I tried to dig a little deeper in timestamping: 1. (net/core/dev.c) I found that static_key_false(netstamp_needed) is always 0, resulting that the timestamp is never set by net_timestamp_check in netif_receive_skb_internal. 2. (net/core/dev.c) static_key_false(netstamp_needed) is 0 because net_enable_timestamp is never called. 3. (net/core/sock.c) net_enable_timestamp is never called because SK_FLAGS_TIMESTAMP is not set 4. (net/core/sock.c) SK_FLAGS_TIMESTAMP is not set because neither of SOCK_TIMESTAMP or SOCK_TIMESTAMPING_RX_SOFTWARE is set 5. (net/core/sock.c) SOCK_TIMESTAMP or SOCK_TIMESTAMPING_RX_SOFTWARE is not set because timestamping is an optional feature (according to http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/timestamping.txt?id=b953c0d234bc72e8489d3bf51a276c5c4ec85345) not enabled in my use case (even if netdev_tstamp_prequeue is set to 1) So the original assumption for the was correct: The correctness of the skb equality check depends on a feature that is not enabled by default (respectively user configurable). Do you agree with this? Yes. But the point becomes an issue when there's no userspace application that requires timestamps. I did my testing wile having at least one candump instances running, which enables timestamping. So when there's no one requesting timestamps the check in can_rcv does not perform properly. Therefor my patch grabs your idea to set the timestamps for CAN skbs unconditionally. But there were some more places in the code where we need to take care about that. Regards, Oliver -- To unsubscribe from this list: send the line unsubscribe netdev in
Re: [PATCH] can: fix loss of frames due to wrong assumption in raw_rcv
Hello Oliver, On 2015-06-21 00:42, Oliver Hartkopp wrote: 514ac99c64b22d83b52dfee3b8becaa69a92bc4a introduces a frame equality check. Since the sk_buff pointer is not sufficient to do this (buffers are reused), the check also compares time stamps. In short: pointer+time stamp was assumed as unique key to a specific frame. The problem with this is, that the time stamp is an optional property and not set per default. In our case (flexcan) the time stamp is always zero, so the equality check is reduced to equality of buffer pointers, resulting in a lot of dropped frames. The question is why your system did not generate a timestamp at the time of skb reception. Usually when netif_rx(), netif_rx_ni() is invoked the timestamp is set in the following reception process. flexcan.c only uses netif_receive_skb() - but all theses functions set the timestamp net_timestamp_check(netdev_tstamp_prequeue, skb); depending on netdev_tstamp_prequeue which is configured by /proc/sys/net/core/netdev_tstamp_prequeue See the idea of netdev_tstamp_prequeue here: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/core/sysctl_net_core.c?id=3b098e2d7c693796cc4dffb07caa249fc0f70771 Thank you for the background information! I've also noticed your patch [PATCH - regression 4.1-rc8] can: fix loss of CAN frames in raw_rcv Can you tell me the output of /proc/sys/net/core/netdev_tstamp_prequeue on your machine? /proc/sys/net/core/netdev_tstamp_prequeue is set to 1 (unmodified, default) I tried to dig a little deeper in timestamping: 1. (net/core/dev.c) I found that static_key_false(netstamp_needed) is always 0, resulting that the timestamp is never set by net_timestamp_check in netif_receive_skb_internal. 2. (net/core/dev.c) static_key_false(netstamp_needed) is 0 because net_enable_timestamp is never called. 3. (net/core/sock.c) net_enable_timestamp is never called because SK_FLAGS_TIMESTAMP is not set 4. (net/core/sock.c) SK_FLAGS_TIMESTAMP is not set because neither of SOCK_TIMESTAMP or SOCK_TIMESTAMPING_RX_SOFTWARE is set 5. (net/core/sock.c) SOCK_TIMESTAMP or SOCK_TIMESTAMPING_RX_SOFTWARE is not set because timestamping is an optional feature (according to http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/timestamping.txt?id=b953c0d234bc72e8489d3bf51a276c5c4ec85345) not enabled in my use case (even if netdev_tstamp_prequeue is set to 1) So the original assumption for the was correct: The correctness of the skb equality check depends on a feature that is not enabled by default (respectively user configurable). Do you agree with this? Thanks again for your investigation! Sure! Best regards, Manfred -- To unsubscribe from this list: send the line unsubscribe netdev in
[PATCH] can: fix loss of frames due to wrong assumption in raw_rcv
I've detected a massive loss of can frames on i.MX6 using flexcan driver with 4.1-rc8 and tracked this down to following commit: 514ac99c64b22d83b52dfee3b8becaa69a92bc4a - can: fix multiple delivery of a single CAN frame for overlapping CAN filters 514ac99c64b22d83b52dfee3b8becaa69a92bc4a introduces a frame equality check. Since the sk_buff pointer is not sufficient to do this (buffers are reused), the check also compares time stamps. In short: pointer+time stamp was assumed as unique key to a specific frame. The problem with this is, that the time stamp is an optional property and not set per default. In our case (flexcan) the time stamp is always zero, so the equality check is reduced to equality of buffer pointers, resulting in a lot of dropped frames. Possible solutions I thought of: 1. Every driver has to set a time stamp (possibly error prone and hard to enforce?) 2. Change the equality check 3. Fulfil the requirements of the equality check by setting a time stamp per default. This patch fixes the problem with solution 3. A time stamp is set at time of allocation in alloc_can_skb. The time stamp may be overridden later, but the function of the equality check is ensured. I'm not really deep in linux network subsystem, so there may exists more elegant solutions for the problem. Signed-off-by: Manfred Schlaegl manfred.schla...@gmx.at --- drivers/net/can/dev.c |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c index b0f6924..282e2e7 100644 --- a/drivers/net/can/dev.c +++ b/drivers/net/can/dev.c @@ -575,6 +575,7 @@ struct sk_buff *alloc_can_skb(struct net_device *dev, struct can_frame **cf) if (unlikely(!skb)) return NULL; + __net_timestamp(skb); skb-protocol = htons(ETH_P_CAN); skb-pkt_type = PACKET_BROADCAST; skb-ip_summed = CHECKSUM_UNNECESSARY; -- 1.7.10.4 signature.asc Description: OpenPGP digital signature
[PATCH] can: fix loss of frames due to wrong assumption in raw_rcv
I've detected a massive loss of can frames on i.MX6 using flexcan driver with 4.1-rc8 and tracked this down to following commit: 514ac99c64b22d83b52dfee3b8becaa69a92bc4a - can: fix multiple delivery of a single CAN frame for overlapping CAN filters 514ac99c64b22d83b52dfee3b8becaa69a92bc4a introduces a frame equality check. Since the sk_buff pointer is not sufficient to do this (buffers are reused), the check also compares time stamps. In short: pointer+time stamp was assumed as unique key to a specific frame. The problem with this is, that the time stamp is an optional property and not set per default. In our case (flexcan) the time stamp is always zero, so the equality check is reduced to equality of buffer pointers, resulting in a lot of dropped frames. Possible solutions I thought of: 1. Every driver has to set a time stamp (possibly error prone and hard to enforce?) 2. Change the equality check 3. Fulfil the requirements of the equality check by setting a time stamp per default. This patch fixes the problem with solution 3. A time stamp is set at time of allocation in alloc_can_skb. The time stamp may be overridden later, but the function of the equality check is ensured. I'm not really deep in linux network subsystem, so there may exists more elegant solutions for the problem. Signed-off-by: Manfred Schlaegl manfred.schla...@gmx.at --- drivers/net/can/dev.c |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c index b0f6924..282e2e7 100644 --- a/drivers/net/can/dev.c +++ b/drivers/net/can/dev.c @@ -575,6 +575,7 @@ struct sk_buff *alloc_can_skb(struct net_device *dev, struct can_frame **cf) if (unlikely(!skb)) return NULL; + __net_timestamp(skb); skb-protocol = htons(ETH_P_CAN); skb-pkt_type = PACKET_BROADCAST; skb-ip_summed = CHECKSUM_UNNECESSARY; -- 1.7.10.4 -- To unsubscribe from this list: send the line unsubscribe netdev in
Re: [PATCH] can: fix loss of frames due to wrong assumption in raw_rcv
Hello Manfred, On 06/20/2015 07:21 PM, Manfred Schlaegl wrote: I've detected a massive loss of can frames on i.MX6 using flexcan driver with 4.1-rc8 and tracked this down to following commit: 514ac99c64b22d83b52dfee3b8becaa69a92bc4a - can: fix multiple delivery of a single CAN frame for overlapping CAN filters thanks for detecting this issue! 514ac99c64b22d83b52dfee3b8becaa69a92bc4a introduces a frame equality check. Since the sk_buff pointer is not sufficient to do this (buffers are reused), the check also compares time stamps. In short: pointer+time stamp was assumed as unique key to a specific frame. The problem with this is, that the time stamp is an optional property and not set per default. In our case (flexcan) the time stamp is always zero, so the equality check is reduced to equality of buffer pointers, resulting in a lot of dropped frames. The question is why your system did not generate a timestamp at the time of skb reception. Usually when netif_rx(), netif_rx_ni() is invoked the timestamp is set in the following reception process. flexcan.c only uses netif_receive_skb() - but all theses functions set the timestamp net_timestamp_check(netdev_tstamp_prequeue, skb); depending on netdev_tstamp_prequeue which is configured by /proc/sys/net/core/netdev_tstamp_prequeue See the idea of netdev_tstamp_prequeue here: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/core/sysctl_net_core.c?id=3b098e2d7c693796cc4dffb07caa249fc0f70771 Can you tell me the output of /proc/sys/net/core/netdev_tstamp_prequeue on your machine? If it's not '1' can you set it to '1' for a test? Possible solutions I thought of: 1. Every driver has to set a time stamp (possibly error prone and hard to enforce?) 2. Change the equality check 3. Fulfil the requirements of the equality check by setting a time stamp per default. This patch fixes the problem with solution 3. A time stamp is set at time of allocation in alloc_can_skb. That's a feasible way if won't find a better way to make sure the timestamps are generally set before the skb is processed in the NET_RX softirq. The time stamp may be overridden later, but the function of the equality check is ensured. I'm not really deep in linux network subsystem, so there may exists more elegant solutions for the problem. Signed-off-by: Manfred Schlaegl manfred.schla...@gmx.at --- drivers/net/can/dev.c |1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c index b0f6924..282e2e7 100644 --- a/drivers/net/can/dev.c +++ b/drivers/net/can/dev.c @@ -575,6 +575,7 @@ struct sk_buff *alloc_can_skb(struct net_device *dev, struct can_frame **cf) if (unlikely(!skb)) return NULL; + __net_timestamp(skb); skb-protocol = htons(ETH_P_CAN); skb-pkt_type = PACKET_BROADCAST; skb-ip_summed = CHECKSUM_UNNECESSARY; Please check the netdev_tstamp_prequeue value first. If we would need solution 3 the __net_timestamp(skb) should be placed in alloc_canfd_skb() too. Thanks again for your investigation! Best regards, Oliver -- To unsubscribe from this list: send the line unsubscribe netdev in