Re: [PATCH] can: fix loss of frames due to wrong assumption in raw_rcv

2015-06-22 Thread Oliver Hartkopp

Hello Manfred,

On 22.06.2015 11:48, Manfred Schlaegl wrote:


Can you tell me the output of /proc/sys/net/core/netdev_tstamp_prequeue on
your machine?


/proc/sys/net/core/netdev_tstamp_prequeue is set to 1 (unmodified, default)

I tried to dig a little deeper in timestamping:
  1. (net/core/dev.c) I found that static_key_false(netstamp_needed) is always 
0, resulting that the timestamp is never set by net_timestamp_check in 
netif_receive_skb_internal.
  2. (net/core/dev.c) static_key_false(netstamp_needed) is 0 because 
net_enable_timestamp is never called.
  3. (net/core/sock.c) net_enable_timestamp is never called because 
SK_FLAGS_TIMESTAMP is not set
  4. (net/core/sock.c) SK_FLAGS_TIMESTAMP is not set because neither of 
SOCK_TIMESTAMP or SOCK_TIMESTAMPING_RX_SOFTWARE is set
  5. (net/core/sock.c) SOCK_TIMESTAMP or SOCK_TIMESTAMPING_RX_SOFTWARE is not 
set because timestamping is an optional feature (according to 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/timestamping.txt?id=b953c0d234bc72e8489d3bf51a276c5c4ec85345)
 not enabled in my use case (even if netdev_tstamp_prequeue is set to 1)

So the original assumption for the was correct: The correctness of the skb 
equality check depends on a feature that is not enabled by default 
(respectively user configurable).
Do you agree with this?


Yes.

But the point becomes an issue when there's no userspace application that 
requires timestamps.


I did my testing wile having at least one candump instances running, which 
enables timestamping. So when there's no one requesting timestamps the check 
in can_rcv does not perform properly.


Therefor my patch grabs your idea to set the timestamps for CAN skbs 
unconditionally. But there were some more places in the code where we need to 
take care about that.


Regards,
Oliver


--
To unsubscribe from this list: send the line unsubscribe netdev in


Re: [PATCH] can: fix loss of frames due to wrong assumption in raw_rcv

2015-06-22 Thread Manfred Schlaegl
Hello Oliver,

On 2015-06-21 00:42, Oliver Hartkopp wrote:

 514ac99c64b22d83b52dfee3b8becaa69a92bc4a introduces a frame equality
 check. Since the sk_buff pointer is not sufficient to do this (buffers
 are reused), the check also compares time stamps.
 In short: pointer+time stamp was assumed as unique key to a specific
 frame.
 The problem with this is, that the time stamp is an optional property
 and not set per default.
 In our case (flexcan) the time stamp is always zero, so the equality
 check is reduced to equality of buffer pointers, resulting in a lot of
 dropped frames.
 
 The question is why your system did not generate a timestamp at the time of
 skb reception.
 
 Usually when netif_rx(), netif_rx_ni() is invoked the timestamp is set in the
 following reception process.
 
 flexcan.c only uses netif_receive_skb() - but all theses functions set the
 timestamp
 
   net_timestamp_check(netdev_tstamp_prequeue, skb);
 
 depending on netdev_tstamp_prequeue which is configured by
 
 /proc/sys/net/core/netdev_tstamp_prequeue
 
 See the idea of netdev_tstamp_prequeue here:
 
 http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/core/sysctl_net_core.c?id=3b098e2d7c693796cc4dffb07caa249fc0f70771
 
Thank you for the background information!
I've also noticed your patch [PATCH - regression 4.1-rc8] can: fix loss of CAN 
frames in raw_rcv

 Can you tell me the output of /proc/sys/net/core/netdev_tstamp_prequeue on
 your machine?

/proc/sys/net/core/netdev_tstamp_prequeue is set to 1 (unmodified, default)

I tried to dig a little deeper in timestamping:
 1. (net/core/dev.c) I found that static_key_false(netstamp_needed) is always 
0, resulting that the timestamp is never set by net_timestamp_check in 
netif_receive_skb_internal.
 2. (net/core/dev.c) static_key_false(netstamp_needed) is 0 because 
net_enable_timestamp is never called.
 3. (net/core/sock.c) net_enable_timestamp is never called because 
SK_FLAGS_TIMESTAMP is not set
 4. (net/core/sock.c) SK_FLAGS_TIMESTAMP is not set because neither of 
SOCK_TIMESTAMP or SOCK_TIMESTAMPING_RX_SOFTWARE is set 
 5. (net/core/sock.c) SOCK_TIMESTAMP or SOCK_TIMESTAMPING_RX_SOFTWARE is not 
set because timestamping is an optional feature (according to 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/timestamping.txt?id=b953c0d234bc72e8489d3bf51a276c5c4ec85345)
 not enabled in my use case (even if netdev_tstamp_prequeue is set to 1)

So the original assumption for the was correct: The correctness of the skb 
equality check depends on a feature that is not enabled by default 
(respectively user configurable).
Do you agree with this?

 
 Thanks again for your investigation!
Sure!

Best regards,
Manfred

--
To unsubscribe from this list: send the line unsubscribe netdev in


[PATCH] can: fix loss of frames due to wrong assumption in raw_rcv

2015-06-20 Thread Manfred Schlaegl
I've detected a massive loss of can frames on i.MX6 using flexcan
driver with 4.1-rc8 and tracked this down to following commit:
514ac99c64b22d83b52dfee3b8becaa69a92bc4a - can: fix multiple delivery
of a single CAN frame for overlapping CAN filters

514ac99c64b22d83b52dfee3b8becaa69a92bc4a introduces a frame equality
check. Since the sk_buff pointer is not sufficient to do this (buffers
are reused), the check also compares time stamps.
In short: pointer+time stamp was assumed as unique key to a specific
frame.
The problem with this is, that the time stamp is an optional property
and not set per default.
In our case (flexcan) the time stamp is always zero, so the equality
check is reduced to equality of buffer pointers, resulting in a lot of
dropped frames.

Possible solutions I thought of:
 1. Every driver has to set a time stamp
(possibly error prone and hard to enforce?)
 2. Change the equality check
 3. Fulfil the requirements of the equality check by setting a
time stamp per default.

This patch fixes the problem with solution 3. A time stamp is set at
time of allocation in alloc_can_skb.
The time stamp may be overridden later, but the function of the equality
check is ensured.

I'm not really deep in linux network subsystem, so there may exists
more elegant solutions for the problem.

Signed-off-by: Manfred Schlaegl manfred.schla...@gmx.at
---
 drivers/net/can/dev.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index b0f6924..282e2e7 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -575,6 +575,7 @@ struct sk_buff *alloc_can_skb(struct net_device *dev, 
struct can_frame **cf)
if (unlikely(!skb))
return NULL;
 
+   __net_timestamp(skb);
skb-protocol = htons(ETH_P_CAN);
skb-pkt_type = PACKET_BROADCAST;
skb-ip_summed = CHECKSUM_UNNECESSARY;
-- 
1.7.10.4




signature.asc
Description: OpenPGP digital signature


[PATCH] can: fix loss of frames due to wrong assumption in raw_rcv

2015-06-20 Thread manfred.schla...@gmx.at
I've detected a massive loss of can frames on i.MX6 using flexcan
driver with 4.1-rc8 and tracked this down to following commit:
514ac99c64b22d83b52dfee3b8becaa69a92bc4a - can: fix multiple delivery
of a single CAN frame for overlapping CAN filters

514ac99c64b22d83b52dfee3b8becaa69a92bc4a introduces a frame equality
check. Since the sk_buff pointer is not sufficient to do this (buffers
are reused), the check also compares time stamps.
In short: pointer+time stamp was assumed as unique key to a specific
frame.
The problem with this is, that the time stamp is an optional property
and not set per default.
In our case (flexcan) the time stamp is always zero, so the equality
check is reduced to equality of buffer pointers, resulting in a lot of
dropped frames.

Possible solutions I thought of:
 1. Every driver has to set a time stamp
(possibly error prone and hard to enforce?)
 2. Change the equality check
 3. Fulfil the requirements of the equality check by setting a
time stamp per default.

This patch fixes the problem with solution 3. A time stamp is set at
time of allocation in alloc_can_skb.
The time stamp may be overridden later, but the function of the equality
check is ensured.

I'm not really deep in linux network subsystem, so there may exists
more elegant solutions for the problem.

Signed-off-by: Manfred Schlaegl manfred.schla...@gmx.at
---
 drivers/net/can/dev.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index b0f6924..282e2e7 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -575,6 +575,7 @@ struct sk_buff *alloc_can_skb(struct net_device
*dev, struct can_frame **cf)
if (unlikely(!skb))
return NULL;
 +  __net_timestamp(skb);
skb-protocol = htons(ETH_P_CAN);
skb-pkt_type = PACKET_BROADCAST;
skb-ip_summed = CHECKSUM_UNNECESSARY;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe netdev in


Re: [PATCH] can: fix loss of frames due to wrong assumption in raw_rcv

2015-06-20 Thread Oliver Hartkopp
Hello Manfred,

On 06/20/2015 07:21 PM, Manfred Schlaegl wrote:
 I've detected a massive loss of can frames on i.MX6 using flexcan
 driver with 4.1-rc8 and tracked this down to following commit:
 514ac99c64b22d83b52dfee3b8becaa69a92bc4a - can: fix multiple delivery
 of a single CAN frame for overlapping CAN filters

thanks for detecting this issue!

 514ac99c64b22d83b52dfee3b8becaa69a92bc4a introduces a frame equality
 check. Since the sk_buff pointer is not sufficient to do this (buffers
 are reused), the check also compares time stamps.
 In short: pointer+time stamp was assumed as unique key to a specific
 frame.
 The problem with this is, that the time stamp is an optional property
 and not set per default.
 In our case (flexcan) the time stamp is always zero, so the equality
 check is reduced to equality of buffer pointers, resulting in a lot of
 dropped frames.

The question is why your system did not generate a timestamp at the time of
skb reception.

Usually when netif_rx(), netif_rx_ni() is invoked the timestamp is set in the
following reception process.

flexcan.c only uses netif_receive_skb() - but all theses functions set the
timestamp

net_timestamp_check(netdev_tstamp_prequeue, skb);

depending on netdev_tstamp_prequeue which is configured by

/proc/sys/net/core/netdev_tstamp_prequeue

See the idea of netdev_tstamp_prequeue here:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/core/sysctl_net_core.c?id=3b098e2d7c693796cc4dffb07caa249fc0f70771

Can you tell me the output of /proc/sys/net/core/netdev_tstamp_prequeue on
your machine?

If it's not '1' can you set it to '1' for a test?

 
 Possible solutions I thought of:
  1. Every driver has to set a time stamp
 (possibly error prone and hard to enforce?)
  2. Change the equality check
  3. Fulfil the requirements of the equality check by setting a
 time stamp per default.
 
 This patch fixes the problem with solution 3. A time stamp is set at
 time of allocation in alloc_can_skb.

That's a feasible way if won't find a better way to make sure the timestamps
are generally set before the skb is processed in the NET_RX softirq.

 The time stamp may be overridden later, but the function of the equality
 check is ensured.
 
 I'm not really deep in linux network subsystem, so there may exists
 more elegant solutions for the problem.
 
 Signed-off-by: Manfred Schlaegl manfred.schla...@gmx.at
 ---
  drivers/net/can/dev.c |1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
 index b0f6924..282e2e7 100644
 --- a/drivers/net/can/dev.c
 +++ b/drivers/net/can/dev.c
 @@ -575,6 +575,7 @@ struct sk_buff *alloc_can_skb(struct net_device *dev, 
 struct can_frame **cf)
   if (unlikely(!skb))
   return NULL;
  
 + __net_timestamp(skb);
   skb-protocol = htons(ETH_P_CAN);
   skb-pkt_type = PACKET_BROADCAST;
   skb-ip_summed = CHECKSUM_UNNECESSARY;
 

Please check the netdev_tstamp_prequeue value first.

If we would need solution 3 the __net_timestamp(skb) should be placed in
alloc_canfd_skb() too.

Thanks again for your investigation!

Best regards,
Oliver

--
To unsubscribe from this list: send the line unsubscribe netdev in