Re: How to find a bug with lost network messages

2016-02-02 Thread Arthur Pichlkostner

All I know is that netif_rx() should be replaced with netif_rx_ni() on newer 
kernels.
Without that change I got NOHZ errors in the log; the same change was made in 
slcan.
Maybe this is the cause of your problem.

You can try our fork at https://github.com/tjohann/sllin, which includes many 
improvements and fixes compared to the original driver from 2013.

On Tue, Feb 02, 2016 at 10:09:20AM +0100, Sandro Stiller wrote:
> Hello,
> 
> I'm struggling with a network driver (sllin[1]) which is not in the 
> official kernel.
> It has a lot in common with the slcan driver but is used for LIN networks.
> The problem is that sometimes messages sent to the network layer via 
> netif_rx() don't arrive in all listening programs.
> 
> This is how the driver works:
> 1. The application sends CAN messages to the network interface
> 2. The driver forwards them to the UART (tty)
> 3. The UART receives the same message (single-wire connection, RX and TX 
> connected) and sends it back to the network layer
> 4. The sending application receives the previously sent message and can 
> check for transmission errors and appended LIN slave replies.
> 
> Sometimes the last point (4.) does not work after 10 - 40 seconds of 
> transmission.
> The application does not receive the message using a blocking read() on 
> the socket, but other processes receive it (running candump on the 
> interface). netif_rx() always returns 0.
> 
> If more programs are listening (running multiple instances of candump), 
> the problem appears less often or never.
> On my PC there is no problem; it occurs on ARM only.
> I'm using kernel 4.1.
> 
> Can you give me a hint where to search for the cause of this behaviour?
> 
> Thank you very much.
> 
> Sandro
> 
> 
> [1]: https://github.com/sstiller/sllin/tree/master/sllin
> 
> ___
> Kernelnewbies mailing list
> Kernelnewbies@kernelnewbies.org
> http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies



How to find a bug with lost network messages

2016-02-02 Thread Sandro Stiller
Hello,

I'm struggling with a network driver (sllin[1]) which is not in the 
official kernel.
It has a lot in common with the slcan driver but is used for LIN networks.
The problem is that sometimes messages sent to the network layer via 
netif_rx() don't arrive in all listening programs.

This is how the driver works:
1. The application sends CAN messages to the network interface
2. The driver forwards them to the UART (tty)
3. The UART receives the same message (single-wire connection, RX and TX 
connected) and sends it back to the network layer
4. The sending application receives the previously sent message and can 
check for transmission errors and appended LIN slave replies.
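
Seen from the application, the check in step 4 could look roughly like the 
sketch below. It is not taken from the driver or from the application in 
question: the helper name echo_matches() is hypothetical, and it assumes the 
echoed frame keeps the CAN ID and starts with the transmitted payload, with 
any LIN slave reply bytes appended after it.

```c
#include <linux/can.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if 'rcvd' looks like the echo of 'sent'.
 * Assumption (not confirmed by the driver source): the echo carries the same
 * CAN ID and begins with the sent payload; a LIN slave reply may add bytes
 * after it, so 'rcvd' is allowed to be longer than 'sent'.
 */
static bool echo_matches(const struct can_frame *sent,
			 const struct can_frame *rcvd)
{
	/* Reject malformed lengths before touching the data arrays. */
	if (sent->can_dlc > CAN_MAX_DLEN || rcvd->can_dlc > CAN_MAX_DLEN)
		return false;
	if (sent->can_id != rcvd->can_id)
		return false;
	/* The echo must be at least as long as what was transmitted. */
	if (rcvd->can_dlc < sent->can_dlc)
		return false;
	/* Any corrupted byte in the echoed part is a transmission error. */
	return memcmp(sent->data, rcvd->data, sent->can_dlc) == 0;
}
```

A mismatch reported by such a helper would point to a transmission error, 
while a frame that never arrives at all is the case described below.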

Sometimes the last point (4.) does not work after 10 - 40 seconds of 
transmission.
The application does not receive the message using a blocking read() on 
the socket, but other processes receive it (running candump on the 
interface). netif_rx() always returns 0.

If more programs are listening (running multiple instances of candump), 
the problem appears less often or never.
On my PC there is no problem; it occurs on ARM only.
I'm using kernel 4.1.

Can you give me a hint where to search for the cause of this behaviour?

Thank you very much.

Sandro


[1]: https://github.com/sstiller/sllin/tree/master/sllin
