Hi Simon,

I am returning to this old thread. As you requested, I migrated our system
to lwIP 2.0.2, using the Xilinx port of lwIP:
https://github.com/Xilinx/embeddedsw/tree/master/ThirdParty/sw_services/lwip202

I observe the same behaviour that I reported originally (quoted below).

I am able to reproduce the break with SOCKETS_DEBUG=1. This shows:
17:24:27.846: lwip_accept(2)...
17:24:27.848: lwip_accept(2): netconn_acept failed, err=-13
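
For reference, here is a small sketch that turns the numeric err into a
name. The table is an assumption based on my reading of err.h in lwIP 2.0.x
(the numbering can differ between lwIP versions, so it should be checked
against the header in the tree actually being built), and lwip_err_name is a
hypothetical helper, not an lwIP function. With this numbering, -13 would be
ERR_ABRT (connection aborted).

```c
#include <stddef.h>

/* Names indexed by -err.
 * ASSUMPTION: numbering as in err.h of lwIP 2.0.x; verify against your tree. */
static const char *const err_names[] = {
    "ERR_OK",         /*   0: no error */
    "ERR_MEM",        /*  -1: out of memory */
    "ERR_BUF",        /*  -2: buffer error */
    "ERR_TIMEOUT",    /*  -3: timeout */
    "ERR_RTE",        /*  -4: routing problem */
    "ERR_INPROGRESS", /*  -5: operation in progress */
    "ERR_VAL",        /*  -6: illegal value */
    "ERR_WOULDBLOCK", /*  -7: operation would block */
    "ERR_USE",        /*  -8: address in use */
    "ERR_ALREADY",    /*  -9: already connecting */
    "ERR_ISCONN",     /* -10: connection already established */
    "ERR_CONN",       /* -11: not connected */
    "ERR_IF",         /* -12: low-level netif error */
    "ERR_ABRT",       /* -13: connection aborted */
    "ERR_RST",        /* -14: connection reset */
    "ERR_CLSD",       /* -15: connection closed */
    "ERR_ARG"         /* -16: illegal argument */
};

/* Hypothetical helper (not part of lwIP): map an err value to its name. */
static const char *lwip_err_name(int err)
{
    const int n = (int)(sizeof(err_names) / sizeof(err_names[0]));

    /* Valid values are 0 down to -(n - 1); anything else is unknown. */
    if (err > 0 || err <= -n) {
        return "unknown";
    }
    return err_names[-err];
}
```

Calling lwip_err_name(-13) on the value from the log above selects the
ERR_ABRT entry of this table.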

I have not yet been able to break it with TCPIP_DEBUG=1 enabled as well.
There seems to be some kind of race condition that is not triggered when
more information is printed.
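
One thing I could try, to get information without disturbing the timing as
much as printing does, is to record events into a small in-memory ring
buffer and read it out only after the break. This is a minimal sketch with
hypothetical names (trace_event, trace_dump, TRACE_SLOTS) and no locking; on
the real system the counter update would have to be protected or made
atomic, which I assume the port can provide.

```c
#include <stdint.h>

#define TRACE_SLOTS 256u /* hypothetical size; must be a power of two */

struct trace_entry {
    uint32_t id;  /* caller-chosen event code */
    uint32_t arg; /* e.g. a socket number or an err value */
};

static struct trace_entry trace_buf[TRACE_SLOTS];
static uint32_t trace_count; /* total number of events recorded so far */

/* Record one event. Once the buffer is full, the oldest entry is
 * overwritten. Not thread-safe as written. */
static void trace_event(uint32_t id, uint32_t arg)
{
    struct trace_entry *e = &trace_buf[trace_count & (TRACE_SLOTS - 1u)];

    e->id = id;
    e->arg = arg;
    trace_count++;
}

/* Copy up to max of the most recent events into out, oldest first.
 * Returns the number of entries copied. Wraparound of trace_count after
 * 2^32 events is ignored in this sketch. */
static uint32_t trace_dump(struct trace_entry *out, uint32_t max)
{
    uint32_t avail = trace_count < TRACE_SLOTS ? trace_count : TRACE_SLOTS;
    uint32_t n = avail < max ? avail : max;
    uint32_t start = trace_count - n;

    for (uint32_t i = 0; i < n; i++) {
        out[i] = trace_buf[(start + i) & (TRACE_SLOTS - 1u)];
    }
    return n;
}
```

The idea would be to call trace_event() at the places where I currently
print, and to fetch the result of trace_dump() once accept starts failing.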

Please find attached lwipopts.h and sys_arch.c, which might be useful.

Thanks for any hint on where to look or how to get more debug information.

Best,
Oldrich

On Sat, 31 Mar 2018 at 23:26, Oldrich Kepka <oldrich.ke...@cern.ch> wrote:

> Hi,
>
> We run lwip-1.4.0 on a PPC440 and experience rare, random TCP hangs. I
> was able to create a minimal working example that reproduces the hang.
> Set up a TCP server on the PPC:
>
>     int socketId =  socket(AF_INET, SOCK_STREAM, 0);
>     if(socketId == -1){...return;}
>
>
>     struct sockaddr_in server;
>     server.sin_family = AF_INET;
>     server.sin_port = htons(12121);
>     server.sin_addr.s_addr = INADDR_ANY;
>
>     int err = bind(socketId, (struct sockaddr *) &server, sizeof (server));
>     if (err < 0) {... return;}
>
>     err = listen(socketId, 1);
>     if (err < 0) { .... return;}
>
>     while (1) {
>         int socketConn = accept(socketId, NULL, NULL);
>         sys_thread_t thread = sys_thread_new("tcip_server",
>                                              processConnection,
>                                              (void*)socketConn,
>                                              2*THREAD_STACKSIZE,
>                                              DEFAULT_THREAD_PRIO);
>     }
>
> spawning a thread for each accepted connection:
>
> void processConnection(void *p) {
>
>     int sd = (int)p;
>     uint8_t *buffer = new uint8_t[CMD_MAX_SIZE];
>     int n = 0;               // signed, so a read() error (-1) ends the loop
>     uint32_t bufOffset = 0;
>
>     // read until the peer closes the connection or an error occurs
>     while ((n = read(sd, buffer+bufOffset, CMD_MAX_SIZE-bufOffset)) > 0) {
>         bufOffset += n;
>     }
>     ......
>     delete[] buffer;         // array form, matching new[]
>
>     close(sd);
> }
>
> Then keep dumping the contents of a small file (~30 characters) to the
> server from 2 shells at the same time:
>
>     for i in {1..n}; do cat some_file > /dev/tcp/DEVICE_IP/12121; done
>
> For some time I see an occasional
>
>     cat: write error: Connection reset by peer
>
> However, after a while this message is printed after every command in
> either of the two shells. At this point TCP is broken. I admit that this
> example is rather aggressive, but it gets the system into a state similar
> to the problematic one that we experience in production.
>
>
> I found that after TCP breaks down, UDP communication still works, so I
> am able to check the state of the system. For example, lwip_stats.tcp
> still counts incoming TCP packets correctly. However, new TCP sockets can
> no longer be created, I no longer see TCPIP_MSG_API messages in
> tcpip_thread, and so on. Placing printouts or usleep(1000) calls in some
> places makes the problem (a race condition) go away, but also slows the
> system down.
>
> Any advice on how to move forward in debugging this would be very much
> appreciated. opt.h and tcp_impl.h are attached. I tried blindly changing
> a few parameters (TCP_TMR_INTERVAL, TCP_SLOW_INTERVAL, MEM_ALIGNMENT,
> MEMP_OVERFLOW_CHECK, MEMP_NUM_TCP_PCB, MEMP_NUM_TCP_PCB_LISTEN) with no
> success.
>
> Best,
> Oldrich

Attachment: lwipopts.h
Description: Binary data

Attachment: sys_arch.c
Description: Binary data
