Hi Simon,

I am returning to this old thread. As you requested, I migrated our system to
lwIP 2.0.2 using the Xilinx port:
https://github.com/Xilinx/embeddedsw/tree/master/ThirdParty/sw_services/lwip202

I observe the same behaviour that I reported earlier. I am able to reproduce
the break with SOCKETS_DEBUG=1, which shows:

17:24:27.846: lwip_accept(2)...
17:24:27.848: lwip_accept(2): netconn_acept failed, err=-13

I have not yet been able to break it with TCPIP_DEBUG=1 enabled as well. There
seems to be a race condition that is not triggered when more debug output is
printed.

Please find attached lwipopts.h and sys_arch.c, which might be useful. Thanks
for any hint on where to look or how to get more debug information.

Best,
Oldrich

On Sat, 31 Mar 2018 at 23:26, Oldrich Kepka <oldrich.ke...@cern.ch> wrote:

> Hi,
>
> we run lwip-1.4.0 on PPC440 and experience rare, random hanging of TCP. I
> was able to create a minimal working example to reproduce the hang. Set up
> a TCP server on the PPC:
>
> int socketId = socket(AF_INET, SOCK_STREAM, 0);
> if(socketId == -1){...return;}
>
> struct sockaddr_in server;
> server.sin_family = AF_INET;
> server.sin_port = htons(12121);
> server.sin_addr.s_addr = INADDR_ANY;
>
> int err = bind(socketId, (struct sockaddr *) &server, sizeof (server));
> if (err < 0) {... return;}
>
> err = listen(socketId, 1);
> if (err < 0) { .... return;}
>
> while(1) {
>
>     int socketConn = accept(socketId, NULL, NULL);
>     sys_thread_t thread = sys_thread_new("tcip_server",
>                                          processConnection,
>                                          (void*)socketConn,
>                                          2*THREAD_STACKSIZE,
>                                          DEFAULT_THREAD_PRIO);
> }
>
> spawning a thread on each accepted connection:
>
> void processConnection(void *p) {
>
>     int sd = (int)p;
>     uint8_t *buffer = new uint8_t[CMD_MAX_SIZE];
>     uint32_t n = 0, bufOffset = 0;
>
>     while((n = read(sd, buffer+bufOffset, CMD_MAX_SIZE-bufOffset)) > 0 ) {
>         bufOffset += n;
>     }
>     ......
>     if(buffer) delete buffer;
>
>     close(sd);
> }
>
> Then keep dumping the content of a file (~30 characters)
>
> for i in {1..n}; do cat some_file > /dev/tcp/DEVICE_IP/12121; done
>
> from 2 shells at the same time. For some time I see random
>
> cat: write error: Connection reset by peer
>
> However, after some time this message is printed after every command from
> either of the two threads. At this point TCP breaks. I admit that this
> example is rather aggressive, but it allows me to get the system into a
> problematic state similar to the one we experience in production.
>
> I found that after TCP breaks down, UDP communication still works and I am
> able to check the state of the system. For example, lwip_stats.tcp properly
> counts incoming TCP packets. However, one cannot create a new TCP socket
> anymore, I no longer see TCPIP_MSG_API messages in tcpip_thread, etc.
> Placing printouts or usleep(1000) calls in some places removes the problem
> (presumably a race condition), but also slows the system.
>
> Any advice on how to move forward in debugging this would be very much
> appreciated. opt.h and tcp_impl.h attached. I tried to play blindly with a
> few parameters (TCP_TMR_INTERVAL, TCP_SLOW_INTERVAL, MEM_ALIGNMENT,
> MEMP_OVERFLOW_CHECK, MEMP_NUM_TCP_PCB, MEMP_NUM_TCP_PCB_LISTEN) with no
> success.
>
> Best,
> Oldrich
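
One detail in the quoted processConnection() loop may be worth ruling out: n is declared as uint32_t, so a read() return value of -1 converts to a large positive number, the "> 0" test still passes, and the loop does not exit on error. The buffer is also allocated with new[] but released with delete rather than delete[], and the return value of accept() is not checked before a thread is spawned. Below is a minimal sketch of a more defensive read loop, for illustration only. The helper name read_until_closed is hypothetical, the code is written against plain POSIX read() so that it can be tried on a host machine, and whether any of this is related to the hang on the PPC is not established.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of the original post or of lwIP).
// Reads from descriptor `sd` into `buf` until the peer closes the
// connection, `cap` bytes have been stored, or read() reports an error.
// Returns the number of bytes stored, or -1 on a read error.
ssize_t read_until_closed(int sd, uint8_t *buf, size_t cap) {
    size_t off = 0;
    while (off < cap) {
        // Keep the result signed so that -1 is seen as an error
        // instead of being converted to a huge unsigned count.
        ssize_t n = read(sd, buf + off, cap - off);
        if (n < 0) {
            if (errno == EINTR) {
                continue;  // interrupted by a signal, retry
            }
            return -1;     // real error (e.g. invalid or reset socket)
        }
        if (n == 0) {
            break;         // peer closed the connection
        }
        off += static_cast<size_t>(n);
    }
    return static_cast<ssize_t>(off);
}
```

On the target, this would be called from processConnection() with the accepted socket, only after checking that accept() did not return -1, and with the buffer released via delete[] (or held in a std::vector<uint8_t>). This assumes the port maps read() onto the lwIP socket layer, as the quoted code implies.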
Attachments: lwipopts.h, sys_arch.c
_______________________________________________
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users