Sorry, nothing comes to mind, except that maybe you don't close your TCP pcbs correctly. Normally, pcbs in TIME_WAIT should just be reused. If you find that you need some kind of delay, maybe your pcbs are stuck in a state other than TIME_WAIT?
Regards,
Simon

On 04.10.2022 at 19:16, Geoff Simmons wrote:
Hello,

I'm a new subscriber, and am working on my first "serious" LWIP project (meaning more than just a sample). This is an HTTP server for the Raspberry Pi PicoW, using the raw TCP API, accessed via the Pico C SDK (in which LWIP is a git submodule):

https://gitlab.com/slimhazard/picow_http

It's going well, except for one problem that has me stumped after trying to fix it for days. If a client attempts to connect shortly after a number of connections were closed, then intermittently (not always, but fairly often), the accept process stalls for a long time -- seemingly as long as 10 seconds, maybe more. So I'm hoping that someone on the list can help spot the error.

When this happens, I see sequences like this in debug output:

TCP connection request 40270 -> 80.
tcp_enqueue_flags: queueing 27040:27041 (0x12)
tcp_output_segment: 27040:27040
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_output_segment: 27040:27040
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_slowtmr: processing active pcb
tcp_output_segment: 27040:27040
tcp_slowtmr: polling application
tcp_output: nothing to send (00000000)
tcp_output_segment: 27040:27040
TCP connection established 40270 -> 80.
The pattern is always:

- "TCP connection request"
- tcp_enqueue_flags with a range of 1 ("queueing n:(n+1)"), always with the hex value 0x12
- this sequence, repeated many times:
  - "tcp_slowtmr: processing active pcb"
  - "tcp_slowtmr: polling application"
  - "tcp_output: nothing to send (00000000)"

tcp_output_segment with a range of 0 ("n:n") is interspersed in the repeating sequence. After the stall comes "TCP connection established", and then everything proceeds normally. With long timeouts on the client side, all of the requests succeed, despite the long stall. All of this happens before the tcp_accept callback is invoked.

When the stalls happen, I see the client side sending SYN retransmissions in wireshark. I haven't noticed anything else unusual in wireshark (of course it's easy to overlook something).

I usually see this when repeating a test script that sends a few dozen requests. There's no stall on the first connection after server startup. There's also no stall if I wait long enough between sending batches of requests. But if I run the test script and then start it again shortly afterward, it can stall for quite a while on the second run.

During a stall, I see MEM TCP_PCB stats showing "used" == "max", i.e. all tcp_pcbs in the pool are in use. I assume that after all connections are closed following a series of requests, they *should* be in TIME_WAIT, and that for the next connection, the oldest PCB in TIME_WAIT gets re-used. I have seen tcp debug output saying exactly that. But I suspect that my application code is not doing everything right about closing connections. Bearing in mind that there's a lot I don't know about LWIP, this hypothesis may be nonsense -- but the feeling is that I have discarded a connection, thinking that it is fully closed and should be in TIME_WAIT; but it isn't.
Then on the next client connection, the PCB thinks it still needs to send something like an ACK or FIN, and stalls while doing so, accounting for the long sequence of "processing active pcb" and "nothing to send". Eventually (because a timeout elapses?) the PCB gives up and accept can proceed.

Does (0x12) in the tcp_enqueue_flags debug output refer to a PCB's tcp flags? If so, then the value is TF_RXCLOSED | TF_ACK_DELAY. Is that significant? It doesn't "sound right" for a PCB to be used for an incoming connection.

Some things I've tried to fix the problem, none of which have succeeded:

- Wait until all bytes of a sent response have been ACKed (using the tcp_sent callback). This may mean that HTTP request pipelining is not possible. And it hasn't helped.
- Increase MEMP_NUM_TCP_PCB. It doesn't seem to matter: if there have been enough requests that all of the PCBs are used (and should be in TIME_WAIT), then the same thing happens. When I "wait long enough" between batches of requests, 6 PCBs are enough (I've tried up to 24).
- Increase MEM_SIZE. MEM HEAP stats show very consistently that it never needs more than 4800 bytes.
- Increase MEMP_NUM_TCP_SEG from 32 to 64. A bit of a desperation move, because I don't understand what it does; at any rate, that didn't help.

Sorry for the long introductory post; I'm trying to cover what I think I've understood about the problem. I assume that I've misunderstood something about the TCP API, and someone can set me straight.

Thanks,
Geoff

_______________________________________________
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users