Hi Sylvain,
Thanks for your reply. I've been working hard on this issue lately, and
I found something interesting. Specifically, I am using FTP as the
upper-level application protocol, on top of a TCP connection in lwIP.
For convenience of testing, I use PPP to connect to the FTP server on a
host PC. So basically the setup looks like this:
FTP client <---> TCP/IP (LWIP) <---> PPP <----------------------->
TCP/IP (Linux) <---> FTP server.
After stress testing and debugging, with more than 10 hours of uploading
data, I found that the PCB got corrupted in tcp_output(). What happens is
that tcp_output() can block in the lower-level function called from
tcp_output_segment() when the lower-layer protocol's buffer is full, so
the upper layer is left pending. Meanwhile the TCP timer keeps running and
tcp_slowtmr() also calls tcp_output(), so a second tcp_output() call
starts before the previous one has finished, like this:
tcp_output()
{
    ......
    tcp_output_segment(); // may be pending here ---> tcp_output() is
                          // called by tcp_slowtmr(), and returns
    ......
    do something about pcb->unacked and pcb->unsent;
    ......
}
Obviously pcb->unacked and pcb->unsent can be corrupted, while
pcb->snd_queuelen is unchanged, which results in a mismatch between the
queue length and the data actually in the unacked and unsent queues.
Eventually the program hits an assertion.
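
To make the hazard concrete, here is a minimal sketch in plain C. It is
not lwIP code: struct seg, struct pcb, and all function names below are
toy stand-ins I made up for illustration, and the "timer" is just a
callback that runs while the outer call is waiting in the send.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for lwIP's tcp_seg and tcp_pcb (hypothetical, not the
 * real structs); only the list links matter here. */
struct seg { struct seg *next; };
struct pcb { struct seg *unsent; struct seg *unacked; };

typedef void (*send_fn)(struct pcb *pcb);

static struct seg *queue_tail(struct seg *head)
{
    if (head == NULL) return NULL;
    while (head->next != NULL) head = head->next;
    return head;
}

/* Hypothetical "timer" path: while the outer call is pending, move every
 * unsent segment to the end of unacked, as a nested output call would. */
void send_with_timer_interference(struct pcb *pcb)
{
    struct seg *tail = queue_tail(pcb->unacked);
    if (tail == NULL) pcb->unacked = pcb->unsent;
    else tail->next = pcb->unsent;
    pcb->unsent = NULL;
}

/* A send that returns without anything else touching the pcb. */
void send_without_interference(struct pcb *pcb) { (void)pcb; }

/* Cache the unacked tail before the send, as the outer call does, and
 * report whether the cached pointer is still the real tail afterwards. */
int cached_tail_is_stale(struct pcb *pcb, send_fn send)
{
    struct seg *useg = queue_tail(pcb->unacked);
    send(pcb);
    return useg != queue_tail(pcb->unacked);
}
```

With the interfering send, the cached useg no longer points at the tail
once the send returns, so appending through it would cut off whatever
the nested call had queued behind it.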
Since I am using a very old version of lwIP, I am not sure whether the
problem still exists in newer versions. In my opinion, tcp_output() would
be better designed as a reentrant function: it may block when the buffer
of the lower layer is full, and then wait for a "write signal" to
continue sending data.
What I changed as a workaround is to re-check the pcb after
tcp_output_segment() returns, so that the local pointer useg points to
the tail of the unacked queue again; otherwise, the unacked queue's
content can be overwritten.
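
A rough sketch of the idea, again as a toy model in plain C rather than
my actual patch: the structs, model_output(), blocking_send(), and the
timer_fires counter are all hypothetical names for illustration. Besides
re-finding the tail, the sketch also checks whether a nested call has
already dequeued the segment; that extra check is my assumption about
what a complete re-check would need.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for lwIP's tcp_seg and tcp_pcb (hypothetical). */
struct seg { struct seg *next; };
struct pcb {
    struct seg *unsent;
    struct seg *unacked;
    int timer_fires;   /* how often the simulated timer may re-enter */
};

typedef void (*send_fn)(struct pcb *pcb, struct seg *seg);

static struct seg *queue_tail(struct seg *head)
{
    if (head == NULL) return NULL;
    while (head->next != NULL) head = head->next;
    return head;
}

int queue_len(const struct seg *head)
{
    int n = 0;
    for (; head != NULL; head = head->next) n++;
    return n;
}

/* Model of the output loop with the re-check applied after each send. */
void model_output(struct pcb *pcb, send_fn send)
{
    struct seg *seg = pcb->unsent;
    while (seg != NULL) {
        struct seg *useg;

        send(pcb, seg);              /* may re-enter model_output() */

        if (pcb->unsent != seg) {
            /* A nested call already dequeued this segment: reload the
             * loop state from the pcb instead of touching seg again. */
            seg = pcb->unsent;
            continue;
        }
        pcb->unsent = seg->next;
        seg->next = NULL;

        /* Re-find the tail now; do not trust a pointer cached earlier. */
        useg = queue_tail(pcb->unacked);
        if (useg == NULL) pcb->unacked = seg;
        else useg->next = seg;

        seg = pcb->unsent;
    }
}

/* Hypothetical lower-layer send: while it is "blocked", the timer path
 * runs model_output() again, at most timer_fires times. */
void blocking_send(struct pcb *pcb, struct seg *seg)
{
    (void)seg;
    if (pcb->timer_fires > 0) {
        pcb->timer_fires--;
        model_output(pcb, blocking_send);
    }
}
```

Starting with two unsent segments and timer_fires = 1, the nested call
moves both segments to unacked and the outer call then leaves them
alone. Note that this only models a nested call on the same stack; it
says nothing about real preemption between threads.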
Do you have any concerns about this? Any suggestions and discussion are welcome.
Best,
Jackie
On 01/11/15 01:17, Sylvain Rochet wrote:
> Hi Jackie,
>
> On Mon, Jan 05, 2015 at 11:59:00PM +0800, Jackie wrote:
>> Hi all,
>>
>> Recently when I am working on LWIP to do some stress test, e.g.
>> continuously uploading data to a server via TCP connection, the device
>> often crashed on an assert statement in tcp_receive(),
>>
>> if (pcb->snd_queuelen != 0) {
>>   LWIP_ASSERT("tcp_receive: valid queue length",
>>               pcb->unacked != NULL || pcb->unsent != NULL);
>> }
>>
>> After debugging the crash case, I found some possible cause that the pcb
>> structure has been corrupted by another thread during a context switch.
>> I singled out one likely candidate, tcp_slowtmr(). In this timer, it
>> calls another function tcp_pcb_purge(), in which it resets both unacked
>> and unsent queue to NULL but without setting queuelen to 0. In some
>> cases (like tcp state is FIN_WAIT_2), this timer will interrupt the
>> current tcp thread in a preemptive OS environment, modifying the current
>> pcb before hitting the assert statement afterwards.
>>
>> How likely will it be if so? Has anyone encountered a similar issue? Any
>> suggestions?
> You are not specific enough to be able to conclude, but, as usual, it
> looks like a broken port or usage which do not follow lwIP threading
> model.
>
> Summary:
>
> - Do *NOT* call anything in interrupt context, nothing, never, never,
> use your OS semaphore signaling to an Ethernet/serial/… RX thread
> - memp_* functions are thread-safe if SYS_LIGHTWEIGHT_PROT is
> set, and again, thread safe does not mean it is interrupt safe, especially
> if your hardware does nested interrupts
> - Do *NOT* call any function from the RAW API outside lwIP thread
> - Use Netconn or Socket API in others threads, but keep in mind you
> should not share a Netconn/Socket control block between threads, (or use
> proper locking if you really have to, of course).
>
> Sylvain
>
>
> _______________________________________________
> lwip-users mailing list
> [email protected]
> https://lists.nongnu.org/mailman/listinfo/lwip-users