Hi Simon & Sylvain,
After re-examining the code, I finally have a better understanding of what
causes this problem. It is not that more than one thread calls
tcp_output(); there is only one thread, tcpip_thread(), but within it
tcp_output() is called recursively.
This happens when the lower-layer protocol (PPP) uses
sys_sem_wait(). That function does not only wait for a semaphore;
it also gives the timers a chance to run. If a timeout expires during the
wait, the TCP timer is called, and inside the timer tcp_output() is called
again, like this:
tcpip_thread()
{
    ...
    tcp_input()
    {
        ...
        tcp_output()
        {
            ...
            pppifOutput()
            {
                ...
                sys_sem_wait(); // Here tcp_slowtmr() has a chance to run again,
                                // and tcp_output() may be called again in it.
                some_ppp_write_func();
                ...
            }
            ...
        }
        ...
    }
    ...
}
@Sylvain,
You are right, this PPP implementation is from a third party,
so I am not allowed to modify it much. But I think its design
is quite buggy; in the worst case, tcp_output() can be called
recursively several times.
So I think that in my case there is no issue with the original lwIP design.
Best,
Jackie
On 01/17/15 04:47, [email protected] wrote:
> Jackie:
>> After stress test and debugging, more than 10 hours uploading data, I
>> found the PCB got corrupt in tcp_output(). The case is that
>> tcp_output() can be blocked by the lower-level function call in
>> tcp_output_segment(), in which somehow the buffer of lower-layer
>> protocol is full, so the upper-layer is pending, and at the same
>> time, tcp timer is running, tcp_slowtmr() is also calling
>> tcp_output(), so this tcp_output() is called before the
>
> There you got the bug: when lwIP's threading requirements are
> observed, this can't happen: tcp_output() can never be called twice
> and thus does not have to be designed reentrant.
>
> What you describe tells us that timers are checked from a different
> execution thread (thread or ISR) than output. But for the core lwIP
> code, you have to ensure this doesn't happen. That's all.
>
> Of course this raises the problem of what to do with TX packets when
> e.g. your DMA queue is full. Usually it's best to add a 2nd (larger)
> software-queue that fills the DMA queue and to keep an upper limit on
> it. You'd then return ERR_IF when this limit is reached.
>
> Simon
>
>
> _______________________________________________
> lwip-users mailing list
> [email protected]
> https://lists.nongnu.org/mailman/listinfo/lwip-users