Hi Simon & Sylvain,

After re-examining the code, I finally have a better understanding of
what causes this problem. It is not that more than one thread is calling
tcp_output(). There is only one thread, tcpip_thread(), but within it
tcp_output() is called recursively.

This happens when the lower-layer protocol (PPP) calls sys_sem_wait().
That function does not only wait for a semaphore, it also gives the
timers a chance to run. If a timer expires during the wait, the TCP
timer handler is called, and the handler calls tcp_output() again,
like this:

tcpip_thread()
{
    ...
    tcp_input()
    {
        ...
        tcp_output()
        {
            ...
            pppifOutput()
            {
                ...
                sys_sem_wait();  // Here tcp_slowtmr() has a chance to run,
                                 // and tcp_output() may be called again in it.
                some_ppp_write_func();
                ...
            }
            ...
        }
        ...
    }
    ...
}
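
To make the call chain above concrete, here is a small stand-alone
sketch of the same pattern. It is only a simulation under my own
assumptions: every fake_* function and simulate_input() are hypothetical
stand-ins written for illustration, not lwIP or PPP code. It just counts
how deeply the output function is nested when the semaphore wait services
a due timer.

```c
#include <assert.h>

/* All names below are hypothetical stand-ins, not lwIP or PPP APIs. */

static int output_depth;      /* current nesting level of fake_tcp_output() */
static int max_output_depth;  /* deepest nesting observed so far */
static int timer_pending;     /* 1 while a simulated TCP timer is due */

static void fake_tcp_output(void);

/* Stand-in for tcp_slowtmr(): the timer handler calls the output function. */
static void fake_tcp_slowtmr(void)
{
    fake_tcp_output();
}

/* Stand-in for a sys_sem_wait() that services due timers while it waits. */
static void fake_sys_sem_wait(void)
{
    if (timer_pending) {
        timer_pending = 0;    /* the timer fires once during the wait */
        fake_tcp_slowtmr();
    }
}

/* Stand-in for pppifOutput(): waits on a semaphore before writing. */
static void fake_ppp_output(void)
{
    fake_sys_sem_wait();
    /* a real driver would write the frame here */
}

/* Stand-in for tcp_output(): records how deeply it is nested. */
static void fake_tcp_output(void)
{
    output_depth++;
    if (output_depth > max_output_depth) {
        max_output_depth = output_depth;
    }
    fake_ppp_output();
    output_depth--;
    assert(output_depth >= 0);
}

/* Runs one simulated input event and returns the deepest nesting seen. */
int simulate_input(int timer_due)
{
    output_depth = 0;
    max_output_depth = 0;
    timer_pending = timer_due;
    fake_tcp_output();
    return max_output_depth;
}
```

In this sketch simulate_input(0) returns 1 (no timer fires, so there is
no nesting), while simulate_input(1) returns 2: the output function is
entered a second time before the first call has returned, all within a
single thread.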

@Sylvain,
You are right, this PPP implementation comes from a third party, so I
am not allowed to modify it much. But I think this design is quite
buggy, and in the worst case tcp_output() can be called recursively
several times.

So I think that in my case there is no issue with the original lwIP design.

Best,
Jackie



On 01/17/15 04:47, [email protected] wrote:
> Jackie:
>> After stress test and debugging, more than 10 hours uploading data, I
>> found the PCB got corrupt in tcp_output(). The case is that
>> tcp_output() can be blocked by the lower-level function call in
>> tcp_output_segment(), in which somehow the buffer of lower-layer
>> protocol is full, so the upper-layer is pending, and at the same
>> time, tcp timer is running,  tcp_slowtmr() is also calling
>> tcp_output(), so this tcp_output() is called before the
>
> There you got the bug: when lwIP's threading requirements are
> observed, this can't happen: tcp_output() can never be called twice
> and thus does not have to be designed reentrant.
>
> What you describe tells us that timers are checked from a different
> execution thread (thread or ISR) than output. But for the core lwIP
> code, you have to ensure this doesn't happen. That's all.
>
> Of course this raises the problem of what to do with TX packets when
> e.g. your DMA queue is full. Usually it's best to add a 2nd (larger)
> software-queue that fills the DMA queue and to keep an upper limit on
> it. You'd then return ERR_IF when this limit is reached.
>
> Simon
>
>
> _______________________________________________
> lwip-users mailing list
> [email protected]
> https://lists.nongnu.org/mailman/listinfo/lwip-users