Simon wrote:
> Mason wrote:
>
>> IMHO, the elephant in the room is task-switching, as correctly
>> pointed out by Kieran.
In my previous tests, I ran all three network-related threads (RxTask,
tcpip_thread, rxapp) at the same priority (MIN+4) while other threads ran
at the minimum priority. Lowering the priority of rxapp to MIN+2 improved
the throughput to 65.7 Mbit/s. (I think there are fewer context switches
when it's done like this, but I'm not sure how to measure this.)

> Well, given a correctly DMA-enabled driver, you could avoid one task
> switch by checking RX packets from tcpip_thread instead of using another
> thread for RX (as your "Task breakdown" suggests by the name "RxTask").

Correct; the OS panics when I call tcpip_input from the ISR, so I set up
RxTask, which runs:

static void rx_task(void *arg)
{
  while (1)
  {
    /*** WAIT FOR THE NEXT PACKET ***/
    ethernet_async_t *desc = message_receive(rx_queue);

    /*** COPY THE PACKET INTO A FRESH PBUF AND HAND IT TO THE STACK ***/
    struct pbuf *pbuf = pbuf_alloc(PBUF_RAW, desc->length, PBUF_RAM);
    if (pbuf != NULL)
    {
      memcpy(pbuf->payload, desc->buffer, desc->length);
      if (mynetif->input(pbuf, mynetif) != ERR_OK)
        pbuf_free(pbuf); /* input() did not take ownership */
    }

    /*** RETURN DESC TO THE LIST OF AVAILABLE READ DESCRIPTORS ***/
    int err = device_ioctl(dev, OSPLUS_IOCTL_ETH_ASYNC_READ, desc);
    if (err) printf("ASYNC_READ IOCTL FAIL\n");
  }
}

(I've since added the pbuf_alloc NULL check and the pbuf_free on input
failure; the original version leaked the pbuf if input() rejected it.)

> You would then set a flag / post a static message from your ISR, process
> the packet in tcpip_thread (without having to copy it) and post the data
> to your application thread.
>
> Also, by using the (still somewhat experimental) LWIP_TCPIP_CORE_LOCKING
> feature, you can also avoid the task switch from application task to
> tcpip_thread (by using a mutex to lock the core instead of passing a
> message).

I wanted to enable LWIP_TCPIP_CORE_LOCKING (and also
LWIP_TCPIP_CORE_LOCKING_INPUT; what is the difference? they're not
documented AFAICS) but I was scared off by the "Don't use it if you're
not an active lwIP project member" comment ;-) Also, my OS forbids using
mutexes from an ISR.

Perhaps I could keep the RxTask, and enable LWIP_TCPIP_CORE_LOCKING_INPUT,
which would take tcpip_thread out of the equation?
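If I do go down that road, I suppose the lwipopts.h fragment would look
something like this. (This is just my understanding of the two options,
pieced together from reading opt.h and tcpip.c; untested, corrections
welcome.)

```c
/* Hypothetical lwipopts.h fragment -- a sketch, not a tested config. */

/* Application threads take a global mutex (LOCK_TCPIP_CORE) and call
 * into the stack directly, instead of posting a message to tcpip_thread
 * and blocking on the reply. Saves a task switch per socket API call. */
#define LWIP_TCPIP_CORE_LOCKING        1

/* tcpip_input() takes the same mutex and processes the packet in the
 * caller's context (RxTask, in my case), instead of posting the pbuf
 * to the tcpip_thread mailbox. Saves the RxTask -> tcpip_thread switch. */
#define LWIP_TCPIP_CORE_LOCKING_INPUT  1
```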
Thus, LOCK_TCPIP_CORE would be called from task context, which is fine.

>> Assuming that every memcpy were lwip-related, and that I could
>> get rid of them (which I don't see how, given Simon's comments)
>> the transfer would take 478 instead of 516 seconds.
>
> I didn't mean to discourage you with my comments, I only meant it
> doesn't work out of the box with a current lwIP. However, I know it's
> not as easy for an lwIP beginner to make the changes required for the RX
> side (the TX side should not be a problem via adapting the mem_malloc()
> functions).
>
> If I made the changes to support PBUF_REF for RX in git, would you be
> able to switch to that for testing?

Yes, I've tried to do a clean port, so I should be able to upgrade to a
newer version quite easily.

> I plan to implement zero-copy on an ARM-based board I have here, but I
> haven't found the time for that, lately :-(

I will follow that development closely.

-- 
Regards.

_______________________________________________
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users