> Allocate regular pages to use as backing for the RX ring and use the
> DMA API to sync the caches. This should give a bit better performance
> since it allows the CPU to do burst transfers from memory. It is also
> a necessary step on the way to reduce the amount of copying done by
> the driver.

I've not tried to understand the patches, but you have to be very careful when using non-snooped memory for descriptor rings. No amount of DMA API calls can sort out some of the issues.

Basically, you must not dirty a cache line that contains data that the MAC unit might still write to.

For the receive ring this means that you must not set up new rx buffers for ring entries until the MAC unit has filled all the ring entries in the same cache line. This probably means only adding rx buffers in blocks of 8 or 16 (or even more if there are large cache lines). I can't see any code in the patch that does this.

Doing the same for the tx ring is more difficult, especially if you can't stop the MAC unit polling the TX ring on a timer basis. Basically, you can only give the MAC tx packets if either it is idle, or if the new tx ring entries start on a cache line boundary. If the MAC unit is polling the ring, then to give it multiple items you may need to update the 'owner' bit in the first ring entry last, just in case the cache line gets written out before you've finished.

	David
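
To illustrate the two rules above (refill rx entries only in whole cache-line groups, and set the 'owner' bit of the first tx entry last), here is a rough user-space sketch in C. It is not taken from the patch or from any real driver: the descriptor layout, the DESC_OWN_MAC bit, the 32-byte cache line, the ring size and both function names are assumptions made up for the example. A real driver would also need the appropriate write barrier and cache sync where noted.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; real ones depend on the CPU and MAC. */
#define CACHE_LINE_SIZE 32u
#define RING_SIZE       64u
#define DESC_OWN_MAC    0x80000000u     /* hypothetical 'owner' bit */

struct desc {                           /* hypothetical 8-byte descriptor */
	uint32_t status;
	uint32_t addr;
};

#define DESCS_PER_LINE ((unsigned int)(CACHE_LINE_SIZE / sizeof(struct desc)))

/*
 * 'completed' and 'refilled' are free-running counts of rx descriptors
 * the MAC has finished with and the driver has handed back, respectively.
 * Returns how many descriptors may be refilled now: only whole groups
 * that share a cache line, so a line the MAC may still write to is
 * never dirtied by the CPU.
 */
static unsigned int rx_refill_count(unsigned int completed,
				    unsigned int refilled)
{
	unsigned int pending = completed - refilled;

	/* Refills always advance in whole cache-line groups. */
	assert(refilled % DESCS_PER_LINE == 0);
	assert(pending <= RING_SIZE);

	return pending - pending % DESCS_PER_LINE;
}

/*
 * Hand 'n' tx descriptors starting at ring index 'first' to the MAC.
 * The first entry's owner bit is written last, so a MAC that polls the
 * ring cannot start on the batch before all of it has been set up.
 */
static void tx_publish(struct desc *ring, unsigned int first, unsigned int n)
{
	unsigned int i;

	for (i = 1; i < n; i++)
		ring[(first + i) % RING_SIZE].status |= DESC_OWN_MAC;

	/*
	 * A real driver needs a write barrier (and a cache sync for
	 * non-coherent memory) here; omitted in this user-space sketch.
	 */
	if (n > 0)
		ring[first % RING_SIZE].status |= DESC_OWN_MAC;
}
```

With these assumed sizes there are 4 descriptors per cache line, so if the MAC has completed 11 descriptors and 4 have already been refilled, only one further group of 4 may be refilled; the remaining 3 have to wait until the MAC has filled the last entry in their cache line.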