Comparing DMA direct to skbufs with packet copy in the FEC driver, I said:

> But we get burst transfers into cache in either case,

Dan writes:

> No, because I originally allocated the receive buffers uncached.

That's comparing apples with oranges, though. I wasn't talking about what the current driver does; I was comparing your suggestion of cached receive buffers (which I also implemented in the FEC and benchmarked) with DMA direct to the cached skbufs. In both cases the CPU bursts the data into the cache when it first accesses it, so that doesn't explain why I found DMA direct to the skbuf to be faster overall than just making the Rx buffer cached and retaining the copy. Both gave a measurable speed improvement over the original driver.

Note that even when doing DMA direct to the skbuf, it's normal to have a size threshold below which packets are copied into a newly allocated skbuf of exactly the right size. This avoids wasting skbuf space on tiny packets, and gives the opportunity to nicely align the IP header. As a result, small packets (where IP stack processing dominates and header alignment is most important) are processed exactly the way you describe, while large ones (where avoiding copying the payload is most important) are not copied at all. Hence you end up with the best of both worlds.
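For what it's worth, the receive path then looks roughly like the sketch below. This is not the FEC driver's actual code: FEC_COPYBREAK and fec_rx_copybreak() are made-up names, the threshold value is arbitrary, and I've written the cache maintenance in terms of the generic DMA API rather than whatever the driver does internally.

#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>
#include <linux/dma-mapping.h>
#include <linux/string.h>

#define FEC_COPYBREAK 256    /* illustrative threshold, in bytes */

static void fec_rx_copybreak(struct net_device *ndev, struct device *dmadev,
                             struct sk_buff *ring_skb, dma_addr_t mapping,
                             unsigned int buf_size, unsigned int pkt_len)
{
        struct sk_buff *skb = NULL;

        if (pkt_len < FEC_COPYBREAK) {
                /* Small packet: copy into a fresh skbuf of exactly the right
                 * size.  Reserving NET_IP_ALIGN (2) bytes offsets the 14-byte
                 * Ethernet header so the IP header behind it lands on a
                 * 4-byte boundary. */
                skb = dev_alloc_skb(pkt_len + NET_IP_ALIGN);
                if (skb) {
                        skb_reserve(skb, NET_IP_ALIGN);
                        dma_sync_single_for_cpu(dmadev, mapping, pkt_len,
                                                DMA_FROM_DEVICE);
                        memcpy(skb_put(skb, pkt_len), ring_skb->data, pkt_len);
                        dma_sync_single_for_device(dmadev, mapping, pkt_len,
                                                   DMA_FROM_DEVICE);
                        /* ring_skb stays mapped and is reused for the next frame */
                }
        } else {
                /* Large packet: unmap and hand the DMA'd skbuf straight up the
                 * stack; the ring slot is refilled with a freshly allocated and
                 * mapped skbuf (not shown). */
                dma_unmap_single(dmadev, mapping, buf_size, DMA_FROM_DEVICE);
                skb = ring_skb;
                skb_put(skb, pkt_len);
        }

        if (skb) {
                skb->protocol = eth_type_trans(skb, ndev);
                netif_rx(skb);
        } /* else: allocation failed, drop the frame and keep ring_skb */
}

Drivers that do this usually make the threshold tunable; something in the region of a couple of hundred bytes is a common default, since below that the memcpy costs less than recycling and remapping a full-sized buffer.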
Regards,
Graham

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
