On 05/01/2014 06:09 PM, Zoltan Kiss wrote:
On 29/04/14 17:36, Thomas Graf wrote:
On Tue, Apr 29, 2014 at 05:17:07PM +0100, Zoltan Kiss wrote:
On 23/04/14 22:56, Thomas Graf wrote:
On 04/23/2014 10:12 PM, Ethan Jackson wrote:
The problem has actually gotten worse since we've gotten rid of the
dispatcher thread.  Now each thread has its own channel per port.

I wonder if the right approach is to simply ditch the per-port
fairness in the case where mmap netlink is enabled, i.e. we simply
have one channel per thread and call it a day.

Anyways, I don't have a lot of context on this thread, so take
everything above with a grain of salt.
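
To make the scaling argument concrete, here is a back-of-the-envelope
sketch of ring memory under both layouts. The function names and the
"one ring of ring_bytes per channel" model are assumptions made for
illustration only; they are not taken from the OVS code.

```c
#include <stdint.h>

/* Hypothetical model: every channel owns one mmap'ed ring of ring_bytes. */

/* Current layout: each handler thread has its own channel per port. */
static uint64_t ring_bytes_per_port_channels(uint64_t n_threads,
                                             uint64_t n_ports,
                                             uint64_t ring_bytes)
{
    return n_threads * n_ports * ring_bytes;
}

/* Proposed layout: one channel per handler thread, shared by all ports. */
static uint64_t ring_bytes_per_thread_channels(uint64_t n_threads,
                                               uint64_t ring_bytes)
{
    return n_threads * ring_bytes;
}
```

With, say, 4 threads, 1000 ports and a 1 MiB ring (all made-up numbers),
the first layout needs about 4 GB of ring memory and the second 4 MiB.
The price is the loss of per-port fairness mentioned above.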

I agree with Ethan's statement. Even with a reduced frame size the cost
of an individual ring buffer per port is likely still too large and
we lose the benefit of zerocopy for large packets which are typically
the expensive packets.
My expectation is that such large packets shouldn't go to the
userspace very often, as ideally the TCP handshake packets already
established the flow. Do you have a use case where this is not true?

The common use case is a flow expiring during the lifetime of a TCP
connection. It will result in multiple data packets being sent upwards.
It's much less likely in the megaflows era though.

As we extend the GSO path into the upcall and make use of the new DPDK
style ofpbuf to avoid the memcpy() for the mmap case
Can you elaborate a bit more on this?

The current upcall code does segmentation which is not required and is
expensive for the above mentioned case. A single 64K GSO packet will
automatically result in up to 50 upcalls.
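
As a rough illustration of where a number of that order comes from, the
sketch below counts the segments a GSO payload is split into for a given
MSS. The helper name is hypothetical, and the MSS values in the note
below (1448 and 1460 bytes) are assumptions about a typical 1500 byte
MTU path, not figures from this thread.

```c
#include <stddef.h>

/*
 * Number of segments (and therefore upcalls, if each segment is sent
 * up separately) for a GSO payload of payload_len bytes split at mss
 * bytes.  Returns 0 when mss is 0 to avoid dividing by zero.
 */
static size_t gso_segment_count(size_t payload_len, size_t mss)
{
    if (mss == 0) {
        return 0;
    }
    return (payload_len + mss - 1) / mss;   /* ceiling division */
}
```

For a 64K payload this gives 46 segments at an MSS of 1448 and 45 at an
MSS of 1460; smaller MSS values push the count higher.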

Also, right now, the first thing we do in the mmap case is copy the
buffer into an ofpbuf. This is not required at all and the copy is
expensive. Instead, we should make use of the shared memory just like
in the DPDK case and only release the buffer after the packet has been
fully processed.

So you suggest userspace should directly access the linear buffer and
the frags, instead of copying them into the shared buffer?

That would be step three. The first intermediate step I suggest is to
have ofpbuf point to the shared buffer instead of allocating new space
for the ofpbuf data just like DPDK does.
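
A minimal sketch of that intermediate step, assuming a simplified buffer
type. Everything here (struct pkt_buf, struct ring_frame, the status
values and the function names) is hypothetical and only stands in for
ofpbuf and the nlmmap ring; it is not the actual OVS or kernel API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one frame of the shared (mmap'ed) ring. */
enum frame_status { FRAME_KERNEL, FRAME_USER };

struct ring_frame {
    enum frame_status status;   /* who may touch the frame right now */
    size_t len;                 /* bytes of packet data in the frame */
    unsigned char data[2048];
};

enum pkt_buf_source { PKT_BUF_MALLOC, PKT_BUF_SHARED };

struct pkt_buf {
    enum pkt_buf_source source; /* where 'data' lives */
    unsigned char *data;
    size_t size;
    struct ring_frame *frame;   /* set only for PKT_BUF_SHARED */
};

/* Copying variant: duplicate the frame, hand it back immediately.
 * Returns 0 on success, -1 if the allocation fails. */
static int pkt_buf_init_copy(struct pkt_buf *b, struct ring_frame *f)
{
    b->data = malloc(f->len ? f->len : 1);
    if (!b->data) {
        return -1;
    }
    memcpy(b->data, f->data, f->len);
    b->source = PKT_BUF_MALLOC;
    b->size = f->len;
    b->frame = NULL;
    f->status = FRAME_KERNEL;
    return 0;
}

/* Zero-copy variant: point at the frame and keep it until uninit. */
static void pkt_buf_init_shared(struct pkt_buf *b, struct ring_frame *f)
{
    b->source = PKT_BUF_SHARED;
    b->data = f->data;
    b->size = f->len;
    b->frame = f;
}

/* Free owned memory, or release the borrowed frame back to the ring. */
static void pkt_buf_uninit(struct pkt_buf *b)
{
    if (b->source == PKT_BUF_MALLOC) {
        free(b->data);
    } else {
        b->frame->status = FRAME_KERNEL;
    }
    b->data = NULL;
    b->size = 0;
    b->frame = NULL;
}
```

The trade-off is that a frame stays occupied for the whole processing
time of the packet, so the ring has to be sized for packets in flight
and not only for bursts of arrivals.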

An API that would allow nlmmap to refer to the DMA buffer directly does
not exist yet but is definitely desirable. Given that, the cost of an
upcall would be reduced to the cost of a context switch which can be
further reduced by pushing batches or 64K GSO frames.

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev