On Mon, Sep 15, 2014 at 08:20:25PM +0300, Or Gerlitz wrote:
> On Mon, Sep 15, 2014 at 7:58 PM, Jason Gunthorpe
> <jguntho...@obsidianresearch.com> wrote:
> > To do this, you need to transfer the offload state across the wire, so
> > on receive you inject the packet with the proper tag that the csum is
> > not computed but ready for offload. A node receiving a packet like
> > this would have to compute the csum before sending it onwards, so no,
> > if done properly it will not break gateways.
> >
> > All the core infrastructure is there, all the virtualization drivers
> > work like this - the guest side does not compute the csum, and the
> > hyperviser side receives the packet with that flag, and the csum
> > ultimately is offloaded to the physical NIC. Look at the xen net
> > driver for an example.
>
> But is done on the xmitting hypervisor, isn't it? if this is the case,
> I don't see the similarity to the IPoIB CM case.

I'm not sure what you mean? You raised the concern about gateways,
which is identical to the hypervisor case:

 G-LINUX --(NO CSUM)--> ring buffer --> H-LINUX --(NO CSUM)--> NIC->WIRE
 A-LINUX --(NO CSUM)--> RC QP       --> B-LINUX --(NO CSUM)--> NIC->WIRE

The key is that csum state is placed in the ring buffer/RC QP with
every packet.

Basically, you serialize the entire offload state the IPoIB send
receives from the kernel net stack, dump that onto the wire, and
restore that exact same semantic state on the receive side. The NIC
sees the same packet, with the same offload metadata, as though it
were directly connected to the sending Linux kernel.

The *typical* IPoIB CM case is similar to a guest talking to another
guest:

 G1 --(NO CSUM)--> ring buffer --> H-LINUX --(NO CSUM)--> ring buffer --(NO CSUM)--> G2

Here the packet is never csum'd - the 2nd guest simply accepts the
packet with an uncsum'd tag. If you flatten the above it looks
identical to the typical IPoIB case.

Hypervisors are now also doing the same trick with GSO: they send
large packets without a high MTU, because they can take the GSO
master packet state from the sending guest and shuttle the whole
thing without segmentation to the receiving guest (or NIC). IPoIB
should do the same.

Jason
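
For illustration only, here is a rough user-space sketch in Python of
the "serialize the offload state with every packet" idea. It is not
kernel code and not a proposed wire format: the header layout, the
names OffloadState, encode, decode and finalize_checksum, and the flag
value are all assumptions made up for this example, loosely in the
spirit of the per-packet header the virtualization drivers pass
around. The sender prepends the offload state, the receiver restores
it, and a node that has to forward onto a link without csum offload
finishes the checksum itself.

```python
import struct
from dataclasses import dataclass

# Hypothetical per-packet header: flags, gso_type, gso_size, csum_start, csum_offset.
_HDR = struct.Struct("!BBHHH")
F_NEEDS_CSUM = 0x01  # assumed flag: checksum not computed yet, ready for offload


@dataclass
class OffloadState:
    needs_csum: bool
    gso_type: int
    gso_size: int
    csum_start: int   # offset where checksumming starts
    csum_offset: int  # offset of the checksum field, relative to csum_start


def encode(state: OffloadState, payload: bytes) -> bytes:
    """Sender side: put the offload state on the wire in front of the packet."""
    flags = F_NEEDS_CSUM if state.needs_csum else 0
    return _HDR.pack(flags, state.gso_type, state.gso_size,
                     state.csum_start, state.csum_offset) + payload


def decode(frame: bytes):
    """Receiver side: restore the same offload state the sender had."""
    flags, gso_type, gso_size, csum_start, csum_offset = _HDR.unpack_from(frame)
    state = OffloadState(bool(flags & F_NEEDS_CSUM), gso_type, gso_size,
                         csum_start, csum_offset)
    return state, frame[_HDR.size:]


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def finalize_checksum(state: OffloadState, payload: bytes):
    """What a gateway does before forwarding to a link with no csum offload.

    Sums from csum_start to the end, including whatever seed is already in
    the checksum field, and stores the result at csum_start + csum_offset.
    """
    if not state.needs_csum:
        return state, payload
    buf = bytearray(payload)
    pos = state.csum_start + state.csum_offset
    buf[pos:pos + 2] = struct.pack("!H", internet_checksum(bytes(buf[state.csum_start:])))
    done = OffloadState(False, state.gso_type, state.gso_size,
                        state.csum_start, state.csum_offset)
    return done, bytes(buf)
```

In the common case the receiver just keeps the restored state and hands
the packet on (to the stack, or to a NIC that does the offload);
finalize_checksum is only needed at the point where nothing downstream
can do the csum.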