Found it.

Two bugs canceling each other.
The bind sequence in:  psock_txring_vnet.c is wrong.

It does the following addr.sll_protocol =    htons(ETH_P_IP);
before calling bind.

If you set addr.sll_protocol to ETH_P_ALL where it should have been in the first place the test program blows up with -ENOBUFS

I think what is happening is that this value is taken into account when looking at "what should I use to segment it with" in skb_mac_gso_segment which is invoked at the end of the verification chain which starts in packet_direct_xmit in af_packet.c

I have not tried the other test cases like setting it to ETH_P_IP and giving it IPv6 traffic or the opposite, but my guess is that these will fail too if they need GSO to be applied.

A.

On 10/12/17 15:12, Anton Ivanov wrote:


On 10/12/17 14:39, Willem de Bruijn wrote:
If I produce a real vnet frame out of a live kernel frame using
virtio_net_hdr_from_skb() and try to send it it fails on the check in
af_packet, while succeeding for tap. If I remove the af_packet check the
frame is accepted by the hardware too.

If I produce it a synthetic frame + vnet header using the test program - it
works. Go figure.
Besides looking at the raw frame bytes, also compare the setup
of virtio_net_header, as well as the tcp checksum field. The stack
expects the pseudo header to have already been calculated.

I am feeding it a skb which is coming up in the tx routine of a User Mode Linux device which is marked as NETIF_F_HW_CSUM and SG - that results in a skb with csum-ed headers, body set up for CSUM_PARTIAL and multiple fragments (always at least 1 more frag besides the TCP head).

That has everything in order as expected by virtio_net_hdr_from_skb and this is what I use to generate the vnet header. It works correctly for csum and GRO with af_packet and it works correctly for everything using a tap device. It fails only on GSO + af_packet TX.

What I am doing is the same thing virtio_net does - it just takes the output of virtio_net_hdr_from_skb and does nothing more. There should be no need to do anything more :(

It should just work.

Unless there is a gremlin somewhere in the machinery and that gremlin needs some light to be flushed out.

I am going to continue digging into it.

At the very least I now have a positive test case which uses the same
semantics as my code so I have something to compare to.
Glad to hear that the test is helpful. I wrote it because I
have run into these exact same issues in the past.

It is. I have changes ready for it so it also supports vector IO, need to finish fighting with it.

A.




--
Anton R. Ivanov

Cambridge Greys Limited, England and Wales company No 10273661
http://www.cambridgegreys.com/

Reply via email to