On Wed, Oct 14, 2020 at 8:12 AM Willem de Bruijn <willemdebruijn.ker...@gmail.com> wrote:
>
> On Wed, Oct 14, 2020 at 4:52 AM Xie He <xie.he.0...@gmail.com> wrote:
> >
> > On Sun, Oct 11, 2020 at 2:01 PM Willem de Bruijn
> > <willemdebruijn.ker...@gmail.com> wrote:
> > >
> > > There is agreement that hard_header_len should be the length of link
> > > layer headers visible to the upper layers, needed_headroom the
> > > additional room required for headers that are not exposed, i.e., those
> > > pushed inside ndo_start_xmit.
> > >
> > > The link layer header length also has to agree with the interface
> > > hardware type (ARPHRD_..).
> > >
> > > Tunnel devices have not always been consistent in this, but today
> > > "bare" ip tunnel devices without additional headers (ipip, sit, ..) do
> > > match this and advertise 0 byte hard_header_len. Bareudp, vxlan and
> > > geneve also conform to this. Known exception that probably needs to be
> > > addressed is sit, which still advertises LL_MAX_HEADER and so has
> > > exposed quite a few syzkaller issues. Side note, it is not entirely
> > > clear to me what sets ARPHRD_TUNNEL et al apart from ARPHRD_NONE and
> > > why they are needed.
> > >
> > > GRE devices advertise ARPHRD_IPGRE and GRETAP advertise ARPHRD_ETHER.
> > > The second makes sense, as it appears as an Ethernet device. The first
> > > should match "bare" ip tunnel devices, if following the above logic.
> > > Indeed, this is what commit e271c7b4420d ("gre: do not keep the GRE
> > > header around in collect medata mode") implements. It changes
> > > dev->type to ARPHRD_NONE in collect_md mode.
> > >
> > > Some of the inconsistency comes from the various modes of the GRE
> > > driver. Which brings us to ipgre_header_ops. It is set only in two
> > > special cases.
> > >
> > > Commit 6a5f44d7a048 ("[IPV4] ip_gre: sendto/recvfrom NBMA address")
> > > added ipgre_header_ops.parse to be able to receive the inner ip source
> > > address with PF_PACKET recvfrom. And apparently relies on
> > > ipgre_header_ops.create to be able to set an address, which implies
> > > SOCK_DGRAM.
> > >
> > > The other special case, CONFIG_NET_IPGRE_BROADCAST, predates git. Its
> > > implementation starts with the beautiful comment "/* Nice toy.
> > > Unfortunately, useless in real life :-)". From the rest of that
> > > detailed comment, it is not clear to me why it would need to expose
> > > the headers. The example does not use packet sockets.
> > >
> > > A packet socket cannot know devices details such as which configurable
> > > mode a device may be in. And different modes conflict with the basic
> > > rule that for a given well defined link layer type, i.e., dev->type,
> > > header length can be expected to be consistent. In an ideal world
> > > these exceptions would not exist, therefore.
> > >
> > > Unfortunately, this is legacy behavior that will have to continue to
> > > be supported.
> >
> > Thanks for your explanation. So header_ops for GRE devices is only
> > used in 2 special situations. In normal situations, header_ops is not
> > used for GRE devices. And we consider not using header_ops should be
> > the ideal arrangement for GRE devices.
> >
> > Can we create a new dev->type (like ARPHRD_IPGRE_SPECIAL) for GRE
> > devices that use header_ops? I guess changing dev->type will not
> > affect the interface to the user space? This way we can solve the
> > problem of the same dev->type having different hard_header_len values.
>
> But does that address any real issue?

It doesn't address any issue that is visible to users. It only solves
the problem you mentioned, of the same dev->type having different
hard_header_len values. Making this change will not affect users in any
way, so I think it is worth making.

> If anything, it would make sense to keep ARPHRD_IPGRE for tunnels
> that expect headers and switch to ARPHRD_NONE for those that do not.
> As the collect_md commit I mentioned above does.

I thought we agreed that ideally GRE devices would not have header_ops.
Currently GRE devices (in normal situations) indeed do not use
header_ops, and they use ARPHRD_IPGRE as dev->type. I think we should
keep this behavior.

To solve the problem you mentioned, of the same dev->type having
different hard_header_len values, I think we should create a new
dev->type (ARPHRD_IPGRE_SPECIAL) for GRE devices that use header_ops.

Also, for collect_md, I think we should use ARPHRD_IPGRE. I see no
reason to use ARPHRD_NONE.

> > Also, for the second special situation, if there's no obvious reason
> > to use header_ops, maybe we can consider removing header_ops for this
> > situation.
>
> Unfortunately, there's no knowing if some application is using this
> broadcast mode *with* a process using packet sockets.

We can't always keep the interface to user space unchanged when fixing
problems. When we fix drivers by adding or removing hard_header_len, we
ARE changing the interface. I have made many such fixes. I also changed
skb->protocol when sending skbs for some drivers, which in fact also
changed the interface. It is not possible to keep the interface
strictly unchanged; otherwise a lot of problems would be impossible to
fix.
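
To make the dev->type / hard_header_len point concrete, here is a rough userspace model of it. It is only a sketch, not kernel code: the Dev class, the device names and the helper functions are made up for illustration, the header lengths (0 and 24) are illustrative numbers rather than values read from the driver, and ARPHRD_IPGRE_SPECIAL is only the name proposed in this thread, with an arbitrary placeholder value. Only ARPHRD_IPGRE = 778 is taken from include/uapi/linux/if_arp.h.

```python
from dataclasses import dataclass

ARPHRD_IPGRE = 778              # from include/uapi/linux/if_arp.h
ARPHRD_IPGRE_SPECIAL = 0xF001   # hypothetical: proposed name, placeholder value


@dataclass(frozen=True)
class Dev:
    """Hypothetical stand-in for the few net_device fields discussed here."""
    name: str
    type: int
    hard_header_len: int


def header_lens_by_type(devs):
    """Collect, per dev type, every hard_header_len advertised under it."""
    seen = {}
    for dev in devs:
        seen.setdefault(dev.type, set()).add(dev.hard_header_len)
    return seen


def ambiguous_types(devs):
    """Return the dev types that advertise more than one hard_header_len."""
    lens = header_lens_by_type(devs)
    return sorted(t for t, values in lens.items() if len(values) > 1)


# Today: both kinds of GRE device share ARPHRD_IPGRE.
# The lengths 0 and 24 are illustrative only.
before = [
    Dev("gre-plain", ARPHRD_IPGRE, 0),      # no header_ops
    Dev("gre-hdrops", ARPHRD_IPGRE, 24),    # header_ops set
]

# Proposal: devices with header_ops get their own dev type.
after = [
    Dev("gre-plain", ARPHRD_IPGRE, 0),
    Dev("gre-hdrops", ARPHRD_IPGRE_SPECIAL, 24),
]

print(ambiguous_types(before))
print(ambiguous_types(after))
```

In this model a packet socket that knows only the dev type cannot tell the two "before" devices apart, while in the "after" list each type maps to exactly one header length.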