On Sun, Dec 10, 2017 at 9:02 AM, Alexander Duyck <alexander.du...@gmail.com> wrote: > On Sat, Dec 9, 2017 at 10:40 PM, Michael Chan <michael.c...@broadcom.com> > wrote: >> It is possible that if you have incoming packets 1, 2, 3, 4, 5 for a >> TCP connection, HW_GRO can aggregate packets 1, 2, 3, but cannot >> aggregate packets 4 and 5 due to hardware resource limitation. >> Software GRO aggregates 4 and 5. So it works well together. > > Right. But in the case where 1, 2, 3, 4, and 5 were not aggregated by > hardware GRO because the frames could not be aggregated and then GRO > burns cycles coming to the same conclusion you have waste. Same thing > goes for if hardware GRO aggregates 1 through 5 and then SW GRO tries > to see if it can do more. > > They are both doing the same thing, but what I see it as is two > passes, not something where they are working together. The hardware > GRO can rely on software GRO for a second pass, but it doesn't need > to. The fact that it doesn't need to tells me that it isn't a hard > requirement to have GRO in order to make use of software GRO.
I guess I look at this as feature propagation from net device to hardware. To me, it makes a lot of sense. We've been doing hardware GRO for a while and I never think of it as replacement for software GRO. It's hardware accelerated GRO for a subset of the connections that hardware can handle. As for the additional GRO pass in software, I think it is quite efficient. When hardware has aggregated a GRO frame, software GRO will effectively "flush" it and never hold it for more aggregation. After this patchset is done, I can look at the code and see if we can further optimize the "2nd pass" code path when hardware has already aggregated the packet. Anyway, I will move the GRO_HW/GRO dependency to the ndo_fix_features() of the 3 drivers, so we can move on and get these patches accepted. >> May be you meant put the RXCSUM check in the outer if statement so >> that someone could add more inner checks? OK, I think that's what you >> meant. > > Yes that is what I meant. OK. Will change in v4. >> We need patch #2 otherwise generic GRO won't work on these 3 drivers. >> I don't think I fully understand your concerns about propagation. To >> me propagation is just a usage model where an upper device will >> control the common features of lower devices. It is more convenient >> to have propagation, but requires upper devices to be aware of all >> features that propagate (GRO, RXCSUM). Without propagation, it is >> still fine. > > I'm not sure if it is. It depends on how much XDP depends on the > frames being non-linear. As-is I am pretty sure this doesn't work for > the stacked case anyway since GRO was still enabled for lower devices > anyway. So you might look at just modifying patch 2 to not worry about > the stacked devices case since I think that was already broken with > GRO anyway. OK. I don't think anyone will run generic XDP on an upper device anyway.