>>> Then again, if you're basically saying every HW-assisted offload on >>> receive should be done under LRO flag, what would be the use case >>> where a GRO-assisted offload would help? >>> I.e., afaik LRO is superior to GRO in `brute force' - >>> it creates better packed packets and utilizes memory better >>> [with all the obvious cons such as inability for defragmentation]. >>> So if you'd have the choice of having an adpater perform 'classic' >>> LRO aggregation or something that resembles a GRO packet, >>> what would be the gain from doing the latter?
> LRO and GRO shouldn't really differ in packing or anything like that. > The big difference between the two is that LRO is destructive while > GRO is not. Specifically in the case of GRO you should be able to > take the resultant frame, feed it through GSO, and get the original > stream of frames back out. So you can pack the frames however you > want the only key is that you must capture all the correct offsets and > set the gso_size correct for the flow. While the implementation might lack in things [such as issues with future implementation], following your logic it is GRO - I.e., forwarding scenarios work fine with HW assisted GRO. >> Just to relate to bnx2x/qede differences in current implementation - >> when this GRO hw-offload was added to bnx2x, it has already >> supported classical LRO, and due to above statement whenever LRO >> was set driver aggregated incoming traffic as classic LRO. >> I agree that in hindsight the lack of distinction between sw/hw GRO >> was hurting us. > In the case of bnx2x it sounds like you have issues that are > significantly hurting the performance versus classic software GRO. If > that is the case you might want to simply flip the logic for the > module parameter that Rick mentioned and just disable the hardware > assisted GRO unless it is specifically requested. A bit hard to flip; The module parameter also disables LRO support. And given that module parameters is mostly a thing of the past, I don't think we should strive fixing things through additional changes in that area. > > qede isn't implementing LRO, so we could easily mark this feature > > under LRO there - but question is, given that the adapter can support > > LRO, if we're going to suffer from all the shotrages that arise from > > putting this feature under LRO, why should we bother? > The idea is to address feature isolation. The fact is the hardware > exists outside of kernel control. If you end up linking an internal > kernel feature to your device like this you are essentially stripping > the option of using the kernel feature. > I would prefer to see us extend LRO to support "close enough GRO" > instead of have us extend GRO to also include LRO. Again - why? What's the benefit of HW doing LRO and trying to control imitate GRO, if it's still carrying all the LRO baggage [specifically, disabling it on forwarding] as opposed to simply doing classic LRO? > > You can argue that we might need a new feature bit for control > > over such a feature; If we don't do that, is there any gain in all of this? > I would argue that yes there are many cases where we will be able to > show gain. The fact is there is a strong likelihood of the GRO on > your parts having some differences either now, or at some point in the > future as the code evolves. As I mentioned there was already some > talk about possibly needing to push the UDP tunnel aggregation out of > GRO and perhaps handling it sometime after IP look up had verified > that the destination was in fact a local address in the namespace. In > addition it makes the changes to include the tunnel encapsulation much > more acceptable as LRO is already naturally dropped in the routing and > bridging cases if I recall correctly. I think it all boils down to the question of "do we actually want to have HW-assisted GRO?". If we do [and not necessarily for the UDP-tunnel scenario] then we need to have it distinct from LRO, otherwise there's very little gain. If we believe tGRO should remain SW-only, then I think the discussion is mott; We need to stop trying this, and offload only LRO - in which case we can aggregate it in whichever 'destructive' [correct] format we like, without trying to have it resemble GRO.