On 05/30/2015 02:00 AM, Jiri Pirko wrote:
Fri, May 29, 2015 at 05:39:46PM CEST, sfel...@gmail.com wrote:
On Fri, May 29, 2015 at 12:50 AM, Jiri Pirko <j...@resnulli.us> wrote:
Thu, May 21, 2015 at 07:46:54AM CEST, sfel...@gmail.com wrote:
On Tue, May 19, 2015 at 1:28 PM, David Miller <da...@davemloft.net> wrote:
From: Andy Gospodarek <go...@cumulusnetworks.com>
Date: Tue, 19 May 2015 15:47:32 -0400

Are you actually saying that if users complain loudly enough about
the current behavior (not the change Roopa has proposed) that you
would be open to considering a change to the current behavior?

I am saying that we have a contract with users not to break existing
behavior.  Full stop.

After rehearing David's argument, we should probably explore option d)
which is a refinement on the fib_offload_disable mechanism we have
today.  fib_offload_disable is global for all routes.  Once we hit a
HW install problem, the global flag is set and all routes fall back to
SW.  We did this because we can't allow the failed route to exist in
SW and not in HW because it could mess up LPM searches (HW could hit
on a lesser prefix even when SW has the true LPM, because HW gets
first shot at match).  The refinement on fib_offload_disable is this:
make it per-related-prefix rather than global, and on a HW install
problem, set the flag for the related-prefix and uninstall only those
routes from HW.  Related-prefix (is there a correct term for this?)
are routes to the same dst addr but with different prefix lengths.  I
haven't parsed the fib_trie structure to see how routes are organized,
but I suspect since it's optimized for lookup the related-prefix
tracking is already there and we can build on that.
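
To make the related-prefix idea concrete, here is a minimal sketch in plain C. It assumes IPv4 only and reads "related" as "one prefix contains the other", which is one possible interpretation of the description above. The struct route layout and the names plen_mask, prefixes_related and evict_related are hypothetical; this is not the fib_trie code and not an existing kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route record; this is not the kernel's fib_trie layout. */
struct route {
    uint32_t dst;    /* IPv4 destination, host byte order */
    uint8_t  plen;   /* prefix length, 0..32 */
    bool     in_hw;  /* currently installed in hardware */
};

/* Network mask for a prefix length. plen == 0 is special-cased because
 * shifting a 32-bit value by 32 bits is undefined behavior in C. */
static uint32_t plen_mask(uint8_t plen)
{
    assert(plen <= 32);
    return plen == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - plen));
}

/* Assumption: two prefixes are "related" when one contains the other,
 * i.e. they agree on all bits covered by the shorter prefix length. */
static bool prefixes_related(const struct route *a, const struct route *b)
{
    uint8_t shorter = a->plen < b->plen ? a->plen : b->plen;

    return ((a->dst ^ b->dst) & plen_mask(shorter)) == 0;
}

/* After a failed HW install of `failed`, mark every related route that is
 * currently in HW as no longer offloaded. Returns the number evicted. */
static size_t evict_related(struct route *tbl, size_t n,
                            const struct route *failed)
{
    size_t evicted = 0;

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].in_hw && prefixes_related(&tbl[i], failed)) {
            tbl[i].in_hw = false;
            evicted++;
        }
    }
    return evicted;
}
```

One consequence of this reading: a default route (prefix length 0) is related to every other route, so a single failed install would also pull the default route out of HW. A linear scan is used only to keep the sketch short; a trie walk along the failed prefix would visit the same routes.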

This looks interesting. However, I'm not sure it is acceptable for the
user to experience this HW eviction of "random entries". The user knows
which entries are essential to have in HW. With your solution, I can see
no way for the user to actually say what should or should not be
offloaded. The kernel just automagically decides.

The default eviction policy could be based on RTA_PRIORITY: evict
lower priority routes first.  It would be up to the device driver to
decide between two routes of same priority.

To help device driver make the decision, we could have eviction policy options:

    Priority-based (default)
    Prefer IPv6 over IPv4
    Prefer IPv4 over IPv6
    Prefer single path over multipath
    Prefer longer prefix lengths over shorter
    Optimize for resource utilization

These are portable across different switches.  They're in terms a
user understands.  It's up to the device driver, which truly
understands the device constraints, to translate the user's eviction
policy choices into something that makes sense to that device.
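
A rough sketch of how the policy options listed above could be expressed as a comparison between two offloaded routes. Everything here is hypothetical: the enum names, the struct hw_route fields and both functions are made up for illustration and do not exist in switchdev. It assumes a numerically larger RTA_PRIORITY value means a less preferred route, as with Linux route metrics, and it leaves "optimize for resource utilization" to the driver because that option depends on the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the policy options discussed in this thread. */
enum evict_policy {
    EVICT_PRIORITY,             /* default */
    EVICT_PREFER_IPV6,
    EVICT_PREFER_IPV4,
    EVICT_PREFER_SINGLE_PATH,
    EVICT_PREFER_LONGER_PREFIX,
    EVICT_OPTIMIZE_RESOURCES    /* device specific; left to the driver */
};

/* Hypothetical per-route attributes a policy might look at. */
struct hw_route {
    uint32_t metric;   /* RTA_PRIORITY; assumed: larger = less preferred */
    bool     is_ipv6;
    uint8_t  nhops;    /* 1 = single path, >1 = multipath */
    uint8_t  plen;     /* prefix length */
};

/* Returns true when `a` should be evicted before `b` under `policy`. */
static bool evict_before(enum evict_policy policy,
                         const struct hw_route *a, const struct hw_route *b)
{
    switch (policy) {
    case EVICT_PRIORITY:
        return a->metric > b->metric;
    case EVICT_PREFER_IPV6:
        return !a->is_ipv6 && b->is_ipv6;
    case EVICT_PREFER_IPV4:
        return a->is_ipv6 && !b->is_ipv6;
    case EVICT_PREFER_SINGLE_PATH:
        return a->nhops > 1 && b->nhops == 1;
    case EVICT_PREFER_LONGER_PREFIX:
        return a->plen < b->plen;
    case EVICT_OPTIMIZE_RESOURCES:
        break;  /* no generic ordering; a driver would decide */
    }
    return false;
}

/* Index of the route to evict first; ties keep the earliest entry. */
static size_t pick_victim(enum evict_policy policy,
                          const struct hw_route *tbl, size_t n)
{
    size_t victim = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (evict_before(policy, &tbl[i], &tbl[victim]))
            victim = i;
    return victim;
}
```

The tie case (evict_before() false in both directions) is where a driver-specific choice would plug in. Comparing prefix lengths across IPv4 and IPv6 is a simplification made only for this sketch.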

This sounds tempting... You plan to throw in some patches, or should I
take care of that?


This is encoding specific policies into the kernel. I was hoping to
avoid this and let user space develop whatever policy it wants. If you
use Jiri's proposed NLM_F_SKIP_{KERNEL|OFFLOAD} flags you get this.

Also I don't understand the "truly understands the device constraints"
comment. We can export a model of the device and know how many rules
of each type will fit exactly into the table. This doesn't seem like
much of a problem to me. In fact the driver developer should know this
anyway.
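
As a sketch of what "export a model of the device" could look like, assume a single table with a fixed number of slots and a per-rule-type slot cost. The struct table_model layout, the rule types and rules_that_fit() are all hypothetical and only illustrate the arithmetic; real devices may have several tables and more complicated sharing rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule types; a real model would likely have more. */
enum rule_type { RULE_IPV4, RULE_IPV6, RULE_TYPE_MAX };

/* Hypothetical table model a driver could export. */
struct table_model {
    uint32_t total_slots;                    /* capacity of the table */
    uint32_t used_slots;                     /* slots already consumed */
    uint32_t slots_per_rule[RULE_TYPE_MAX];  /* cost of one rule per type */
};

/* How many more rules of `type` fit into the table right now. */
static uint32_t rules_that_fit(const struct table_model *m,
                               enum rule_type type)
{
    assert(type < RULE_TYPE_MAX);
    assert(m->used_slots <= m->total_slots);
    assert(m->slots_per_rule[type] > 0);

    return (m->total_slots - m->used_slots) / m->slots_per_rule[type];
}
```

With a model like this, the same calculation could run in a common switchdev layer or in a user-space tool, without each driver reimplementing it.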

Part of my motivation here is I really don't want to get stuck with a
case where each driver writer gets to translate the eviction policy
onto their device in some device specific and slightly different way.
It means every developer has to write a new mapping and get it correct.
At the very least we should put a layer in switchdev that reads the
table out of the driver and does the mapping, so we have it in one spot.
At least then the kernel is enforcing policy the same way on all devices.
Better still, IMO, would be to develop the policy in user space and have
a library/tool that does this, so we don't end up with a bunch of policy
blobs in the kernel. The 6 above are a good start, but over time more
policy blobs will surely pop up. I would, for example, put 'optimize for
throughput' on the list.

.John

--
John Fastabend         Intel Corporation