Re: [RFC NET 00/04]: Increase number of possible routing tables
Patrick McHardy wrote:
> Ben Greear wrote:
>> With this patch applied everything is looking much better. I currently
>> have 400+ interfaces and one routing table per interface, and traffic
>> is passing as expected.
>>
>> This is probably due to my own application polling interfaces for
>> stat updates...but I am seeing over 50% usage (with more system than
>> user-space) in this setup on an otherwise lightly loaded system. top
>> shows no process averaging more than about 2% CPU (and only 2-3 are
>> above 0.0 typically), which I find a little strange. load is around 3.0.
>
> I can't imagine this being related to the increased number of routing
> tables; with a number of entries slightly (not even two times) over the
> hash size it shouldn't make that much of a difference. It may of course
> be a bug, but I don't see it.

I think it was my polling logic that was the problem. I fixed it up to be
more clever and the load went away.

Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com
Re: [RFC NET 00/04]: Increase number of possible routing tables
Ben Greear wrote:
> With this patch applied everything is looking much better. I currently
> have 400+ interfaces and one routing table per interface, and traffic
> is passing as expected.
>
> This is probably due to my own application polling interfaces for
> stat updates...but I am seeing over 50% usage (with more system than
> user-space) in this setup on an otherwise lightly loaded system. top
> shows no process averaging more than about 2% CPU (and only 2-3 are
> above 0.0 typically), which I find a little strange. load is around 3.0.

I can't imagine this being related to the increased number of routing
tables; with a number of entries slightly (not even two times) over the
hash size it shouldn't make that much of a difference. It may of course
be a bug, but I don't see it.
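For context on the hash-size point: the patches keep a statically sized
hash over table IDs, so per-lookup cost grows only with chain length. A
minimal sketch of such a scheme; the bucket count, struct, and helper
names here are assumptions for illustration, not taken from the patches:

#include <stdint.h>
#include <stddef.h>

#define TBL_HASHSZ 256	/* assumed bucket count; the patches also use a
			 * static size (FIB_TABLE_HASHSZ), value illustrative */

struct tbl {
	struct tbl *next;	/* hash chain link */
	uint32_t id;		/* 32-bit table ID */
};

static struct tbl *tbl_hash[TBL_HASHSZ];

/* With N tables in H buckets the average chain holds N/H entries, so
 * ~400 tables over 256 buckets means well under two entries per chain;
 * lookups stay near-constant and need no resizing (and hence no extra
 * locking in the packet path). */
static struct tbl *tbl_lookup(uint32_t id)
{
	struct tbl *t;

	for (t = tbl_hash[id % TBL_HASHSZ]; t != NULL; t = t->next)
		if (t->id == id)
			return t;
	return NULL;
}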
Re: [RFC NET 00/04]: Increase number of possible routing tables
David Miller wrote:
> Nice work Patrick.
>
> You guys have a lot of time to flesh out any remaining issues and
> failures, and then submit this for 2.6.19

Will do, I already expected to miss the deadline :)
Re: [RFC NET 00/04]: Increase number of possible routing tables
Patrick McHardy wrote:
> Ben Greear wrote:
>> Patrick McHardy wrote:
>>> I took on Ben's challenge to increase the number of possible routing
>>> tables, these are the resulting patches.
>>
>> I am seeing problems..though they could be with the way I'm using the
>> tool or perhaps I patched the kernel incorrectly.
>>
>> I applied the 3 patches to 2.6.17..all patches applied without problem,
>> but with a few lines of fuzz. I get the same behaviour with and without
>> the new 'ip' patches applied.
>>
>> If I do an 'ip ru show', then I see lots of tables, though not all it
>> seems. (I have not tried beyond 205 yet). But, if I do an
>> 'ip route show table XX', then I see nothing or incorrect values.
>
> My patches introduced a bug when dumping tables which could lead to
> incorrect routes being dumped. A second bug (that already existed)
> makes the kernel fail when dumping more rules than fit in a skb.
> I think I've already seen a patch to address the second problem,
> sent by someone else a short time ago. Anyway, this patch should
> fix both.

With this patch applied everything is looking much better. I currently
have 400+ interfaces and one routing table per interface, and traffic
is passing as expected.

This is probably due to my own application polling interfaces for
stat updates...but I am seeing over 50% usage (with more system than
user-space) in this setup on an otherwise lightly loaded system. top
shows no process averaging more than about 2% CPU (and only 2-3 are
above 0.0 typically), which I find a little strange. load is around 3.0.

I'll dig into my code and see if I can tune the stat-gathering logic a bit...

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com
Re: [RFC NET 00/04]: Increase number of possible routing tables
From: Patrick McHardy <[EMAIL PROTECTED]>
Date: Fri, 07 Jul 2006 21:58:31 +0200

> My patches introduced a bug when dumping tables which could lead to
> incorrect routes being dumped. A second bug (that already existed)
> makes the kernel fail when dumping more rules than fit in a skb.
> I think I've already seen a patch to address the second problem,
> sent by someone else a short time ago. Anyway, this patch should
> fix both.

Nice work Patrick.

You guys have a lot of time to flesh out any remaining issues and
failures, and then submit this for 2.6.19

Thanks again.
Re: [RFC NET 00/04]: Increase number of possible routing tables
Ben Greear wrote:
> Patrick McHardy wrote:
>
>>> I took on Ben's challenge to increase the number of possible routing
>>> tables, these are the resulting patches.
>
> I am seeing problems..though they could be with the way I'm using the
> tool or perhaps I patched the kernel incorrectly.
>
> I applied the 3 patches to 2.6.17..all patches applied without problem,
> but with a few lines of fuzz. I get the same behaviour with and
> without the new 'ip' patches applied.
>
> If I do an 'ip ru show', then I see lots of tables, though not all it
> seems. (I have not tried beyond 205 yet). But, if I do an
> 'ip route show table XX', then I see nothing or incorrect values.

My patches introduced a bug when dumping tables which could lead to
incorrect routes being dumped. A second bug (that already existed)
makes the kernel fail when dumping more rules than fit in a skb.
I think I've already seen a patch to address the second problem,
sent by someone else a short time ago. Anyway, this patch should
fix both.

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 3c49e6b..6e1aaa4 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -357,6 +357,7 @@ int inet_dump_fib(struct sk_buff *skb, s
 	unsigned int e = 0, s_e;
 	struct fib_table *tb;
 	struct hlist_node *node;
+	int dumped = 0;
 
 	if (NLMSG_PAYLOAD(cb->nlh, 0) >= sizeof(struct rtmsg) &&
 	    ((struct rtmsg*)NLMSG_DATA(cb->nlh))->rtm_flags&RTM_F_CLONED)
@@ -365,16 +366,17 @@ int inet_dump_fib(struct sk_buff *skb, s
 	s_h = cb->args[0];
 	s_e = cb->args[1];
 
-	for (h = s_h; h < FIB_TABLE_HASHSZ; h++) {
+	for (h = s_h; h < FIB_TABLE_HASHSZ; h++, s_e = 0) {
 		e = 0;
 		hlist_for_each_entry(tb, node, &fib_table_hash[h], tb_hlist) {
 			if (e < s_e)
 				goto next;
-			if (e > s_e)
-				memset(&cb->args[1], 0, sizeof(cb->args) -
+			if (dumped)
+				memset(&cb->args[2], 0, sizeof(cb->args) -
 				       2 * sizeof(cb->args[0]));
 			if (tb->tb_dump(tb, skb, cb) < 0)
 				goto out;
+			dumped = 1;
 next:
 			e++;
 		}
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index a41ab4b..6f33f12 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -459,13 +459,13 @@ int inet_dump_rules(struct sk_buff *skb,
 	rcu_read_lock();
 	hlist_for_each_entry(r, node, &fib_rules, hlist) {
 		if (idx < s_idx)
-			continue;
+			goto next;
 		if (inet_fill_rule(skb, r, NETLINK_CB(cb->skb).pid,
 				   cb->nlh->nlmsg_seq,
 				   RTM_NEWRULE, NLM_F_MULTI) < 0)
 			break;
+next:
 		idx++;
 	}
 	rcu_read_unlock();
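The fib_rules.c hunk above is the standard netlink dump-resume pattern:
idx must count every entry, including those skipped when a dump resumes,
otherwise the offset saved in cb->args[0] goes stale and the dump
restarts empty once the first skb fills up. A minimal stand-alone sketch
of the same pattern; the list type, emit(), and resume_idx are stand-ins
for the kernel structures, not kernel APIs:

#include <stddef.h>

struct entry { struct entry *next; int data; };

/* *resume_idx plays the role of cb->args[0]; emit() stands in for
 * inet_fill_rule() and returns < 0 when the message buffer is full.
 * The point mirrored from the patch: entries skipped on resume still
 * pass through idx++ via "goto next".  With the old "continue", idx
 * never advanced while skipping, so on the second pass every entry
 * compared below s_idx and the dump ended empty, losing whatever did
 * not fit into the first skb. */
static int dump_entries(struct entry *head, int *resume_idx,
			int (*emit)(struct entry *))
{
	int idx = 0, s_idx = *resume_idx;
	struct entry *e;

	for (e = head; e != NULL; e = e->next) {
		if (idx < s_idx)
			goto next;	/* already dumped; still count it */
		if (emit(e) < 0)	/* buffer full; resume here next time */
			break;
next:
		idx++;
	}
	*resume_idx = idx;
	return e != NULL;	/* nonzero: more to dump */
}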
Re: [RFC NET 00/04]: Increase number of possible routing tables
Patrick McHardy wrote:
> Patrick McHardy wrote:
>> I took on Ben's challenge to increase the number of possible routing
>> tables, these are the resulting patches.

I am seeing problems..though they could be with the way I'm using the
tool or perhaps I patched the kernel incorrectly.

I applied the 3 patches to 2.6.17..all patches applied without problem,
but with a few lines of fuzz. I get the same behaviour with and without
the new 'ip' patches applied.

If I do an 'ip ru show', then I see lots of tables, though not all it
seems. (I have not tried beyond 205 yet). But, if I do an
'ip route show table XX', then I see nothing or incorrect values.

For my test, I am creating 200 virtual interfaces (mac-vlans in my case,
but 802.1q should work equally well.) I am giving them all IP addrs on
the same subnet, and a routing table for each source IP addr. The
commands I run to generate the routing tables are found in this file:

http://www.candelatech.com/oss/gc.txt

When I change back to kernel 2.6.16.16 with only my patchset applied,
things seem to be working, so it looks like an issue with the new
kernel patches. I can provide access to this machine as well as my
full patch set, etc...

For whatever reason, table 5 does appear in a bizarre fashion:

[EMAIL PROTECTED] lanforge]$ more ~/tmp/ip.txt
[EMAIL PROTECTED] lanforge]# ip route show table 5
10.1.2.0/24 via 10.1.2.2 dev eth1#0
default via 10.1.2.1 dev eth1#0
[EMAIL PROTECTED] lanforge]# ip route show table 4
[EMAIL PROTECTED] lanforge]# ip route show table 3
[EMAIL PROTECTED] lanforge]# ip route show table 2
[EMAIL PROTECTED] lanforge]# ip route show table 1
[EMAIL PROTECTED] lanforge]# ip route show table 0
10.1.2.0/24 via 10.1.2.2 dev eth1#0 table 5
default via 10.1.2.1 dev eth1#0 table 5

# Here is a listing of 'ip ru show'.
[EMAIL PROTECTED] lanforge]$ more ~/tmp/ru.txt
0:      from all lookup local
31203:  from 10.1.2.144 lookup 147
31204:  from 10.1.2.143 lookup 146
31205:  from 10.1.2.142 lookup 145
31206:  from 10.1.2.141 lookup 144
31207:  from 10.1.2.140 lookup 143
31208:  from 10.1.2.139 lookup 142
31209:  from 10.1.2.138 lookup 141
31210:  from 10.1.2.137 lookup 140
31211:  from 10.1.2.136 lookup 139
31212:  from 10.1.2.135 lookup 138
31213:  from 10.1.2.134 lookup 137
31214:  from 10.1.2.133 lookup 136
31215:  from 10.1.2.132 lookup 135
31216:  from 10.1.2.131 lookup 134
31217:  from 10.1.2.130 lookup 133
31218:  from 10.1.2.129 lookup 132
31219:  from 10.1.2.128 lookup 131
31220:  from 10.1.2.127 lookup 130
31221:  from 10.1.2.126 lookup 129
31222:  from 10.1.2.125 lookup 128
31223:  from 10.1.2.124 lookup 127
31224:  from 10.1.2.123 lookup 126
31225:  from 10.1.2.122 lookup 125
31226:  from 10.1.2.121 lookup 124
31227:  from 10.1.2.120 lookup 123
31228:  from 10.1.2.119 lookup 122
31229:  from 10.1.2.118 lookup 121
31230:  from 10.1.2.117 lookup 120
31231:  from 10.1.2.116 lookup 119
31232:  from 10.1.2.115 lookup 118
31233:  from 10.1.2.114 lookup 117
31234:  from 10.1.2.113 lookup 116
31235:  from 10.1.2.201 lookup 204
31236:  from 10.1.2.200 lookup 203
31237:  from 10.1.2.199 lookup 202
31238:  from 10.1.2.198 lookup 201
31239:  from 10.1.2.197 lookup 200
31240:  from 10.1.2.196 lookup 199
31241:  from 10.1.2.195 lookup 198
31242:  from 10.1.2.112 lookup 115
31243:  from 10.1.2.111 lookup 114
31244:  from 10.1.2.110 lookup 113
31245:  from 10.1.2.109 lookup 112
31246:  from 10.1.2.108 lookup 111
31247:  from 10.1.2.107 lookup 110
31248:  from 10.1.2.106 lookup 109
31249:  from 10.1.2.105 lookup 108
31250:  from 10.1.2.104 lookup 107
31251:  from 10.1.2.103 lookup 106
31252:  from 10.1.2.102 lookup 105
31253:  from 10.1.2.101 lookup 104
31254:  from 10.1.2.100 lookup 103
31255:  from 10.1.2.99 lookup 102
31256:  from 10.1.2.98 lookup 101
31257:  from 10.1.2.97 lookup 100
31258:  from 10.1.2.96 lookup 99
31259:  from 10.1.2.95 lookup 98
31260:  from 10.1.2.94 lookup 97
31261:  from 10.1.2.93 lookup 96
31262:  from 10.1.2.92 lookup 95
31263:  from 10.1.2.91 lookup 94
31264:  from 10.1.2.90 lookup 93
31265:  from 10.1.2.89 lookup 92
31266:  from 10.1.2.88 lookup 91
31267:  from 10.1.2.87 lookup 90
31268:  from 10.1.2.86 lookup 89
31269:  from 10.1.2.85 lookup 88
31270:  from 10.1.2.84 lookup 87
31271:  from 10.1.2.83 lookup 86
31272:  from 10.1.2.82 lookup 85
31273:  from 10.1.2.81 lookup 84
31274:  from 10.1.2.80 lookup 83
31275:  from 10.1.2.79 lookup 82

Thanks,
Ben

--
Ben Greear <[EMAIL PROTECTED]>
Candela Technologies Inc  http://www.candelatech.com
Re: [RFC NET 00/04]: Increase number of possible routing tables
Patrick McHardy wrote:
> I took on Ben's challenge to increase the number of possible routing
> tables, these are the resulting patches.
>
> The table IDs are changed to 32 bit values and are contained in a new
> netlink routing attribute. For compatibility rtm_table in struct rtmsg
> can still be used to access the first 255 tables and contains the low
> 8 bits of the table ID in case of dumps. Unfortunately there are no
> invalid values for rtm_table, so the best userspace can do in case of
> a new iproute version that tries to access tables > 255 on an old
> kernel is to use RTM_UNSPEC (0) for rtm_table, which will make the
> kernel allocate an empty table instead of silently adding routes to a
> more or less random table. The iproute patch will follow shortly.
>
> The hash tables are statically sized since on-the-fly resizing would
> require introducing locking in the packet processing path (currently
> we need none); if this is a problem we could just directly attach
> table references to rules, since tables are never deleted or freed
> this would be a simple change.
>
> One spot is still missing (nl_fib_lookup), so these patches are purely
> an RFC for now. Tested only with IPv4; I mainly converted DECNET as
> well to keep it in sync and because iteration over all possible table
> values, as done in many spots, has an unacceptable overhead with 32
> bit values.

Since there were no objections, I would like to finalize this patch by
taking care of nl_fib_lookup. Since it was introduced as a debugging
interface for fib_trie and the interface definitions are not even
public (contained in include/net), I wonder if anyone really cares
about backwards compatibility or if I can just change it.

Robert, Thomas, you are the only two users of the interface I'm aware
of, what do you think?
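A sketch of the compatibility scheme Patrick describes, seen from the
sending side: IDs up to 255 go in the legacy rtm_table byte, larger ones
ride in the new RTA_TABLE attribute with rtm_table set to
RT_TABLE_UNSPEC. Both helper names below are invented for illustration,
and RTA_TABLE itself only exists with the patched headers:

#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* addattr32-style append, in the spirit of iproute2's libnetlink. */
static int add_rta_u32(struct nlmsghdr *nlh, size_t maxlen,
		       unsigned short type, uint32_t value)
{
	int len = RTA_LENGTH(sizeof(value));
	struct rtattr *rta;

	if (NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(len) > maxlen)
		return -1;	/* no room left in the message */
	rta = (struct rtattr *)((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
	rta->rta_type = type;
	rta->rta_len = len;
	memcpy(RTA_DATA(rta), &value, sizeof(value));
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(len);
	return 0;
}

/* Invented helper: IDs up to 255 still fit the legacy 8-bit rtm_table;
 * anything larger is carried in RTA_TABLE, with rtm_table set to
 * RT_TABLE_UNSPEC so that an old kernel allocates an empty table
 * rather than touching a more or less random one. */
static int set_route_table(struct nlmsghdr *nlh, struct rtmsg *rtm,
			   size_t maxlen, uint32_t table)
{
	if (table < 256) {
		rtm->rtm_table = table;
		return 0;
	}
	rtm->rtm_table = RT_TABLE_UNSPEC;
	return add_rta_u32(nlh, maxlen, RTA_TABLE, table);
}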
Re: [RFC NET 00/04]: Increase number of possible routing tables
* Patrick McHardy <[EMAIL PROTECTED]> 2006-07-03 13:36
> They will as long as this feature isn't used, the RTA_TABLE
> attribute is only added to the message when the table id
> is > 255. Worked fine during my tests, or are you referring
> to something else?

Perfect, I said nothing :)
Re: [RFC NET 00/04]: Increase number of possible routing tables
Thomas Graf wrote:
> * Patrick McHardy <[EMAIL PROTECTED]> 2006-07-03 11:38
>
>> That wasn't entirely true either, it's not inet_check_attr but
>> rtnetlink_rcv_message that aborts, and it does this on all
>> kernels. Somehow I thought unknown attributes were usually
>> ignored ..
>
> This only applies to the first level of rtnetlink attributes,
> when using rtattr_parse() unknown attributes are ignored.
>
> Once this ugly rta_buf has disappeared it will become more
> consistent.
>
> Patches look good to me except that new iproute binaries
> won't work with older kernels anymore?

They will as long as this feature isn't used, the RTA_TABLE
attribute is only added to the message when the table id
is > 255. Worked fine during my tests, or are you referring
to something else?
Re: [RFC NET 00/04]: Increase number of possible routing tables
* Patrick McHardy <[EMAIL PROTECTED]> 2006-07-03 11:38
> That wasn't entirely true either, it's not inet_check_attr but
> rtnetlink_rcv_message that aborts, and it does this on all
> kernels. Somehow I thought unknown attributes were usually
> ignored ..

This only applies to the first level of rtnetlink attributes,
when using rtattr_parse() unknown attributes are ignored.

Once this ugly rta_buf has disappeared it will become more
consistent.

Patches look good to me except that new iproute binaries
won't work with older kernels anymore?
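For reference, the behaviour Thomas describes comes from the
rtattr_parse()-style walk over nested attributes, which indexes known
types and silently skips unknown ones; only the first attribute level
(the rta_buf handling in rtnetlink_rcv_message) rejects them. A rough
sketch of that parsing pattern, not the kernel's exact code:

#include <string.h>
#include <linux/rtnetlink.h>

/* rtattr_parse()-style walk: known types are remembered in tb[], and
 * anything with a type above 'max' is stepped over without raising
 * an error, which is why unknown nested attributes are ignored. */
static void parse_rtattrs(struct rtattr *tb[], int max,
			  struct rtattr *rta, int len)
{
	memset(tb, 0, sizeof(struct rtattr *) * (max + 1));
	while (RTA_OK(rta, len)) {
		if (rta->rta_type <= max)
			tb[rta->rta_type] = rta;	/* remember known attr */
		rta = RTA_NEXT(rta, len);		/* skip unknown ones too */
	}
}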
Re: [RFC NET 00/04]: Increase number of possible routing tables
Patrick McHardy wrote:
> Patrick McHardy wrote:
>
>> I took on Ben's challenge to increase the number of possible routing
>> tables, these are the resulting patches.
>>
>> The table IDs are changed to 32 bit values and are contained in a new
>> netlink routing attribute. For compatibility rtm_table in struct rtmsg
>> can still be used to access the first 255 tables and contains the low
>> 8 bits of the table ID in case of dumps. Unfortunately there are no
>> invalid values for rtm_table, so the best userspace can do in case of
>> a new iproute version that tries to access tables > 255 on an old
>> kernel is to use RTM_UNSPEC (0) for rtm_table, which will make the
>> kernel allocate an empty table instead of silently adding routes to a
>> more or less random table. The iproute patch will follow shortly.
>
> Actually that last part wasn't entirely true. The last couple of
> releases of the kernel include the inet_check_attr function,
> which (unwillingly) breaks with the tradition of ignoring
> unknown attributes and signals an error on receiving the RTA_TABLE
> attribute. So the iproute patch only includes the RTA_TABLE
> attribute when the table ID is > 255, in which case rtm_table
> is set to RT_TABLE_UNSPEC. Old kernels will still have the
> behaviour I described above. The patch has been tested to
> behave as expected on both patched and unpatched kernels.

That wasn't entirely true either, it's not inet_check_attr but
rtnetlink_rcv_message that aborts, and it does this on all
kernels. Somehow I thought unknown attributes were usually
ignored .. anyway, this is a good thing in this case as it will
avoid unexpected behaviour and simply return an error on kernels
where this feature is not available.
Re: [RFC NET 00/04]: Increase number of possible routing tables
Patrick McHardy wrote:
> I took on Ben's challenge to increase the number of possible routing
> tables, these are the resulting patches.
>
> The table IDs are changed to 32 bit values and are contained in a new
> netlink routing attribute. For compatibility rtm_table in struct rtmsg
> can still be used to access the first 255 tables and contains the low
> 8 bits of the table ID in case of dumps. Unfortunately there are no
> invalid values for rtm_table, so the best userspace can do in case of
> a new iproute version that tries to access tables > 255 on an old
> kernel is to use RTM_UNSPEC (0) for rtm_table, which will make the
> kernel allocate an empty table instead of silently adding routes to a
> more or less random table. The iproute patch will follow shortly.

Actually that last part wasn't entirely true. The last couple of
releases of the kernel include the inet_check_attr function,
which (unwillingly) breaks with the tradition of ignoring
unknown attributes and signals an error on receiving the RTA_TABLE
attribute. So the iproute patch only includes the RTA_TABLE
attribute when the table ID is > 255, in which case rtm_table
is set to RT_TABLE_UNSPEC. Old kernels will still have the
behaviour I described above. The patch has been tested to
behave as expected on both patched and unpatched kernels.

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 5e33a20..7573c62 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -238,9 +238,8 @@ enum rt_class_t
 	RT_TABLE_DEFAULT=253,
 	RT_TABLE_MAIN=254,
 	RT_TABLE_LOCAL=255,
-	__RT_TABLE_MAX
 };
-#define RT_TABLE_MAX (__RT_TABLE_MAX - 1)
+#define RT_TABLE_MAX 0xFFFFFFFF
 
@@ -263,6 +262,7 @@ enum rtattr_type_t
 	RTA_CACHEINFO,
 	RTA_SESSION,
 	RTA_MP_ALGO,
+	RTA_TABLE,
 	__RTA_MAX
 };
 
diff --git a/include/rt_names.h b/include/rt_names.h
index 2d9ef10..07a10e0 100644
--- a/include/rt_names.h
+++ b/include/rt_names.h
@@ -5,7 +5,7 @@
 #include <asm/types.h>
 char* rtnl_rtprot_n2a(int id, char *buf, int len);
 char* rtnl_rtscope_n2a(int id, char *buf, int len);
-char* rtnl_rttable_n2a(int id, char *buf, int len);
+char* rtnl_rttable_n2a(__u32 id, char *buf, int len);
 char* rtnl_rtrealm_n2a(int id, char *buf, int len);
 char* rtnl_dsfield_n2a(int id, char *buf, int len);
 int rtnl_rtprot_a2n(__u32 *id, char *arg);
diff --git a/ip/ip_common.h b/ip/ip_common.h
index 1fe4a69..8b286b0 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -32,4 +32,12 @@ extern int do_multiaddr(int argc, char *
 extern int do_multiroute(int argc, char **argv);
 extern int do_xfrm(int argc, char **argv);
 
+static inline int rtm_get_table(struct rtmsg *r, struct rtattr **tb)
+{
+	__u32 table = r->rtm_table;
+	if (tb[RTA_TABLE])
+		table = *(__u32*) RTA_DATA(tb[RTA_TABLE]);
+	return table;
+}
+
 extern struct rtnl_handle rth;
diff --git a/ip/iproute.c b/ip/iproute.c
index a43c09e..4ebe617 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -75,7 +75,8 @@ static void usage(void)
 
 static struct
 {
-	int tb;
+	__u32 tb;
+	int cloned;
 	int flushed;
 	char *flushb;
 	int flushp;
@@ -125,6 +126,7 @@ int print_route(const struct sockaddr_nl
 	inet_prefix prefsrc;
 	inet_prefix via;
 	int host_len = -1;
+	__u32 table;
 	SPRINT_BUF(b1);
 
@@ -151,27 +153,23 @@ int print_route(const struct sockaddr_nl
 		host_len = 80;
 
 	if (r->rtm_family == AF_INET6) {
+		if (filter.cloned) {
+			if (!(r->rtm_flags&RTM_F_CLONED))
+				return 0;
+		}
 		if (filter.tb) {
-			if (filter.tb < 0) {
-				if (!(r->rtm_flags&RTM_F_CLONED))
-					return 0;
-			} else {
-				if (r->rtm_flags&RTM_F_CLONED)
+			if (r->rtm_flags&RTM_F_CLONED)
+				return 0;
+			if (filter.tb == RT_TABLE_LOCAL) {
+				if (r->rtm_type != RTN_LOCAL)
 					return 0;
-				if (filter.tb == RT_TABLE_LOCAL) {
-					if (r->rtm_type != RTN_LOCAL)
-						return 0;
-				} else if (filter.tb == RT_TABLE_MAIN) {
-					if (r->rtm_type == RTN_LOCAL)
-						return 0;
-				} else {
+			} else if (filter.tb == RT_TABLE_MAIN) {
+				if (r->rtm_type == RTN_LOCAL)
 					return 0;
-				}
+			} else {
+				return 0;
 			}
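To see how the new pieces fit together on the receive side, here is a
hedged sketch of decoding the table ID from a dump reply. The wrapper
function is invented for illustration; it assumes parse_rtattrs() from
the earlier sketch and rtm_get_table() from the ip/ip_common.h hunk
above are in scope:

#include <stdint.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* rtattr_parse()-style walk sketched in an earlier message. */
void parse_rtattrs(struct rtattr *tb[], int max, struct rtattr *rta, int len);

/* Invented receive-side wrapper: parse the reply's attributes, then
 * let rtm_get_table() pick the full 32-bit ID out of RTA_TABLE,
 * falling back to rtm_table, which in dumps carries only the low
 * 8 bits of the ID. */
static uint32_t route_table_of(struct nlmsghdr *nlh)
{
	struct rtmsg *r = NLMSG_DATA(nlh);
	struct rtattr *tb[RTA_MAX + 1];

	parse_rtattrs(tb, RTA_MAX, RTM_RTA(r), RTM_PAYLOAD(nlh));
	return rtm_get_table(r, tb);
}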