Re: [RFC] [GIT PATCH net-2.6.23] IPV6: Configurable IPv6 address selection policy table (RFC3484)

2007-04-29 Thread Ulrich Drepper
David Miller wrote:
> Something more scalable has to be used.

This is where the shared-memory based event notification comes in.  It
was always also meant to be used for things like this.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





Re: [RFC] [GIT PATCH net-2.6.23] IPV6: Configurable IPv6 address selection policy table (RFC3484)

2007-04-29 Thread David Miller
From: Ulrich Drepper <[EMAIL PROTECTED]>
Date: Sat, 28 Apr 2007 23:39:22 -0700

> David Miller wrote:
> > One idea is to have glibc have some kind of socket open, subscribed
> > to a group which gets "sticky" events.
> 
> I don't quite yet know the context but I have to intervene: keeping
> sockets open is not good.  This will only cause problems.
> 
> Any interface must be memory based.  Something like "register a word
> which is set when an event arrives" is a much better interface.  How you
> then go and retrieve messages is another issue.  If this is a rare event
> then opening a new netlink socket is no problem.

That's an excellent point, however my concern is that we are
accumulating lots of these things.

You can't load up the vsyscall page with a memory word for each and
every thing of this nature, for example.

Something more scalable has to be used.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] [GIT PATCH net-2.6.23] IPV6: Configurable IPv6 address selection policy table (RFC3484)

2007-04-28 Thread Ulrich Drepper
David Miller wrote:
> One idea is to have glibc have some kind of socket open, subscribed
> to a group which gets "sticky" events.

I don't quite yet know the context but I have to intervene: keeping
sockets open is not good.  This will only cause problems.

Any interface must be memory based.  Something like "register a word
which is set when an event arrives" is a much better interface.  How you
then go and retrieve messages is another issue.  If this is a rare event
then opening a new netlink socket is no problem.
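
To make the "register a word" idea concrete, here is a minimal user-space
sketch in C.  It is only an illustration under assumptions: the names
policy_event_word, simulate_event() and maybe_reload_policy() are
hypothetical, and an ordinary function stands in for the kernel side that
would set the word.  No such kernel interface is implied to exist.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical notification word.  In the proposal the kernel would set
 * it; here simulate_event() stands in for the kernel. */
static atomic_uint policy_event_word;

/* Hypothetical counter, only there to make reloads observable. */
static unsigned int reload_count;

/* Stand-in for the kernel side: record that an event arrived. */
static void simulate_event(void)
{
    atomic_store_explicit(&policy_event_word, 1, memory_order_release);
}

/* Library side: returns true if an event arrived since the last call,
 * and clears the word so the next call starts fresh. */
static bool event_pending_and_clear(void)
{
    /* Common case: nothing happened, so a single load is enough. */
    if (atomic_load_explicit(&policy_event_word, memory_order_acquire) == 0)
        return false;
    /* Test and clear in one atomic step so no event is lost. */
    return atomic_exchange_explicit(&policy_event_word, 0,
                                    memory_order_acq_rel) != 0;
}

/* Called before each lookup; only touches the slow path after an event. */
static void maybe_reload_policy(void)
{
    if (event_pending_and_clear())
        reload_count++;   /* real code would fetch the new table here */
}
```

Several events between two checks collapse into one reload, which is the
property wanted for a rarely changing table: the word only says "something
changed", and the retrieval step is left open, as in the text above.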

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





Re: [RFC] [GIT PATCH net-2.6.23] IPV6: Configurable IPv6 address selection policy table (RFC3484)

2007-04-28 Thread David Miller
From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Thu, 19 Apr 2007 16:28:56 +0900 (JST)

> We store labels only in kernel, and leave precedence in userspace
> (/etc/gai.conf), so far.  The name resolution library (getaddrinfo(3))
> is required to be changed to try reading label information from kernel.
> On the other hand, on BSDs or on Solaris, full policy table including
> precedence seems to be stored in kernel, and the name resolution
> library (getaddrinfo(3)) seems to use that information.
> We could choose this approach.
> 
> Note: Solaris uses string (up to 15 characters excluding NUL) labels.

As you mention the main problem is efficiently notifying
userspace that the table has changed in the same way that
file changes can be checked.

The last thing we want is for glibc to have to stat a bunch of files
every time it wants to do something; it does enough of that already
:-)

Probably, to start somewhere, it may be wise to put the entire
precedence table in the kernel just like BSD, Solaris, and your
patch do.  We can figure out how to make the update interface
efficient later, perhaps with something clever in netlink.

One idea is to have glibc have some kind of socket open, subscribed
to a group which gets "sticky" events.  It will be simple messages
such as "table of type X got updated".  If the socket already got
sent that message, on subsequent updates we wouldn't send it again
until glibc read the event message out.
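
A minimal sketch of the "sticky" behaviour, written as plain user-space C
so it can be compiled and tried.  Everything here is hypothetical (the
enum table_type, struct subscriber and both functions); it models only the
rule that a second update does not queue a second message until the first
one has been read.

```c
#include <stdbool.h>

/* Hypothetical table identifiers; only one table is modelled. */
enum table_type { TABLE_ADDRLABEL, TABLE_MAX };

/* Hypothetical per-socket state kept on the sending side. */
struct subscriber {
    bool pending[TABLE_MAX];  /* sticky flag: message sent, not yet read */
    unsigned int queued;      /* messages currently waiting in the queue */
};

/* Update side: queue "table of type t got updated" unless an unread
 * message of that type is already waiting.  Returns true if queued. */
static bool notify_table_updated(struct subscriber *s, enum table_type t)
{
    if (s->pending[t])
        return false;         /* already told the reader once */
    s->pending[t] = true;
    s->queued++;
    return true;
}

/* Reader side: consume the message, which re-arms the notification.
 * Returns true if there was a message to read. */
static bool read_event(struct subscriber *s, enum table_type t)
{
    if (!s->pending[t])
        return false;
    s->pending[t] = false;
    s->queued--;
    return true;
}
```

With this rule the queue depth per table type never exceeds one, no matter
how many updates happen before the reader gets around to the socket.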

It would be possible to not even use explicit messages.  Instead
some netlink socket state holds a generation counter, label
table updates increment the counter, and glibc just asks the
kernel via netlink whether its generation count is out of date.
If so, the kernel returns true and also updates the generation
count for that socket to match the current one.
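
The generation counter variant can be sketched the same way.  This is a
user-space model under assumptions, not kernel code: table_generation,
struct sock_state and generation_out_of_date() are hypothetical names, and
the netlink request itself is replaced by a direct function call.

```c
#include <stdbool.h>

/* Hypothetical global counter, bumped on every label table update. */
static unsigned long table_generation;

/* Hypothetical per-socket state: the generation this socket last saw. */
struct sock_state {
    unsigned long seen_generation;
};

/* Update side: any change to the table increments the counter. */
static void label_table_update(void)
{
    table_generation++;
}

/* Query side: answers "is my generation out of date?".  When it is,
 * the socket's generation is advanced to the current one, so the next
 * query returns false until another update happens. */
static bool generation_out_of_date(struct sock_state *sk)
{
    if (sk->seen_generation == table_generation)
        return false;
    sk->seen_generation = table_generation;
    return true;
}
```

Compared with explicit messages, nothing is ever queued; the cost is one
query per check instead of a read only when something changed.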

It is one idea.



[RFC] [GIT PATCH net-2.6.23] IPV6: Configurable IPv6 address selection policy table (RFC3484)

2007-04-19 Thread YOSHIFUJI Hideaki / 吉藤英明
Hello.

This is an RFC(*) for supporting a configurable IPv6 address selection
policy table, which is described in RFC3484.

Corresponding userspace tool is available at

.


We store labels only in kernel, and leave precedence in userspace
(/etc/gai.conf), so far.  The name resolution library (getaddrinfo(3))
is required to be changed to try reading label information from kernel.
On the other hand, on BSDs or on Solaris, full policy table including
precedence seems to be stored in kernel, and the name resolution
library (getaddrinfo(3)) seems to use that information.
We could choose this approach.

Note: Solaris uses string (up to 15 characters excluding NUL) labels.
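
For readers unfamiliar with the policy table, the sketch below shows a
longest-prefix-match lookup over the default table from RFC 3484
section 2.1 (prefix, precedence, label).  It is a user-space illustration
only: struct policy_entry, prefix_matches() and policy_lookup() are
hypothetical names and are not taken from the patch, whose in-kernel table
stores labels only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one policy table row. */
struct policy_entry {
    const char *prefix;
    int prefixlen;
    int precedence;
    int label;
};

/* Default policy table from RFC 3484, section 2.1. */
static const struct policy_entry default_policy[] = {
    { "::1",        128, 50, 0 },
    { "::",           0, 40, 1 },
    { "2002::",      16, 30, 2 },
    { "::",          96, 20, 3 },
    { "::ffff:0:0",  96, 10, 4 },
};

/* Returns 1 if the first len bits of a and p are equal. */
static int prefix_matches(const struct in6_addr *a,
                          const struct in6_addr *p, int len)
{
    int full = len / 8;   /* whole bytes covered by the prefix */
    int rest = len % 8;   /* remaining bits in the next byte */

    if (memcmp(a->s6_addr, p->s6_addr, (size_t)full) != 0)
        return 0;
    if (rest == 0)
        return 1;
    unsigned char mask = (unsigned char)(0xff << (8 - rest));
    return (a->s6_addr[full] & mask) == (p->s6_addr[full] & mask);
}

/* Returns the matching entry with the longest prefix, or NULL if
 * addr_text is not a valid IPv6 address. */
static const struct policy_entry *policy_lookup(const char *addr_text)
{
    struct in6_addr addr, prefix;
    const struct policy_entry *best = NULL;
    size_t n = sizeof(default_policy) / sizeof(default_policy[0]);

    if (inet_pton(AF_INET6, addr_text, &addr) != 1)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        const struct policy_entry *e = &default_policy[i];

        if (inet_pton(AF_INET6, e->prefix, &prefix) != 1)
            continue;
        if (!prefix_matches(&addr, &prefix, e->prefixlen))
            continue;
        if (best == NULL || e->prefixlen > best->prefixlen)
            best = e;
    }
    return best;
}
```

For example, policy_lookup("2001:db8::1") falls through to the ::/0 row
(precedence 40, label 1), while a 6to4 address such as 2002:c000:201::1
matches 2002::/16 and gets label 2.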

At this moment, glibc does not reload /etc/gai.conf promptly by default.
According to the getaddrinfo(3) manpage, getaddrinfo(3) does not seem to be
thread safe if we put "reload yes" in the configuration file (/etc/gai.conf).
We probably need to fix that.

Problems in current approach:

Currently, when getaddrinfo(3) tries to reload /etc/gai.conf,
it performs fstat to check whether the file has been updated.  However,
procfs always reports the current time as the modification time, so
getaddrinfo(3) will always need to reload from procfs.  To optimize
further, we would have to touch the procfs subsystem.
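
The effect can be seen in a small model of such a check.  This is a sketch
under assumptions, not glibc's actual code: struct reload_state,
needs_reload() and file_needs_reload() are hypothetical, and only the
modification time is compared.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical cache state: the mtime seen at the last reload. */
struct reload_state {
    bool valid;
    time_t mtime;
};

/* Decide from a stat result whether the cached copy is stale, and
 * remember the new mtime if it is. */
static bool needs_reload(struct reload_state *st, const struct stat *sb)
{
    if (st->valid && st->mtime == sb->st_mtime)
        return false;          /* unchanged since the last reload */
    st->valid = true;
    st->mtime = sb->st_mtime;
    return true;
}

/* Same check driven by fstat(2) on an open descriptor.  On error the
 * function is conservative and asks for a reload. */
static bool file_needs_reload(struct reload_state *st, int fd)
{
    struct stat sb;

    if (fstat(fd, &sb) != 0)
        return true;
    return needs_reload(st, &sb);
}
```

For a regular file the second call returns false until the file is
written again.  For a file whose reported mtime is always "now", the
comparison fails whenever the clock has advanced between two calls, so
the cache is effectively never reused.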

Another issue with procfs is that it is not atomic.
To solve this issue, we probably need to support a netlink
interface.  However, I am not sure how we can optimize reading
the policy from the kernel with this approach.


Another problem.  I put several new ioctls in include/linux/ipv6.h, but
I guess it is very hard to include that file from userspace... sigh...

TODOs: Probably we should use RCUs.


Comments / opinions welcome.


*: We do not expect this will be included in net-2.6.22,
but 2.6.23 or so.

Regards,

-

HEADLINES
---------

[IPV6] ADDRCONF: define and export constant for ::.
[IPV6] ADDRCONF: Prepare supporting source address selection policy with ifindex.
[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.

DIFFSTAT
--------

 include/linux/in6.h    |    2 
 include/linux/ipv6.h   |   16 ++
 include/net/addrconf.h |    4 
 include/net/ipv6.h     |    5 +
 net/ipv6/Makefile      |    1 
 net/ipv6/addrconf.c    |   50 ++---
 net/ipv6/addrlabel.c   |  454 
 net/ipv6/af_inet6.c    |    3 
 8 files changed, 498 insertions(+), 37 deletions(-)

CHANGESETS
----------

commit 27bafd017775cffa86d60eea179b68c4b90c4ae7
Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date:   Tue Apr 3 00:12:49 2007 +0900

[IPV6] ADDRCONF: define and export constant for ::.

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/include/linux/in6.h b/include/linux/in6.h
index d559fac..2a61c82 100644
--- a/include/linux/in6.h
+++ b/include/linux/in6.h
@@ -44,10 +44,8 @@ struct in6_addr
  * NOTE: Be aware the IN6ADDR_* constants and in6addr_* externals are defined
  * in network byte order, not in host byte order as are the IPv4 equivalents
  */
-#if 0
 extern const struct in6_addr in6addr_any;
 #define IN6ADDR_ANY_INIT { { { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 } } }
-#endif
 extern const struct in6_addr in6addr_loopback;
 #define IN6ADDR_LOOPBACK_INIT { { { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1 } } }
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 47d3adf..371ee2f 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -206,9 +206,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 };
 
 /* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */
-#if 0
 const struct in6_addr in6addr_any = IN6ADDR_ANY_INIT;
-#endif
 const struct in6_addr in6addr_loopback = IN6ADDR_LOOPBACK_INIT;
 
 static void addrconf_del_timer(struct inet6_ifaddr *ifp)

---
commit ce50931887ad6bdf951f1b165bd76e1cda9adf97
Author: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>
Date:   Tue Apr 3 00:21:23 2007 +0900

[IPV6] ADDRCONF: Prepare supporting source address selection policy with ifindex.

Signed-off-by: YOSHIFUJI Hideaki <[EMAIL PROTECTED]>

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 371ee2f..c61fb62 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -831,7 +831,8 @@ static inline int ipv6_saddr_preferred(int type)
 }
 
 /* static matching label */
-static inline int ipv6_saddr_label(const struct in6_addr *addr, int type)
+static inline int ipv6_saddr_label(const struct in6_addr *addr, int type,
+  int ifindex)
 {
  /*
   *prefix (longest match)  label
@@ -866,7 +867,8 @@ int ipv6_dev_get_saddr(struct net_device *daddr_dev,
    struct inet6_ifaddr *ifa_result = NULL;
    int daddr_type = __ipv6_addr_type(daddr);
    int daddr_scope = __ipv6_addr_src_scope(daddr_type);
-   u32 daddr_label = ipv6_saddr_label(daddr, daddr_type);
+   int daddr_ifindex = daddr_dev ? daddr_dev->ifindex : 0;
+   u32 daddr_label = ipv