On Wed, 2006-05-10 at 13:52 +0200, Eliot Lear wrote:
> If I understand the problem correctly, no matter whether you prefer IPv4
> or IPv6 the presumption that is failing us is that if the interface has
> an A or AAAA record associated with it then it is reachable on that
> address.

Er, no. Unless I misunderstand what you're saying, it had nothing to do
with the problem at all. :)

You seem to be talking about whether there's an A or AAAA record
referring to the _local_ address? Without doing reverse and then forward
DNS lookups, I don't even see how we could establish such a thing. It
wasn't what I meant. While I quite like the idea that one's networking
should completely fail unless one has valid reverse and forward DNS,
it's not really relevant to what I was talking about :)

The problem is this:

I have a local host with IPv4 address 192.168.x.x and IPv6 address
fec0::YYYY. IPv4 is routed to the outside world via NAT, as you would
expect. IPv6 from that site-local address is not routed to the outside
world -- as you would also expect.

I use glibc's getaddrinfo() to fetch addresses for an external host, for
example olpc1.infradead.org (2001:8b0:10b:1:2e0:4cff:fe03:3d7a,
81.187.226.98).

Glibc's implementation of RFC3484 is described at
http://people.redhat.com/drepper/linux-rfc3484.html -- it uses
SOCK_DGRAM connect() and then getsockname() to determine the local
address which would be used for any given remote address.

On older Linux kernels, glibc's connect() from fec0::YYYY to
2001:8b0:10b:1:2e0:4cff:fe03:3d7a would fail. Thus, glibc would return
the IPv4 address 81.187.226.98 first. And all would be well.

On current Linux kernels, the connect() succeeds. Glibc now returns the
IPv6 address 2001:8b0:10b:1:2e0:4cff:fe03:3d7a first in its results, and
the connection does not succeed. This causes problems:

 1. Because some software is _still_ buggy and gives up after the first
    address doesn't work, without falling back to the next.

 2. Because sometimes the routers don't respond with an unreachable ICMP
    so we end up with a three-minute timeout before it connects.

 3. Because sometimes the routers _do_ respond with an unreachable ICMP
    and this causes the kernel to drop its default route entirely rather
    than just learning that it can't reach _that_ host. So all internal
    IPv6 further than the local network also breaks.

OK, all these three are bugs which need to get fixed. And I haven't
investigated #2 and #3 very hard yet but I've seen them both.
Nevertheless, the upshot of this is that I cannot run radvd on these
networks -- I would get lynched.

It's arguably correct that the connect() succeeds on the newer kernels.
It might not often make much sense to connect() from site-local to
global addresses, but if the user explicitly tries that then the kernel
shouldn't reject the attempt out of hand. So glibc was behaving
optimally in this case purely by coincidence, in the past.

The best fix for this, IMHO, is to change the default label/precedence
table in glibc's RFC3484 implementation so that it includes a separate
label for site-local addresses. That will mean that in the above
situation, glibc favours the IPv4 address by virtue of RFC3484 Rule 5
(where before it was favouring it by Rule 1).

In the meantime, it's been "fixed" in glibc simply by changing the
default precedence for ::ffff:0:0/96 to '100' -- thus favouring IPv4
over IPv6 in _all_ cases. I think it's best to avoid that situation,
because it means that IPv6 is never going to get tested in Linux on
dual-stack hosts; they'd always use IPv4 instead.

-- 
dwmw2

--------------------------------------------------------------------
IETF IPv6 working group mailing list
ipv6@ietf.org
Administrative Requests: https://www1.ietf.org/mailman/listinfo/ipv6
--------------------------------------------------------------------
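
For illustration only, here is a rough Python sketch of the two mechanisms
discussed above: the SOCK_DGRAM connect()/getsockname() probe, and the
Rule 5 label comparison with and without a separate site-local label. It
is not glibc's code. The function names are made up, the fec0::/10 row and
its label value 5 are assumptions standing in for the proposed table
change, and fec0::1 and 192.168.0.10 are stand-ins for the elided local
addresses.

```python
import ipaddress
import socket

# Label column of the RFC 3484 default policy table: (prefix, label).
DEFAULT_LABELS = [
    ("::1/128", 0),
    ("::/0", 1),
    ("2002::/16", 2),
    ("::/96", 3),
    ("::ffff:0:0/96", 4),
]

# Hypothetical extra row: a separate label for site-local addresses.
# The label value 5 is an assumption made for this sketch.
SITE_LOCAL_ROW = ("fec0::/10", 5)


def probe_source(dest, port=9):
    """Return the local address the kernel would pick for dest, or None.

    connect() on a datagram socket sends no packets; it only asks the
    kernel to select a route and a source address.
    """
    family = socket.AF_INET6 if ":" in dest else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as s:
            s.connect((dest, port))
            return s.getsockname()[0]
    except OSError:
        return None


def to_v6(addr):
    """Parse addr, turning IPv4 into its IPv4-mapped IPv6 form."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ipaddress.IPv6Address((0xFFFF << 32) | int(ip))
    return ip


def label(addr, table):
    """Label of the longest matching prefix in the policy table."""
    ip = to_v6(addr)
    matches = [
        (ipaddress.ip_network(prefix).prefixlen, lab)
        for prefix, lab in table
        if ip in ipaddress.ip_network(prefix)
    ]
    return max(matches)[1]


def rule5_prefers(a, b, table):
    """Rule 5 (prefer matching label) for two (destination, source) pairs.

    Returns the preferred destination, or None when the rule is a tie.
    """
    match_a = label(a[0], table) == label(a[1], table)
    match_b = label(b[0], table) == label(b[1], table)
    if match_a == match_b:
        return None
    return a[0] if match_a else b[0]


# Destination/source pairs; the sources are stand-ins for fec0::YYYY and
# 192.168.x.x, fixed here rather than probed so the result is repeatable.
V6 = ("2001:8b0:10b:1:2e0:4cff:fe03:3d7a", "fec0::1")
V4 = ("81.187.226.98", "192.168.0.10")

print(rule5_prefers(V6, V4, DEFAULT_LABELS))
print(rule5_prefers(V6, V4, DEFAULT_LABELS + [SITE_LOCAL_ROW]))
```

With the default table, both pairs have matching labels, so Rule 5 cannot
separate them and the decision falls through to later rules. With the
extra row, only the IPv4 pair matches, so Rule 5 picks 81.187.226.98. On
a real host the source of each pair would come from something like
probe_source() rather than being hard-coded.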