On 01/12/12 14:54, Gene Czarcinski wrote:
On 11/30/2012 05:23 PM, Gene Czarcinski wrote:
On 11/30/2012 04:18 PM, Simon Kelley wrote:
On 30/11/12 21:03, Gene Czarcinski wrote:
On 11/30/2012 12:45 PM, Simon Kelley wrote:
On 30/11/12 17:20, Gene Czarcinski wrote:
On 11/30/2012 11:32 AM, Simon Kelley wrote:
On 30/11/12 15:54, Gene Czarcinski wrote:
On 11/29/2012 04:18 PM, Simon Kelley wrote:
On 29/11/12 20:31, Gene Czarcinski wrote:

I spoke too quickly.

The cause of the problem is libvirt related but I am not sure what just yet.

I was running a libvirt that had a lot of "stuff" on it but seemed to work OK. Then, earlier today I updated to a point that appears to be somewhat beyond the leading edge and, although I was not getting any RTR-ADVERT messages, it turned out that there were/are big-time problems running qemu-kvm. So, back off/downgrade to the previous version. Qemu-kvm now works but the RTR-ADVERT messages are back.

This may be a bit time-consuming to debug!

Are you seeing the new log message in netlink.c?


The good news is that libvirt is working again (I must have done a git-pull in the middle of an update).  Thus, I am not seeing the large numbers of RTR-ADVERT.

Yes, I am seeing the new log message and I have a question about that. Every time a new virtual network interface is started, something must be doing some type of broadcast, because all of the dnsmasq instances (the new one and all the "old" ones) suddenly wake up and issue a flurry of RA packets and related syslog messages.  To kick the flurry off, there is one of the new "unsolicited" syslog messages from each dnsmasq instance.

Is this something you would expect?  Is this "normal"?  The libvirt folks say they are not doing it.
I'd expect it. The code you instrumented gets run whenever a "new address" event happens, which is whenever an address is added to an interface. "Every time a new virtual network interface is started" is a good proxy for that.

The dnsmasq code isn't very discriminating: it updates its idea of which interfaces have which addresses, and then does a minute of fast advertisements on all of them. It might be possible to only do the fast advertisements on new interfaces, but implementing that isn't totally trivial.
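(For anyone who wants to watch those "new address" events independently of dnsmasq, `ip monitor address` shows them interactively. Below is a minimal Python sketch of the same rtnetlink subscription dnsmasq's netlink code relies on. It is Linux-only and purely illustrative, not dnsmasq's actual code; the constants come from <linux/rtnetlink.h>.)

```python
import socket
import struct

# rtnetlink constants from <linux/rtnetlink.h> (assumed values, Linux-only)
RTMGRP_IPV4_IFADDR = 0x10
RTMGRP_IPV6_IFADDR = 0x100
RTM_NEWADDR = 20

def parse_nlmsg_header(data: bytes):
    """Parse a struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid."""
    return struct.unpack("=LHHLL", data[:16])

def watch():
    """Print a line for every RTM_NEWADDR event -- the same kernel event
    that wakes every running dnsmasq instance and triggers the fast-RA burst."""
    s = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)
    s.bind((0, RTMGRP_IPV4_IFADDR | RTMGRP_IPV6_IFADDR))
    while True:
        data = s.recv(65535)
        # A datagram may carry several netlink messages back to back
        while len(data) >= 16:
            msg_len, msg_type, _flags, _seq, _pid = parse_nlmsg_header(data)
            if msg_type == RTM_NEWADDR:
                print("RTM_NEWADDR: an address was added to some interface")
            data = data[((msg_len + 3) & ~3):]  # advance to next aligned message
```

Calling watch() while starting a new virtual network should show one event per added address, which matches the observation that every dnsmasq instance wakes up at once.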


Yes, I doubt very much if it would be trivial.  However, I do not believe that this is the basic problem.

When the problem occurs, one of the networks "suddenly" attempts to work with the real NIC rather than the virtual one defined in its config file.  I slightly changed the IPv4 and IPv6 addresses defined for this network and the problem went away.  I have also "just" seen the problem happen on another system which also had that virtual address defined.

BTW, these configurations all use interface= and bind-dynamic rather than the "old" bind-interfaces with listen-address= specified for each IPv4 and IPv6 address.  I had not noticed the problem previously.  Why it occurs at all with just this specific address is puzzling.

The configuration which causes problems is:
------------------------------------------
# dnsmasq conf file created by libvirt
strict-order
domain-needed
domain=net6
expand-hosts
local=/net6/
pid-file=/var/run/libvirt/network/net6.pid
bind-dynamic
interface=virbr11
dhcp-range=192.168.6.128,192.168.6.254
dhcp-no-override
dhcp-leasefile=/var/lib/libvirt/dnsmasq/net6.leases
dhcp-lease-max=127
dhcp-hostsfile=/var/lib/libvirt/dnsmasq/net6.hostsfile
addn-hosts=/var/lib/libvirt/dnsmasq/net6.addnhosts
dhcp-range=fd00:beef:10:6::1,ra-only
-------------------------------------------------

When I changed all the "6" to "160", the problem disappeared.  And there is another network defined almost the same with "8" instead of "6", and I have had no problems with it.

The real NIC is configured as a DHCP client for both IPv4 and IPv6.  It is assigned "nailed" addresses of 192.168.17.2/24 and fd00:dead:beef:17::2.

And I just discovered why crazy stuff is happening (but I do not know what causes it) ... the p33p1 NIC has:
   inet6 fd00:beef:10:6:3285:a9ff:fe8f:e982/64 scope global dynamic

Is that the "real NIC"?

Yes, p33p1 is the real NIC.  This is going to be a real PITA to debug because I believe part of the problem is a race condition. NetworkManager has this really long dance it goes through to bring up the IPv6 interface.

But I do not have any proof of that and, as I just proved to myself, getting things to repeat is going to be difficult.

At this point I am not sure that bind-dynamic was related.  I went through the syslogs I still have and the first occurrence was on 8 November.  That is well before bind-dynamic was integrated in.

Attached are some limited copies of syslogs that I thought you might find of interest.  It seems like the "strangeness" happens right after I update libvirt and libvirtd is restarted, which then gets dnsmasq started.

If I cannot get this figured out and "fixed", I will need to disable use of dnsmasq for RA service and fall back on radvd.

Frustrating .. so close and yet so far!


I wonder if the virbr* interfaces are bridged to the "real" NICs,
such that when a prefix is advertised on the virbr interface, it
causes the real interface to add an address for that prefix. Because
dnsmasq is configured to advertise the prefix, that then causes the
advertisements via the real NIC.

Just a thought.
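(One way to check that hypothesis from the data already in this thread: the mystery address on p33p1 has exactly the interface ID that SLAAC would derive from p33p1's MAC via EUI-64, combined with the prefix dnsmasq advertises for net6. A quick sketch to verify, using Python's ipaddress module; this is just a checking aid, not anything from dnsmasq or libvirt:)

```python
import ipaddress

def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Form the EUI-64 SLAAC address a host derives from an advertised
    /64 prefix and its interface's MAC address (RFC 4862 / RFC 4291)."""
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02                                  # flip universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]     # insert ff:fe in the middle
    iid = int.from_bytes(bytes(eui64), "big")
    return ipaddress.ip_network(prefix)[iid]

# p33p1's MAC from the `ip addr` dump, plus the prefix dnsmasq advertises:
print(slaac_address("fd00:beef:10:6::/64", "30:85:a9:8f:e9:82"))
# -> fd00:beef:10:6:3285:a9ff:fe8f:e982, the exact address seen on p33p1
```

The same MAC also yields fe80::3285:a9ff:fe8f:e982 for the link-local prefix, matching the `ip addr` dump, so the "scope global dynamic" address really does look like p33p1 autoconfiguring itself from an RA carrying fd00:beef:10:6::/64.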

If I had not done the ip addr to get the above, I would still be scratching my head.

Anyway, here is ip addr:
-----------------------------------------------
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p33p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP qlen 1000
    link/ether 30:85:a9:8f:e9:82 brd ff:ff:ff:ff:ff:ff
    inet 192.168.17.2/24 brd 192.168.17.255 scope global p33p1
    inet6 fd00:dead:beef:17:1::2/128 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::3285:a9ff:fe8f:e982/64 scope link
       valid_lft forever preferred_lft forever
10: virbr11: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc
noqueue state DOWN
    link/ether 52:54:00:0b:84:5c brd ff:ff:ff:ff:ff:ff
    inet 192.168.6.1/24 brd 192.168.6.255 scope global virbr11
    inet6 fd00:beef:10:6::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe0b:845c/64 scope link
       valid_lft forever preferred_lft forever
11: virbr11-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast
master virbr11 state DOWN qlen 500
    link/ether 52:54:00:0b:84:5c brd ff:ff:ff:ff:ff:ff
------------------------------------

And here is brctl show:
-----------------------------------------
bridge name    bridge id        STP enabled    interfaces
virbr11        8000.5254000b845c    yes        virbr11-nic
---------------------------------------

I think I will give it a rest until tomorrow!

I ran yet another test and this time it happened.  I am getting a little more info such as:
Dec  1 05:52:36 falcon dnsmasq-dhcp[23358]: ra_start_unsolicted(), len=60, type=14, flags=0, pid=5b5e

where most look like:
Dec  1 05:52:37 falcon dnsmasq-dhcp[23394]: ra_start_unsolicted(), len=64, type=14, flags=0, pid=0


1. Is there other information I can/should print out?

No, it's clear what's happening, I think.

2. Is there anything I can do to identify why dnsmasq is "suddenly" using interface p33p1 when it was specifically configured to use interface virbr11?  I thought that bind-interfaces and bind-dynamic were supposed to lock that dnsmasq instance into only servicing the specified interface.


I think that's known: it's because p33p1 gets an address on the fd00:beef:10:6:: network, which dnsmasq is configured to advertise. The difficult question is why it's getting that address.

(The fd00:beef:10:6:: address on p33p1 is not shown in your latest dump, but it is shown in earlier ones. The existence of that address on p33p1 should be a big red flag when you're trying to diagnose this.)

Previously, libvirt's parameters to dnsmasq were bind-interfaces, listen-address=.  This is now replaced with bind-dynamic, interface= to fix a serious problem.  So, my question is whether the "correct" configuration should be bind-interfaces, interface= ?

The objective is that, no matter how it is specified, dnsmasq should ONLY service the networks defined on a specific interface and ignore anything from other interfaces, whether they have the same network defined or not.

The RA code uses address matching to decide where to advertise. This could arguably be augmented by filtering on --interface, but it isn't at the moment.
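(The effect of matching by address rather than by --interface can be illustrated roughly like this, using the addresses from the `ip addr` dump earlier in the thread. This is a sketch in Python's ipaddress module of the matching outcome, not dnsmasq's actual C logic:)

```python
import ipaddress

# The prefix dnsmasq advertises, from the net6 dhcp-range line
advertised = ipaddress.ip_network("fd00:beef:10:6::/64")

# Global addresses observed on the machine
virbr11_addr = ipaddress.ip_address("fd00:beef:10:6::1")                # virbr11
p33p1_slaac  = ipaddress.ip_address("fd00:beef:10:6:3285:a9ff:fe8f:e982")  # p33p1
p33p1_static = ipaddress.ip_address("fd00:dead:beef:17:1::2")           # p33p1

# Address-based matching selects every interface holding a matching address:
print(virbr11_addr in advertised)   # True  -> RAs go out on virbr11, as intended
print(p33p1_slaac in advertised)    # True  -> RAs also go out on p33p1
print(p33p1_static in advertised)   # False -> this address alone would not match
```

So once p33p1 acquires any address inside fd00:beef:10:6::/64, address matching alone pulls the real NIC into the advertisement set even though interface=virbr11 is configured.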



Question: what do you think would happen if two different processes on two different hardware platforms, sharing a common network fabric, were to run stateful RA on one system and stateless RA on the other?  Could that be happening here and, if so, why?

I don't understand this, what is stateful RA? Do you mean stateful DHCP?


Gene


_______________________________________________
Dnsmasq-discuss mailing list
Dnsmasq-discuss@lists.thekelleys.org.uk
http://lists.thekelleys.org.uk/mailman/listinfo/dnsmasq-discuss


