On 05/26/2015 01:12 PM, Salvatore Orlando wrote:
From the bug Kevin reported, it seems multiple DHCP agents per network have been
completely broken by the fix for bug #1345947, so reverting patch [1] (and its
stable backports) should probably be the first thing to do, if nothing else
because the original bug is not nearly as severe as the one the fix
introduced.
Before doing this, however, I am wondering why the various instances of dnsmasq
end up returning NAKs. I expect all instances to have the same hosts file, so
they should be able to respond to DHCPDISCOVER/DHCPREQUEST correctly. Is the
dnsmasq log telling us exactly why the authoritative setting is preventing them
from doing so? (This is more of a curiosity on my side.)

[1] https://review.openstack.org/#/c/152080/

In the original case, the DHCPREQUEST is for a renew, which is handled differently from an initial request. If the server does not have a lease entry (which it won't after a restart), then it will NAK, which normally just causes the client to retry from the INIT state.
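
To make the renew case concrete, here is a minimal sketch of the exchange described above. It is a toy model, not dnsmasq's code: the function names, the dict-based lease table, and the reduced set of client states are all assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    BOUND = "BOUND"
    RENEWING = "RENEWING"


def server_reply_to_renew(leases, mac, requested_ip):
    """Toy server: ACK a renew only if a matching lease entry exists.

    `leases` maps MAC address -> leased IP. An empty dict models a server
    that was just restarted and has lost its lease entries.
    """
    return "DHCPACK" if leases.get(mac) == requested_ip else "DHCPNAK"


def client_next_state(state, reply):
    """Toy client: an ACK keeps the binding, a NAK sends it back to INIT."""
    if state is State.RENEWING and reply == "DHCPACK":
        return State.BOUND
    if state is State.RENEWING and reply == "DHCPNAK":
        return State.INIT
    return state


# A restarted server with no lease entries NAKs the renew,
# and the client starts over from INIT.
reply = server_reply_to_renew({}, "fa:16:3e:00:00:01", "10.0.0.3")
state = client_next_state(State.RENEWING, reply)
```

The sketch only covers a single server; it says nothing about why a second agent would NAK, which is the open question in the new bug.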

I had asked on the dnsmasq list about this [1], and the multiple-server question was the wildcard; my testing didn't hit the error described in the new bug, though. The first proposed fix of re-populating the lease information doesn't seem like such a bad idea any more. I will reply to my original query with the tcpdump information, since I'm confused as to why the second DHCP agent stepped in with a NAK at all after originally offering the same address as the first DHCP agent [2].
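
For reference, re-populating the lease information could look roughly like the sketch below. It builds lease lines from the ports on a network, assuming the dnsmasq lease file layout of "expiry-epoch MAC IP hostname client-id" with "*" for unknown fields. The port dictionaries, the function name, and the default lease time are hypothetical; this is not the code from either proposed patch.

```python
import time


def build_lease_lines(ports, lease_seconds=86400, now=None):
    """Return one lease line per port.

    `ports` is a list of dicts with 'mac' and 'ip' keys and an optional
    'hostname' key (a hypothetical shape, chosen for illustration).
    """
    now = int(time.time()) if now is None else int(now)
    expiry = now + lease_seconds
    lines = []
    for port in ports:
        hostname = port.get("hostname") or "*"
        # The client-id is not known from the port data, so use "*".
        lines.append(f"{expiry} {port['mac']} {port['ip']} {hostname} *")
    return lines


example_ports = [
    {"mac": "fa:16:3e:00:00:01", "ip": "10.0.0.3", "hostname": "host-10-0-0-3"},
    {"mac": "fa:16:3e:00:00:02", "ip": "10.0.0.4"},
]
example_lines = build_lease_lines(example_ports, lease_seconds=100, now=1000)
```

The same lines could either be written to an initial lease file or printed to stdout by a script passed to --dhcp-script when dnsmasq runs in leasefile-ro mode, which as far as I understand is when it asks the script for the starting lease list.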

I would agree the best thing to do is revert the stable backports while we work on fixing this in the master branch.

-Brian

[1] http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2015q1/009171.html
[2] https://launchpadlibrarian.net/207180476/dhcp_neutron_bug.html


On 26 May 2015 at 06:57, Ihar Hrachyshka <ihrac...@redhat.com
<mailto:ihrac...@redhat.com>> wrote:

    On 05/26/2015 04:35 AM, Kevin Benton wrote:
    > Hi,
    >
    > A recent change[1] to pass '--dhcp-authoritative' to dnsmasq has
    > caused DHCPNAK messages when multiple agents are scheduled to a
    > network [2].
    >
    > This was back-ported to Icehouse and Juno so we need a fix that is
    > compatible with both of them.
    >
    > I have two fixes for this so far and a third alternative if we
    > don't like those.
    >
    > The first is hacky, but it's only a few-line change.[3] It adds an
    > iptables rule that just stops the DHCPNAKs from making it to the
    > client. This is clean to back-port but it doesn't protect clients
    > that have filtering disabled (e.g. bare metal).
    >
    > The second persists the DHCP leases to a database.[4] The downside
    > to this was always that being rescheduled to another agent would
    > mean no entries in the lease file. This approach adds a work-around
    > to generate an initial fake lease file based on all of the ports in
    > the network.
    >
    > A third approach that I don't have a patch pushed for yet is very
    > similar to the second. When dnsmasq is in the leasefile-ro mode, it
    > will call the script passed to --dhcp-script to get a list of
    > leases to start with. This script would be built with the same
    > logic as the second one. The only difference from the second
    > approach is that dnsmasq wouldn't persist leases to a database.
    >

    Actually, that approach was initially taken for bug 1345947, but then
    the patch was abandoned in favor of the simpler
    --dhcp-authoritative approach, which ended up causing unexpected NAKs
    in multi-agent setups.

    See: https://review.openstack.org/#/c/108272/12

    Maybe we actually want to restore that work and merge it after
    conflicts are resolved and the --dhcp-authoritative option is killed; the
    patch was almost merged when the --dhcp-authoritative suggestion emerged,
    so most of the nitpicking work should be complete now (though at the same
    time, I totally trust our community to find another pile of nits to
    work on for the next few weeks!)


That was my thought as well.
However, we should check whether that patch is OK to backport. For instance, I
see that it appears to add a script:

[2]
https://review.openstack.org/#/c/108272/12/bin/neutron-dhcp-agent-dnsmasq-lease-init


    ===

    Speaking of regression testing... Are full stack tests already
    powerful enough for us to invoke multiple DHCP agents and test the
    scenario?

    Ihar

    __________________________________________________________________________
    OpenStack Development Mailing List (not for usage questions)
    Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
    http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



