[email protected] wrote:
Nope, it seems dhcpcd is not requesting a renewal of the lease at half the
lease time, as it should (IIRC). http://wiki.neuralbs.com/~kyron/DHCP_reqs_tb17
contains a grep of the DHCP activity from the disappearing nodes in
/var/log/messages.
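For reference, RFC 2131 (section 4.4.5) has the client enter RENEWING at T1 and REBINDING at T2. T1 defaults to 0.5 times the lease and T2 to 0.875 times the lease, unless the server sends options 58/59. Below is a minimal Python sketch of that arithmetic for the 12h lease in the dnsmasq config. `default_timers` is a hypothetical helper and is not taken from dhcpcd.

```python
# Default DHCP client timers from RFC 2131, section 4.4.5.
# A server may override them with option 58 (T1) and option 59 (T2).

def default_timers(lease_seconds: int) -> tuple[int, int]:
    """Return (T1, T2) in seconds for a lease of the given length."""
    t1 = lease_seconds // 2        # 0.5 * lease: start unicast renewal
    t2 = lease_seconds * 7 // 8    # 0.875 * lease: fall back to broadcast rebinding
    return t1, t2


if __name__ == "__main__":
    lease = 12 * 3600  # the 12h dhcp-range lease, i.e. 43200 s
    t1, t2 = default_timers(lease)
    print(f"lease={lease}s T1={t1}s ({t1 / 3600:g}h) T2={t2}s ({t2 / 3600:g}h)")
```

For a 12h lease, this places the first renewal attempt 6h in and the switch to rebinding at 10.5h.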

Can you get a client-side log, or even better, a packet capture?
DISCOVER and RENEW requests may look different enough to match
firewall rules differently. For example, DISCOVER is sent to the
broadcast address, while RENEW is sent unicast.
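To make the broadcast/unicast distinction concrete: RFC 2131 (table 5) has the client broadcast DISCOVER, along with the REQUESTs sent from SELECTING, INIT-REBOOT and REBINDING. Only the RENEWING request is unicast, to the server that granted the lease. A rule that matches only broadcast traffic to port 67 would therefore see the initial exchange but not the renewal. This is a minimal Python sketch. The enum and `client_destination` are hypothetical names, and 10.0.1.129 is used only because the NAK in the log comes from that address.

```python
from enum import Enum

BROADCAST = "255.255.255.255"
SERVER_PORT = 67  # bootps: the client always sends to this port (from port 68)


class ClientState(Enum):
    INIT = "INIT"                # sends DHCPDISCOVER
    SELECTING = "SELECTING"      # sends DHCPREQUEST accepting an offer
    INIT_REBOOT = "INIT-REBOOT"  # sends DHCPREQUEST for a remembered address
    RENEWING = "RENEWING"        # after T1: DHCPREQUEST to the leasing server
    REBINDING = "REBINDING"      # after T2: DHCPREQUEST to any server


def client_destination(state: ClientState, server_ip: str) -> tuple[str, int]:
    """Destination (address, port) of the client's message in `state`.

    Per RFC 2131, table 5, only the RENEWING request is unicast;
    every other client message in this list is broadcast.
    """
    if state is ClientState.RENEWING:
        return server_ip, SERVER_PORT
    return BROADCAST, SERVER_PORT


if __name__ == "__main__":
    for state in ClientState:
        print(f"{state.value:12} -> {client_destination(state, '10.0.1.129')}")
```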
Oct  7 11:50:52 thinkbig16 dhcpcd[29651]: eth0: sending signal 14 to pid 21865
Oct  7 11:50:52 thinkbig16 dhcpcd[21865]: eth0: received SIGALRM, renewing lease
Oct  7 11:50:52 thinkbig16 dhcpcd[21865]: eth0: renewing lease of 10.0.1.136
Oct  7 11:50:52 thinkbig16 dhcpcd[21865]: eth0: NAK: lease not found from 10.0.1.129
Oct  7 11:50:52 thinkbig16 dhcpcd[29652]: eth0: /lib/dhcpcd/dhcpcd-run-hooks: Network is unreachable
Oct  7 11:50:53 thinkbig16 dhcpcd[21865]: eth0: broadcasting for a lease
Oct  7 11:50:53 thinkbig16 dhcpcd[21865]: eth0: offered 10.0.1.136 from 10.0.1.136 `boothost'
Oct  7 11:50:53 thinkbig16 dhcpcd[21865]: eth0: ignoring offer of 10.0.1.136 from 10.0.1.136 `boothost'
Oct  7 11:50:53 thinkbig16 dhcpcd[21865]: eth0: acknowledged 10.0.1.136 from 10.0.1.136 `boothost'
Oct  7 11:50:53 thinkbig16 dhcpcd[21865]: eth0: checking 10.0.1.136 is available on attached networks
Oct  7 11:50:59 thinkbig16 dhcpcd[21865]: eth0: leased 10.0.1.136 for 43200 seconds
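When comparing logs from several nodes, it can help to turn syslog lines like these into structured records. The following is a minimal Python sketch under the assumption that every record follows the `MON DAY HH:MM:SS host dhcpcd[pid]: iface: message` shape shown here. The regex and the `parse_events` helper are hypothetical. They also split records that got run together on one line, as happens when logs are pasted into mail.

```python
import re
from typing import Iterator, NamedTuple

# A syslog timestamp such as "Oct  7 11:50:52" (the day may be space-padded).
_TS = r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}"

# One dhcpcd record. The lazy message stops at the end of the line or just
# before the next timestamp, so records glued onto one line are still split.
RECORD = re.compile(
    rf"(?P<ts>{_TS}) (?P<host>\S+) dhcpcd\[(?P<pid>\d+)\]: "
    rf"(?P<iface>[^:\s]+): (?P<msg>.*?)(?=\s+{_TS} |\s*$)",
    re.MULTILINE,
)


class DhcpcdEvent(NamedTuple):
    timestamp: str
    host: str
    pid: int
    iface: str
    message: str


def parse_events(text: str) -> Iterator[DhcpcdEvent]:
    """Yield one DhcpcdEvent per dhcpcd record found in `text`."""
    for m in RECORD.finditer(text):
        yield DhcpcdEvent(m["ts"], m["host"], int(m["pid"]), m["iface"], m["msg"])


if __name__ == "__main__":
    sample = (
        "Oct 7 11:50:52 thinkbig16 dhcpcd[21865]: eth0: renewing lease of 10.0.1.136 "
        "Oct 7 11:50:52 thinkbig16 dhcpcd[21865]: eth0: NAK: lease not found from 10.0.1.129"
    )
    for event in parse_events(sample):
        print(event.timestamp, event.pid, event.message)
```

Grouping the events by `pid` separates the long-running daemon (21865) from the short-lived processes that signal it or run the hooks (29651, 29652).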

So the lease time is explicitly received as 12h (43200 seconds, which is correct), but the "eth0: offered 10.0.1.136 from 10.0.1.136 `boothost'" line has me frowning... (note that `boothost' is defined in the config pasted below)

The tcpdump file is available at http://wiki.neuralbs.com/~kyron/dhcpout_tb16. Sorry for the net chatter, but you get it all there ;)
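To check which server fields lie behind the "offered ... from ... `boothost'" lines, the replies in the capture can be decoded directly. Relevant fields include the server identifier (option 54), the boot fields siaddr/sname that dnsmasq's dhcp-boot fills in, and the lease time (option 51). The sketch below is a minimal Python decoder for the RFC 2131/2132 layout: a fixed 236-byte BOOTP header, the magic cookie, then code/length/value options. It assumes you already have the UDP payload bytes, since pulling them out of the pcap file is not shown. `parse_options` and `summarize` are hypothetical helpers.

```python
import ipaddress
import struct

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # 99.130.83.99, precedes the DHCP options
OPTIONS_OFFSET = 240                 # 236-byte BOOTP header + 4-byte cookie
MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
                 5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}


def parse_options(payload: bytes) -> dict[int, bytes]:
    """Return {option code: raw value} from a DHCP message (UDP payload)."""
    if payload[236:240] != MAGIC_COOKIE:
        raise ValueError("magic cookie missing: not a DHCP message")
    options: dict[int, bytes] = {}
    i = OPTIONS_OFFSET
    while i < len(payload):
        code = payload[i]
        if code == 0:                               # Pad: one byte, no length
            i += 1
            continue
        if code == 255 or i + 1 >= len(payload):    # End, or truncated option
            break
        length = payload[i + 1]
        options[code] = payload[i + 2:i + 2 + length]
        i += 2 + length
    return options


def ipv4(raw: bytes) -> str:
    return str(ipaddress.IPv4Address(raw))


def summarize(payload: bytes) -> dict[str, object]:
    """Collect the header fields and options worth comparing with dhcpcd's log."""
    opts = parse_options(payload)
    summary: dict[str, object] = {
        "type": MESSAGE_TYPES.get(opts.get(53, b"\x00")[0], "unknown"),  # option 53
        "yiaddr": ipv4(payload[16:20]),  # address offered to the client
        "siaddr": ipv4(payload[20:24]),  # next-server (boot) address
        "sname": payload[44:108].split(b"\x00", 1)[0].decode("ascii", "replace"),
    }
    if 54 in opts:                       # server identifier
        summary["server_id"] = ipv4(opts[54])
    if 51 in opts:                       # lease time in seconds
        summary["lease_seconds"] = struct.unpack("!I", opts[51])[0]
    return summary
```

Comparing `server_id` with the IP source address of each reply should show which machine actually answered a given request.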
What does your network topology look like? Are you using DHCP relays
and/or a not-fully-routed IP network? Under some circumstances, hosts
can get an address but then fail to renew it.
OK, at the risk of being flamed about my network topology: I have a hacked-up setup, but IMHO it shouldn't impact DHCP (I'm emulating NIC bonding through IP masking).

Here is the setup:

http://wiki.neuralbs.com/~kyron/HyperTransport/ClusterNetDiagram.png

Here is the config file (sourced by dnsmasq.conf)
eric@headless ~/1_Files/1_ETS/1_Maitrise/Code/pvq $ cat /etc/dnsmasq.AthlonXP_All.conf
# Force the broadcast address to 10.0.1.255
dhcp-option=28,10.0.1.255

# The group name, address range and lease time:
dhcp-range=AthlonXP_1,10.0.1.10,10.0.1.126,255.255.255.0,12h
# The GROUP_NAME's option 3: default gateway (if you really need this; nodes shouldn't require routed access):
dhcp-option=AthlonXP,3,10.0.1.1
# The GROUP_NAME's option 42: NTP server address:
dhcp-option=AthlonXP_1,42,10.0.1.1
# This is required for PXE booting
dhcp-boot=net:AthlonXP_1,/pxelinux.0,boothost,10.0.1.1
# As can be seen in the dnsmasq.conf, this option is not guaranteed to work (DN search order)
dhcp-option=AthlonXP_1,119,cluster.local
# Domain DNS name
dhcp-option=15,cluster.local
# NIS domain
dhcp-option=40,cluster.local

# Now for the host listing, format is:
# dhcp-host=MACADDRESS,net:GROUP_NAME,NODE_NAME,IP_ADDRESS
dhcp-host=00:01:03:df:ca:44,net:AthlonXP_1,thinkbig1,10.0.1.11
dhcp-host=00:01:03:DF:C6:38,net:AthlonXP_1,thinkbig2,10.0.1.12
dhcp-host=00:01:03:DF:D3:30,net:AthlonXP_1,thinkbig3,10.0.1.13
dhcp-host=00:01:03:DF:D3:08,net:AthlonXP_1,thinkbig4,10.0.1.14
dhcp-host=00:01:03:DF:D3:01,net:AthlonXP_1,thinkbig5,10.0.1.15
dhcp-host=00:01:03:DF:CA:3B,net:AthlonXP_1,thinkbig6,10.0.1.16
dhcp-host=00:04:75:C2:21:14,net:AthlonXP_1,thinkbig7,10.0.1.17
dhcp-host=00:01:03:DF:CA:46,net:AthlonXP_1,thinkbig8,10.0.1.18
dhcp-host=00:01:03:de:5f:2e,net:AthlonXP_1,thinkbig9,10.0.1.19

#
# The group name, address range and lease time:
dhcp-range=AthlonXP_2,10.0.1.130,10.0.1.254,255.255.255.0,12h
# The GROUP_NAME's option 3: default gateway (if you really need this; nodes shouldn't require routed access):
dhcp-option=AthlonXP_2,3,10.0.1.129
# The GROUP_NAME's option 42: NTP server address:
dhcp-option=AthlonXP_2,42,10.0.1.129
# This is required for PXE booting
dhcp-boot=net:AthlonXP_2,/pxelinux.0,boothost,10.0.1.129
# As can be seen in the dnsmasq.conf, this option is not guaranteed to work (DN search order)
#dhcp-option=AthlonXP_2,119,cluster.local
# Now for the host listing, format is:
# dhcp-host=MACADDRESS,net:GROUP_NAME,NODE_NAME,IP_ADDRESS

dhcp-host=00:01:03:DE:B5:C3,net:AthlonXP_1,thinkbig10,10.0.1.130
dhcp-host=00:01:03:DE:B6:AE,net:AthlonXP_1,thinkbig11,10.0.1.131
dhcp-host=00:04:75:AA:36:5B,net:AthlonXP_1,thinkbig12,10.0.1.132
dhcp-host=00:01:03:DF:CA:42,net:AthlonXP_2,thinkbig13,10.0.1.133
dhcp-host=00:01:03:DE:B5:C2,net:AthlonXP_2,thinkbig14,10.0.1.134
dhcp-host=00:01:03:24:E9:3B,net:AthlonXP_2,thinkbig15,10.0.1.135
dhcp-host=00:04:75:EC:33:47,net:AthlonXP_2,thinkbig16,10.0.1.136
dhcp-host=00:04:75:EC:4E:F2,net:AthlonXP_2,thinkbig17,10.0.1.137
#dhcp-host=00:04:75:EC:4E:CF,net:AthlonXP_2,thinkbig18,10.0.1.138
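The dhcp-host entries above mix net:AthlonXP_1 and net:AthlonXP_2 tags across both address ranges, so a quick consistency check between tags and ranges may be worth running. The following is a minimal Python sketch with hypothetical `load`/`outside_range` helpers. It understands only the tagged dhcp-range and net:-tagged dhcp-host forms used in this file, and it flags any host whose fixed address is not between the start and end of the range carrying the same tag. On this file it would flag thinkbig10, thinkbig11 and thinkbig12 (10.0.1.130 to 10.0.1.132, tagged AthlonXP_1). dnsmasq itself does not require a dhcp-host address to fall within the range bounds, only on a subnet it serves, so a flagged entry is something to double-check rather than a confirmed error.

```python
import ipaddress
import sys
from typing import Iterable, NamedTuple

Address = ipaddress.IPv4Address


class HostEntry(NamedTuple):
    mac: str
    tag: str
    name: str
    ip: Address


def load(lines: Iterable[str]) -> tuple[dict[str, tuple[Address, Address]], list[HostEntry]]:
    """Collect dhcp-range bounds per tag and net:-tagged dhcp-host entries.

    Assumes the old-style forms used in this file:
      dhcp-range=TAG,START,END,NETMASK,LEASE   (every range carries a tag)
      dhcp-host=MAC,net:TAG,NAME,IP
    Comments, blank lines and all other options are skipped.
    """
    ranges: dict[str, tuple[Address, Address]] = {}
    hosts: list[HostEntry] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        fields = [f.strip() for f in value.split(",")]
        if key == "dhcp-range" and len(fields) >= 3:
            ranges[fields[0]] = (Address(fields[1]), Address(fields[2]))
        elif key == "dhcp-host" and len(fields) == 4 and fields[1].startswith("net:"):
            hosts.append(HostEntry(fields[0].lower(), fields[1][len("net:"):],
                                   fields[2], Address(fields[3])))
    return ranges, hosts


def outside_range(ranges: dict[str, tuple[Address, Address]],
                  hosts: list[HostEntry]) -> list[HostEntry]:
    """Hosts whose IP is outside START..END of their tag's range (or whose tag has no range)."""
    flagged = []
    for host in hosts:
        bounds = ranges.get(host.tag)
        if bounds is None or not (bounds[0] <= host.ip <= bounds[1]):
            flagged.append(host)
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as conf:
        ranges, hosts = load(conf)
    for host in outside_range(ranges, hosts):
        print(f"{host.name} ({host.ip}) is tagged {host.tag} but outside that range")
```

Usage: run it with the config path as its only argument, for example against /etc/dnsmasq.AthlonXP_All.conf.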
