Re: [lwip-users] broken DHCP/ARP interaction
At long last, I understand the problem. The lesson I learned is that you can't trust the Ethernet addresses in a packet: they were being faked by the Cisco hardware (it was sending packets with someone else's source address).

The problem itself is caused by a poorly designed ARP proxy/optimization in the Cisco enterprise AP software. The problematic sequence is as follows (here the AP/DHCP side is a Cisco enterprise AP network, and the device is a WiFi module running lwIP):

0. The device associates with a WiFi network (SSID).
1. The device requests an IP address from a DHCP server.
2. The DHCP server sends an IP address to the device.
3. The device attempts to validate the IP address by broadcasting an ARP who-has for the address offered by the DHCP server. This is the standard ARP check (sent with sender IP 0.0.0.0, i.e. an ARP probe) to ensure that no other device already uses the same IP address; no reply is expected unless the address is taken.
4. The Cisco AP with which the WiFi device is associated (wrongly) sends an ARP reply for the offered IP address to the WiFi device! Note that at this point the IP address has not yet been assigned to any interface, so it is an error for the AP to proxy ARP for it. Furthermore, it is wrong to send a proxy reply to the very device on whose behalf the proxy reply is sent.
5. The device sees an (unexpected) ARP reply for what should become its own IP address and consequently rejects the address. Note that the lwIP DHCP code does not notice that the ARP reply contains the device's own MAC address in all fields; it simply acts on it.

It is not clear to me exactly how the Cisco AP's ARP proxy module learns about the IP address, but it clearly uses the wrong algorithm. As is, the Cisco algorithm produces an ARP (proxy) reply before, and without, knowing that the address has actually been assigned. That is, it is not caching responses; it is synthesizing responses before any real device did (or could) send them.
Instead, it should simply cache real ARP responses; that would have guaranteed that proxy replies are sent only for addresses that have actually been assigned.

Regarding the lwIP code, it could be more defensive. The bad ARP packet that causes the DHCP rejection has all four MAC addresses (Ethernet source, Ethernet destination, sender hardware address, target hardware address) set to the same value: the MAC address of the device itself. I would think such a packet should be discarded.

To conclude: no, the lwIP DHCP/ARP code is not broken per se, but it will not work when deployed against the Cisco Aironet enterprise AP solution.

-Z

-----Original Message-----
From: lwip-users-bounces+zradouch=irobot@nongnu.org [mailto:lwip-users-bounces+zradouch=irobot@nongnu.org] On Behalf Of Simon Goldschmidt
Sent: Thursday, May 15, 2014 2:04 AM
To: Mailing list for lwIP users
Subject: Re: [lwip-users] broken DHCP/ARP interaction

___
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users
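A minimal C sketch of the defensive check Zdenek proposes. It models only the decision a DHCP client makes about an ARP reply while it is still checking an offered address; the type `arp_reply_view`, the `check_verdict` enum and both functions are hypothetical illustrations, not lwIP's actual `dhcp_arp_reply()` or its data structures. It assumes the four MAC fields and the ARP sender IP have already been copied out of the received frame.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of a received ARP reply (not an lwIP type).
 * Addresses are assumed to have been copied out of the frame already. */
typedef struct {
    uint8_t  eth_src[6];   /* Ethernet header: source MAC */
    uint8_t  eth_dst[6];   /* Ethernet header: destination MAC */
    uint8_t  sender_hw[6]; /* ARP payload: sender hardware address */
    uint8_t  target_hw[6]; /* ARP payload: target hardware address */
    uint32_t sender_ip;    /* ARP payload: sender IP, host byte order */
} arp_reply_view;

typedef enum {
    CHECK_IGNORE,  /* reply says nothing about the offered address */
    CHECK_DECLINE  /* another host claims the offered address */
} check_verdict;

/* True when all four MAC fields equal our own MAC, i.e. the pattern seen
 * in the bogus replies from the AP: nobody else is claiming the address. */
static bool arp_reply_is_self_echo(const arp_reply_view *r,
                                   const uint8_t own_mac[6]) {
    return memcmp(r->eth_src, own_mac, 6) == 0 &&
           memcmp(r->eth_dst, own_mac, 6) == 0 &&
           memcmp(r->sender_hw, own_mac, 6) == 0 &&
           memcmp(r->target_hw, own_mac, 6) == 0;
}

/* Decision for an ARP reply received while the offered address is still
 * being checked (the state lwIP calls DHCP_CHECKING). */
static check_verdict dhcp_check_on_arp_reply(const arp_reply_view *r,
                                             uint32_t offered_ip,
                                             const uint8_t own_mac[6]) {
    if (r->sender_ip != offered_ip) {
        return CHECK_IGNORE;   /* unrelated reply */
    }
    if (arp_reply_is_self_echo(r, own_mac)) {
        return CHECK_IGNORE;   /* defensive: our own MAC echoed back */
    }
    return CHECK_DECLINE;      /* genuine conflict: decline the lease */
}
```

Comparing only the ARP sender hardware address against the device's own MAC would be a looser variant of the same guard; the sketch follows the stricter all-four-fields pattern Zdenek describes.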
Re: [lwip-users] broken DHCP/ARP interaction
Radouch, Zdenek wrote:
> However, in this case the ARP does something that is outright wrong -- it answers an ARP query for an IP address that has not been assigned yet -- the DHCP client is still trying to ensure that it is OK to use the address (hence the ARP query). So the behavior is completely broken with respect to temporal dependencies.

I can't reproduce this, but I do think it would be a bug.

However, Sergio is right that you should take precautions not to receive self-sent (looped-back) packets. While it might not break things the way you see (it does not for me), it can lead to other anomalies, like receiving your own broadcasts on broadcast sockets or seeing a reply to a gratuitous ARP you send (so IPv4 AutoIP and IPv6 address selection won't work).

Simon
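The receive-side filter Zach Smith describes in this thread (dropping frames whose source MAC is the device's own), which is one way to follow Simon's advice, can be sketched in C as a check on the raw frame before it reaches the stack. The function name `eth_frame_is_from_self` and the constants are hypothetical; the only assumption about the frame is the standard Ethernet II layout (destination MAC, then source MAC, then EtherType).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN   6
#define ETH_HDR_LEN    14  /* destination MAC + source MAC + EtherType */
#define ETH_SRC_OFFSET 6   /* source MAC follows the destination MAC */

/* Hypothetical helper: true if a received Ethernet II frame carries our own
 * MAC as its source address. Such a frame is either an echo of something we
 * transmitted or one forged with our address; either way it should not be
 * delivered to the stack. Frames too short to hold a header return false
 * and are left for the normal input path to reject. */
static bool eth_frame_is_from_self(const uint8_t *frame, size_t len,
                                   const uint8_t own_mac[ETH_ADDR_LEN]) {
    if (frame == NULL || len < ETH_HDR_LEN) {
        return false;
    }
    return memcmp(frame + ETH_SRC_OFFSET, own_mac, ETH_ADDR_LEN) == 0;
}
```

In a driver, a check like this would typically run just before the frame is passed to `netif->input()`, comparing against `netif->hwaddr`, with matching frames freed instead of delivered. Because it keys on the source address only, it would also drop the forged reply in Zdenek's capture (packet #447, source 6c:ad:f8:e5:27:fe).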
Re: [lwip-users] broken DHCP/ARP interaction
Thanks for the pointer; I'll check it out. However, in this case the ARP does something that is outright wrong -- it answers an ARP query for an IP address that has not been assigned yet -- the DHCP client is still trying to ensure that it is OK to use the address (hence the ARP query). So the behavior is completely broken with respect to temporal dependencies.

-Z

From: lwip-users-bounces+zradouch=irobot@nongnu.org [lwip-users-bounces+zradouch=irobot@nongnu.org] on behalf of Zach Smith [zsm...@campbellsci.com]
Sent: Wednesday, May 14, 2014 12:17 PM
To: Mailing list for lwIP users
Subject: Re: [lwip-users] broken DHCP/ARP interaction

I have also had a problem similar to this. I have lwIP running in a WiFi device, and I was having problems because my router was echoing packets back to me. I can't remember whether this interfered with ARP working correctly or whether it was causing some other problem. What I did was put a function in the low-level ethernet_input() that filters out any Ethernet packets whose source MAC address equals my own MAC address. I would also be curious to hear whether other folks have had to do this.

I also had a similar problem with IPv6 not working when I was plugged into an old hub rather than a switch. IPv6 auto-configuration and duplicate address detection weren't working right because my own transmitted packets were being echoed back to me. There is a thread about this in the lwIP mailing list archives:
http://lists.nongnu.org/archive/html/lwip-users/2012-12/msg00020.html
It seems like you may be experiencing something similar.

From: lwip-users-bounces+zsmith=campbellsci@nongnu.org [mailto:lwip-users-bounces+zsmith=campbellsci@nongnu.org] On Behalf Of Radouch, Zdenek
Sent: Tuesday, May 13, 2014 1:20 PM
To: lwip-users@nongnu.org
Subject: [lwip-users] broken DHCP/ARP interaction

I am having an lwIP problem where the local ARP erroneously answers its own who-has request and, by doing so, prevents DHCP from working correctly.
I would appreciate it if anyone familiar with the ARP code could comment on how this could possibly happen (and of course help me fix it ☺).

The setup is an embedded WiFi device running FreeRTOS with lwIP, attempting to obtain an IP address from the wireless network it just joined. I have a Wireshark log I could provide (it's big and hard to read) capturing the following:

1. The device associates with the network and requests an IP address from the DHCP server.
2. The DHCP server gives an IP address to the device.
3. The device runs an ARP check on the address.
4. The device erroneously answers its own ARP request.
5. The device consequently (since the ARP was answered) rejects the IP address offered by the DHCP server.

Obviously, #4 is wrong: the device should never have answered its own ARP, since the IP address in question is still being validated (by that very ARP transaction). I am not familiar with the code base, so I have no idea how this could happen (it appears obviously wrong). I turned on some debugging in the stack, and what I see simply confirms what I had decoded with Wireshark. This is easily reproducible (it happens every single time I run it), so I could instrument more of the stack if that would help with debugging.
Here is the log:

    Aspen firmware: May 13 2014 13:58:05
    [2.806917] : etharp_send_ip: sending packet 0x20004b34
    [2.811492] : etharp_send_ip: sending packet 0x20004b34
    [2.905013] : etharp_send_ip: sending packet 0x20004b34
    [3.109013] : etharp_send_ip: sending packet 0x20004b34
    [appln] listen: OK
    [wlcm] got wifi message: 8 0 0x
    [wlcm] got event: scan result
    [wlcm] Found better AP iRobot-Guest on channel 11
    [wlcm] starting association to default
    [wlcm] got wifi message: 10 0 0x
    [wlcm] got event: association result: success
    [wlcm] got wifi message: 12 0 0x
    [wlcm] got event: authentication result: success
    [net] configuring interface mlan (with DHCP client)
    [4.824439] : dhcp_start(netif=0x20012024) ml1
    [4.828588] : dhcp_start(): starting new DHCP client
    [4.836495] : dhcp_start(): starting DHCP configuration
    [4.841152] : dhcp_discover()
    [4.843650] : transaction id xid(5542A27)
    [4.847158] : dhcp_discover: making request
    [4.850809] : dhcp_discover: realloc()ing
    [4.854298] : dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, DHCP_SERVER_PORT)
    [4.861214] : etharp_send_ip: sending packet 0x20004b7c
    [4.865908] : dhcp_discover: deleting()ing
    [4.869444] : dhcp_discover: SELECTING
    [4.872686] : dhcp_discover(): set request timeout 2000 msecs
    [4.877847] : ethernet_input: dest:ff:ff:ff:ff:ff:ff, src:44:ad:d9:02:a8:cd, type:800
    [4.885014] : dhcp_recv(pbuf = 0x2000f7f8) from DHCP server 172.16.16.7 port 67
    [4.891654] : pbuf->len = 310
    [4.894150] : pbuf->tot_len = 310
    [4.896998] : searching DHCP_OPTION_MESSAGE_TYPE
    [4.901056] : DHCP_OFFER received in DHCP_SELECTING state
    [4.905881] : dhcp_handle_offer(netif
Re: [lwip-users] broken DHCP/ARP interaction
Hi,

Quick-and-dirty workaround: disable the ARP check by defining DHCP_DOES_ARP_CHECK to 0.

Analysis: I'm no expert in DHCP or ARP, but I don't see anything similar to what you are experiencing. I've set up three scenarios:

1- different IP (static) prior to DHCP
2- same IP (static) prior to DHCP
3- 0.0.0.0

In scenarios 1 and 3, everything works as expected. In scenario 2, I see a gratuitous ARP, but the address is accepted anyway. Here is my capture:

    No.  Time         Source          Destination      Protocol  Length  Info
    1    0.0          192.168.1.42    255.255.255.255  DHCP      350     DHCP Discover - Transaction ID 0xabcd0001
    2    0.001705000  192.168.1.1     192.168.1.42     DHCP      342     DHCP Offer - Transaction ID 0xabcd0001
    3    0.002564000  192.168.1.42    255.255.255.255  DHCP      350     DHCP Request - Transaction ID 0xabcd0002
    4    0.04042      192.168.1.1     192.168.1.42     DHCP      342     DHCP ACK - Transaction ID 0xabcd0002
    5    0.040841000  3com_03:04:05   Broadcast        ARP       60      Gratuitous ARP for 192.168.1.42 (Request)

    Frame 5: 60 bytes on wire (480 bits), 60 bytes captured (480 bits) on interface 0
    Ethernet II, Src: 3com_03:04:05 (00:01:02:03:04:05), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
    Address Resolution Protocol (request/gratuitous ARP)
        Hardware type: Ethernet (1)
        Protocol type: IP (0x0800)
        Hardware size: 6
        Protocol size: 4
        Opcode: request (1)
        [Is gratuitous: True]
        Sender MAC address: 3com_03:04:05 (00:01:02:03:04:05)
        Sender IP address: 192.168.1.42 (192.168.1.42)
        Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
        Target IP address: 192.168.1.42 (192.168.1.42)

    6    0.49976      3com_03:04:05   Broadcast        ARP       60      Gratuitous ARP for 192.168.1.42 (Request)

I suggest you check your options and step through the code to see where this rejection takes place.
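Sergio's frame 5 is a gratuitous ARP (sender IP equal to target IP), whereas Zdenek's capture in this thread shows "Who has 172.16.20.182? Tell 0.0.0.0", i.e. a probe with sender IP 0.0.0.0. The difference is consistent with the check being sent from the interface's current address: 192.168.1.42 in Sergio's scenario 2, nothing yet in Zdenek's case. A small C sketch that tells these apart from the 28-byte Ethernet/IPv4 ARP payload, following the RFC 5227 definitions of ARP probe and ARP announcement; the function and enum names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARP_PAYLOAD_LEN 28 /* Ethernet/IPv4 ARP: 8-byte header + 2*(6+4) addresses */
#define ARP_OPER_OFF     6 /* 16-bit opcode */
#define ARP_SPA_OFF     14 /* sender IP address */
#define ARP_TPA_OFF     24 /* target IP address */

typedef enum {
    ARP_KIND_INVALID,    /* not an Ethernet/IPv4 ARP request or reply */
    ARP_KIND_PROBE,      /* request with sender IP 0.0.0.0 (RFC 5227 probe) */
    ARP_KIND_GRATUITOUS, /* request with sender IP == target IP (announcement) */
    ARP_KIND_REQUEST,    /* ordinary "who has X? tell Y" */
    ARP_KIND_REPLY       /* any "X is at M" reply */
} arp_kind;

/* Classify an ARP payload (the bytes after the 14-byte Ethernet header). */
static arp_kind classify_arp(const uint8_t *arp, size_t len) {
    static const uint8_t zero_ip[4] = {0, 0, 0, 0};
    if (arp == NULL || len < ARP_PAYLOAD_LEN) {
        return ARP_KIND_INVALID;
    }
    /* Hardware type 1 (Ethernet), protocol 0x0800 (IPv4), sizes 6 and 4. */
    if (arp[0] != 0x00 || arp[1] != 0x01 || arp[2] != 0x08 || arp[3] != 0x00 ||
        arp[4] != 6 || arp[5] != 4) {
        return ARP_KIND_INVALID;
    }
    unsigned oper = ((unsigned)arp[ARP_OPER_OFF] << 8) | arp[ARP_OPER_OFF + 1];
    if (oper == 2) {
        return ARP_KIND_REPLY;
    }
    if (oper != 1) {
        return ARP_KIND_INVALID;
    }
    if (memcmp(arp + ARP_SPA_OFF, zero_ip, 4) == 0) {
        return ARP_KIND_PROBE;
    }
    if (memcmp(arp + ARP_SPA_OFF, arp + ARP_TPA_OFF, 4) == 0) {
        return ARP_KIND_GRATUITOUS;
    }
    return ARP_KIND_REQUEST;
}
```

Fed the payload of Sergio's frame 5, this returns ARP_KIND_GRATUITOUS; fed the who-has from Zdenek's packet 443, it returns ARP_KIND_PROBE.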
Re: [lwip-users] broken DHCP/ARP interaction
As for seeing something different: I, too, see different behavior (and things working) when I configure my network differently with respect to where the DHCP server is. But in my current configuration (which happens to be our corporate network, with hundreds of different machines connecting correctly all the time), the lwIP stack does not work. A workaround that ignores the failure, or one that avoids the failure by not checking, is not exactly a solution (I know, you said dirty), so I need to find and fix the problem -- we can't afford to deploy an embedded product with something so blatantly broken.

The sequence I see looks like this:

    No.,Time,Source,Destination,Protocol,Length,Info
    390,0.814720,0.0.0.0,255.255.255.255,DHCP,395,DHCP Discover - Transaction ID 0x5542a27
    409,0.854480,172.16.16.7,255.255.255.255,DHCP,392,DHCP Offer - Transaction ID 0x5542a27
    424,0.896977,0.0.0.0,255.255.255.255,DHCP,395,DHCP Request - Transaction ID 0x169c39e2
    433,0.907728,172.16.16.7,255.255.255.255,DHCP,392,DHCP ACK - Transaction ID 0x169c39e2
    443,0.930603,6c:ad:f8:e5:27:fe,Broadcast,ARP,87,Who has 172.16.20.182? Tell 0.0.0.0
    447,0.932726,6c:ad:f8:e5:27:fe,6c:ad:f8:e5:27:fe,ARP,102,172.16.20.182 is at 6c:ad:f8:e5:27:fe
    459,0.956479,0.0.0.0,255.255.255.255,DHCP,395,DHCP Decline - Transaction ID 0x2eab4956

All is well until packet #447 -- the address has not been validated yet, and the DHCP state machine is still in the DHCP_CHECKING state, so there should not be an ARP entry for the still-invalid IP address (172.16.20.182). I have stepped through enough code to understand the rejection: it happens in dhcp.c(960), in dhcp_arp_reply(). Unfortunately, I don't understand the ARP code or how it interacts with the DHCP client, hence my looking for someone who knows the lwIP ARP design.
That is, I am looking for someone who could tell me what the lwIP ARP is expected to do in this exact situation (DHCP ACK while IP = 0) and how exactly the new IP address is handed from the DHCP client to the rest of the stack (including the ARP module).

-Z

From: lwip-users-bounces+zradouch=irobot@nongnu.org [lwip-users-bounces+zradouch=irobot@nongnu.org] on behalf of Sergio R. Caprile [scapr...@gmail.com]
Sent: Wednesday, May 14, 2014 2:10 PM
To: lwip-users@nongnu.org
Subject: Re: [lwip-users] broken DHCP/ARP interaction
Re: [lwip-users] broken DHCP/ARP interaction
I suggested the quick-and-dirty workaround as a way to check whether that was really the problem. I guess your problem is that you are seeing your own messages, something that may not happen on standard Ethernet (you mentioned Wi-Fi, didn't you?).

I suggest following the code execution and helping the developers. I'm just trying to help so that I won't have these problems myself in the future, and I'm willing to peek into the sources to give my opinion or propose a patch. Unfortunately, I can't duplicate your scenario, so I can't dive deeper.

As far as I can follow your analysis, you are right: the DHCP code seems to rush to assign an IP when it should wait until after the ARP validation. Try to find that with a breakpoint and tell us what you see. Also try following the mailing list thread, and keep us posted; someone else might have submitted a patch or a (more elegant) workaround.
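The quick-and-dirty diagnostic Sergio mentions is a compile-time lwIP option. A sketch of the override, assuming the project keeps its lwIP configuration in the usual `lwipopts.h`:

```c
/* lwipopts.h: project-specific overrides of the defaults in lwIP's opt.h.
 * Diagnostic only: do not ARP-check an offered address before binding it.
 * A bogus ARP reply can then no longer trigger a DHCP DECLINE, but real
 * address conflicts will go undetected as well. */
#define DHCP_DOES_ARP_CHECK 0
```

Because this turns off conflict detection for DHCP-assigned addresses altogether, it is useful for confirming that the ARP check is what triggers the DECLINE, not as a fix to ship.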