Re: [lwip-users] broken DHCP/ARP interaction

2014-05-30 Thread Radouch, Zdenek
At long last, I understand the problem. The lesson I learned is that you can't 
trust the packet ether addresses -- they were being faked by the Cisco hardware 
(sending a packet with someone else's source address). The problem itself is a 
case of a poorly designed ARP proxy/optimization within the Cisco enterprise AP 
software.

The problematic sequence is as follows:
[here the AP/DHCP is a Cisco enterprise AP network, the device is a WiFi module 
running lwIP]

0. A device associates with a WiFi network (SSID) 
1. The device requests IP address from a DHCP server 
2. The DHCP server sends an IP address to the device 
3. The device attempts to validate the IP address by sending out ARP who has 
for the IP address offered by the DHCP server.
This is the standard gratuitous ARP request to ensure that there is no device 
with the same IP address; no reply is expected.
4. The Cisco AP to which the WiFi device is connected sends (wrongly) an ARP 
response (for the offered IP address) to the WiFi device!
Note that at this point in time, the IP address has not yet been assigned to 
any interface, so it is an error for the AP to proxy ARP. Furthermore, it is 
wrong to send a proxy reply to the very device on behalf of which the proxy is 
sent.
5. The device sees an [unexpected] ARP reply (with what should be its IP 
address), and consequently rejects the IP address.
Note that the lwIP DHCP code does not notice that the ARP response contains its 
own MAC address in all fields; it simply acts on it.

It is not clear to me exactly how the Cisco AP ARP proxy module learns about 
the IP address, but it is clear that it uses a wrong algorithm. As is, the 
Cisco algorithm produces an ARP [proxy] reply before and without knowing that 
the address has actually been assigned. That is, it is not caching responses, 
it is synthesizing responses before any real device did (or could) send them. 
Instead, it should simply cache ARP responses; this would have guaranteed that 
at least on the assignment the proxy replies were valid.

Regarding the lwIP code, it could be more defensive. The bad ARP packet that 
causes the DHCP rejection has all four MAC addresses (ether src, ether dest, 
sender hardware, target hardware) the same, equal to the MAC address of the 
device itself. 
I would think such packet should be discarded.

To conclude, no, the DHCP/ARP lwIP code is not broken per se, but it will not 
work when deployed against the Cisco Aironet Enterprise AP solution.

-Z



 -Original Message-
 From: lwip-users-bounces+zradouch=irobot@nongnu.org [mailto:lwip-
 users-bounces+zradouch=irobot@nongnu.org] On Behalf Of Simon
 Goldschmidt
 Sent: Thursday, May 15, 2014 2:04 AM
 To: Mailing list for lwIP users
 Subject: Re: [lwip-users] broken DHCP/ARP interaction
 
 Radouch, Zdenek wrote:
  However, in this case the ARP does something that is outright wrong -
 -
  it answers an ARP query for an IP address that has not been assigned
  yet -- the DHCP client is still trying to ensure that it is OK to use
  the address (hence the ARP query). So the behavior is completely
 broken with respect to temporal dependencies.
 
 I can't reproduce this, but I do think it would be a bug. However,
 Sergio is right in that you should take cautions to not receive (loop-
 back) self-sent packets. While it might not break things like you see
 (it does not, for me), it can lead to other anomalties like receiving
 your own broadcast on broadcast sockets or seeing a reply to a
 Gratuitizs ARP you send (so IPv4 AutoIP and IPv6 address selection
 won't work).
 
 Simon
 
 ___
 lwip-users mailing list
 lwip-users@nongnu.org
 https://lists.nongnu.org/mailman/listinfo/lwip-users



___
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users


Re: [lwip-users] broken DHCP/ARP interaction

2014-05-15 Thread Simon Goldschmidt
Radouch, Zdenek wrote:
 However, in this case the ARP does something that is outright wrong -- it 
 answers an ARP query
 for an IP address that has not been assigned yet -- the DHCP client is still 
 trying to
 ensure that it is OK to use the address (hence the ARP query). So the 
 behavior is completely broken
 with respect to temporal dependencies.

I can't reproduce this, but I do think it would be a bug. However, Sergio is 
right in that you should take cautions to not receive (loop-back) self-sent 
packets. While it might not break things like you see (it does not, for me), it 
can lead to other anomalties like receiving your own broadcast on broadcast 
sockets or seeing a reply to a Gratuitizs ARP you send (so IPv4 AutoIP and IPv6 
address selection won't work).

Simon

___
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users


Re: [lwip-users] broken DHCP/ARP interaction

2014-05-14 Thread Radouch, Zdenek
Thanks for the pointer; I'll check it out.

However, in this case the ARP does something that is outright wrong -- it 
answers an ARP query
for an IP address that has not been assigned yet -- the DHCP client is still 
trying to
ensure that it is OK to use the address (hence the ARP query). So the behavior 
is completely broken
with respect to temporal dependencies.

-Z



From: lwip-users-bounces+zradouch=irobot@nongnu.org 
[lwip-users-bounces+zradouch=irobot@nongnu.org] on behalf of Zach Smith 
[zsm...@campbellsci.com]
Sent: Wednesday, May 14, 2014 12:17 PM
To: Mailing list for lwIP users
Subject: Re: [lwip-users] broken DHCP/ARP interaction

I also have had a problem similar to this. I have lwip running in a wifi device 
and I was having problems because my router was echoing packets back to me. I 
can’t remember if this interfered with ARP working correctly or if it was 
causing some other problem. But what I did is put a function in the low level 
ethernet_input() that filters out any Ethernet packets that have a src MAC 
address equal to my MAC address. I would also be curious to hear if other folks 
have had to do this?

I also had a problem similar to this with IPv6 not working when I was plugged 
into an old hub rather than a switch. IPv6 auto-configuration and duplicate 
address detection wasn’t working right because I was getting my own transmitted 
packets echoed back to me. There is a thread about this on the lwip mailing 
list archives:
http://lists.nongnu.org/archive/html/lwip-users/2012-12/msg00020.html

It seems like you could be experiencing something similar maybe.

From: lwip-users-bounces+zsmith=campbellsci@nongnu.org 
[mailto:lwip-users-bounces+zsmith=campbellsci@nongnu.org] On Behalf Of 
Radouch, Zdenek
Sent: Tuesday, May 13, 2014 1:20 PM
To: lwip-users@nongnu.org
Subject: [lwip-users] broken DHCP/ARP interaction

I am having an LwIP problem where the local ARP erroneously answers its own 
who-has request
and by doing that prevents DHCP from working correctly. I would appreciate if 
anyone
familiar with the ARP code could comment on how this could possibly happen (and 
of
course help me fix it ☺).

The setup is an embedded WiFi device running FreeRTOS with LwIP, attempting to 
obtain
an IP address from the wireless network it just joined. I have a wireshark log 
I could provide
(it’s big and hard to read) capturing the following:


1.   The device associates with the network and requests IP address from 
the DHCP server

2.   The DHCP server gives IP address to the device

3.   The device runs an ARP IP check on the address

4.   The device erroneously answers its own ARP request

5.   The device consequently (ARP answered) rejects the IP address offered 
by the DHCP server

Obviously, the #4 is wrong, the device should have never answered its own ARP 
since the IP address
in question is still being validated (by that very ARP transaction).
I am not familiar with the code base so I have no idea how this could happen 
(appears obviously wrong).
I turned on some debugging in the stack and what I see simply confirms what I 
had decoded with wireshark.
This is easily reproducible (as in it happens every single time I run it) so I 
could instrument more of the stack,
if that would help with debugging.

Here is the log:

Aspen firmware: May 13 2014 13:58:05
[2.806917] : etharp_send_ip: sending packet 0x20004b34
[2.811492] : etharp_send_ip: sending packet 0x20004b34
[2.905013] : etharp_send_ip: sending packet 0x20004b34
[3.109013] : etharp_send_ip: sending packet 0x20004b34
[appln] listen: OK
[wlcm] got wifi message: 8 0 0x
[wlcm] got event: scan result
[wlcm] Found better AP iRobot-Guest on channel 11
[wlcm] starting association to default
[wlcm] got wifi message: 10 0 0x
[wlcm] got event: association result: success
[wlcm] got wifi message: 12 0 0x
[wlcm] got event: authentication result: success
[net] configuring interface mlan (with DHCP client)
[4.824439] : dhcp_start(netif=0x20012024) ml1
[4.828588] : dhcp_start(): starting new DHCP client
[4.836495] : dhcp_start(): starting DHCP configuration
[4.841152] : dhcp_discover()
[4.843650] : transaction id xid(5542A27)
[4.847158] : dhcp_discover: making request
[4.850809] : dhcp_discover: realloc()ing
[4.854298] : dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, 
DHCP_SERVER_PORT)
[4.861214] : etharp_send_ip: sending packet 0x20004b7c
[4.865908] : dhcp_discover: deleting()ing
[4.869444] : dhcp_discover: SELECTING
[4.872686] : dhcp_discover(): set request timeout 2000 msecs
[4.877847] : ethernet_input: dest:ff:ff:ff:ff:ff:ff, src:44:ad:d9:02:a8:cd, 
type:800
[4.885014] : dhcp_recv(pbuf = 0x2000f7f8) from DHCP server 172.16.16.7 port 67
[4.891654] : pbuf-len = 310
[4.894150] : pbuf-tot_len = 310
[4.896998] : searching DHCP_OPTION_MESSAGE_TYPE
[4.901056] : DHCP_OFFER received in DHCP_SELECTING state
[4.905881] : dhcp_handle_offer(netif

Re: [lwip-users] broken DHCP/ARP interaction

2014-05-14 Thread Sergio R. Caprile
Hi,
Quickdirty workaround: disable ARP checking by defining
DHCP_DOES_ARP_CHECK to 0

Analysis:
I'm no expert in DHCP nor ARP, but I don't see anything similar to what
you are experiencing.
I've setup three scenarios:
1- different IP (static) prior to DHCP
2- same IP (static) prior to DHCP
3- 0.0.0.0
In scenarios 1 and 3, everything works as expected
In scenario 2, I see a gratuitous ARP, but the address is accepted
anyway. Here is my capture:

No. Time   SourceDestination  
Protocol Length Info
  1 0.0192.168.1.42  255.255.255.255  
DHCP 350DHCP Discover - Transaction ID 0xabcd0001

  2 0.001705000192.168.1.1   192.168.1.42 
DHCP 342DHCP Offer- Transaction ID 0xabcd0001

  3 0.002564000192.168.1.42  255.255.255.255  
DHCP 350DHCP Request  - Transaction ID 0xabcd0002

  4 0.04042192.168.1.1   192.168.1.42 
DHCP 342DHCP ACK  - Transaction ID 0xabcd0002

  5 0.0408410003com_03:04:05 Broadcast
ARP  60 Gratuitous ARP for 192.168.1.42 (Request)

Frame 5: 60 bytes on wire (480 bits), 60 bytes captured (480 bits) on
interface 0
Ethernet II, Src: 3com_03:04:05 (00:01:02:03:04:05), Dst: Broadcast
(ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (request/gratuitous ARP)
Hardware type: Ethernet (1)
Protocol type: IP (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: request (1)
[Is gratuitous: True]
Sender MAC address: 3com_03:04:05 (00:01:02:03:04:05)
Sender IP address: 192.168.1.42 (192.168.1.42)
Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
Target IP address: 192.168.1.42 (192.168.1.42)

  6 0.499763com_03:04:05 Broadcast
ARP  60 Gratuitous ARP for 192.168.1.42 (Request)


I suggest you check your options and step the code to see where this
rejection takes place

-- 

___
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users


Re: [lwip-users] broken DHCP/ARP interaction

2014-05-14 Thread Radouch, Zdenek
Well, as far as seeing something else, I, too, can see something else when I 
configure my network
differently with respect to where the DHCP server is, and I, too, can see 
things working.
But in my current configuration (which happens to be our corporate network with 
hundreds of different
machines connecting correctly all the time) the lwIP stack does not work.
A workaround that ignores the failure, or one that avoids a failure by not 
checking is not exactly a solution
(I know, you said dirty), so I need to find and fix the problem -- we can't 
afford deploying an embedded
product with something so blatantly broken.

The sequence I see looks like this:

No.,Time,Source,Destination,Protocol,Length,Info
390,0.814720,0.0.0.0,255.255.255.255,DHCP,395,DHCP Discover - 
Transaction ID 0x5542a27
409,0.854480,172.16.16.7,255.255.255.255,DHCP,392,DHCP Offer- 
Transaction ID 0x5542a27
424,0.896977,0.0.0.0,255.255.255.255,DHCP,395,DHCP Request  - 
Transaction ID 0x169c39e2
433,0.907728,172.16.16.7,255.255.255.255,DHCP,392,DHCP ACK  - 
Transaction ID 0x169c39e2
443,0.930603,6c:ad:f8:e5:27:fe,Broadcast,ARP,87,Who has 
172.16.20.182?  Tell 0.0.0.0
447,0.932726,6c:ad:f8:e5:27:fe,6c:ad:f8:e5:27:fe,ARP,102,172.16.20.182
 is at 6c:ad:f8:e5:27:fe
459,0.956479,0.0.0.0,255.255.255.255,DHCP,395,DHCP Decline  - 
Transaction ID 0x2eab4956

All is well until packet #447 -- the address has not been validated yet, the 
DHCP machine is still in
the DHCP_CHECKING state, so there should not be an ARP entry for the still 
invalid IP address (172.16.20.182).
I have stepped through enough code to understand the rejection: it happens in 
dhcp.c(960)
in dhcp_arp_reply(). Unfortunately I don't understand the ARP code, and I don't 
understand how it
interacts with the DHCP client, hence my looking for someone who knows the lwIP 
ARP design.

That is, I am looking for someone who could tell me what the lwIP ARP is 
expected to do in this exact
situation (DHCP ack while IP=0) and how exactly is the new IP address handed 
from the DHCP client
to the rest of the stack (including the ARP module).

-Z


From: lwip-users-bounces+zradouch=irobot@nongnu.org 
[lwip-users-bounces+zradouch=irobot@nongnu.org] on behalf of Sergio R. 
Caprile [scapr...@gmail.com]
Sent: Wednesday, May 14, 2014 2:10 PM
To: lwip-users@nongnu.org
Subject: Re: [lwip-users] broken DHCP/ARP interaction

Hi,
Quickdirty workaround: disable ARP checking by defining
DHCP_DOES_ARP_CHECK to 0

Analysis:
I'm no expert in DHCP nor ARP, but I don't see anything similar to what
you are experiencing.
I've setup three scenarios:
1- different IP (static) prior to DHCP
2- same IP (static) prior to DHCP
3- 0.0.0.0
In scenarios 1 and 3, everything works as expected
In scenario 2, I see a gratuitous ARP, but the address is accepted
anyway. Here is my capture:

No. Time   SourceDestination
Protocol Length Info
  1 0.0192.168.1.42  255.255.255.255
DHCP 350DHCP Discover - Transaction ID 0xabcd0001

  2 0.001705000192.168.1.1   192.168.1.42
DHCP 342DHCP Offer- Transaction ID 0xabcd0001

  3 0.002564000192.168.1.42  255.255.255.255
DHCP 350DHCP Request  - Transaction ID 0xabcd0002

  4 0.04042192.168.1.1   192.168.1.42
DHCP 342DHCP ACK  - Transaction ID 0xabcd0002

  5 0.0408410003com_03:04:05 Broadcast
ARP  60 Gratuitous ARP for 192.168.1.42 (Request)

Frame 5: 60 bytes on wire (480 bits), 60 bytes captured (480 bits) on
interface 0
Ethernet II, Src: 3com_03:04:05 (00:01:02:03:04:05), Dst: Broadcast
(ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (request/gratuitous ARP)
Hardware type: Ethernet (1)
Protocol type: IP (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: request (1)
[Is gratuitous: True]
Sender MAC address: 3com_03:04:05 (00:01:02:03:04:05)
Sender IP address: 192.168.1.42 (192.168.1.42)
Target MAC address: 00:00:00_00:00:00 (00:00:00:00:00:00)
Target IP address: 192.168.1.42 (192.168.1.42)

  6 0.499763com_03:04:05 Broadcast
ARP  60 Gratuitous ARP for 192.168.1.42 (Request)


I suggest you check your options and step the code to see where this
rejection takes place

--

___
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users



___
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users


Re: [lwip-users] broken DHCP/ARP interaction

2014-05-14 Thread Sergio R. Caprile
I suggested the QD workaround as a way to check if that was really the
problem.
I guess your problem is that you are seeing your own messages, something
that may not be the case in standard Ethernets (you mentioned Wi-Fi,
didn't you?)
I suggest to follow the code execution and help the developers, I'm just
trying to help so I won't have these problems myself in the future, and
I'm willing to peek in the sources to give my opinion or propose a
patch. Unfortunately I can't duplicate your scenario so I can't dive deeper.
As far as I can follow the analysis, you are right, and the DHCP code
seems to rush to assign an IP when it should wait after the ARP
validation. Try to find that with a breakpoint and tell us what you see.
Try also following the maling list thread and keep us posted, someone
else might have submitted a patch or a (more elegant) workaround

-- 


___
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users