Hello,

I am working on an extension of the IPv6 mobility protocol (FMIPv6, RFC 4068) and have run into the following problem:

When I switch my Mobile Node between two Wi-Fi Access Points, there is a period during which every packet I send is lost, even though I have already received the SIOCGIWAP 'up' netlink event for the new AP. At that point the device should be ready, yet the packets are lost.

By 'lost' I mean: silently discarded by the kernel, with no error returned to the user-space application sending the packet, and the packets never appear on the network (monitored with Wireshark).

Here is the setup:
------------------

1. The daemon decides to switch from one AP to the other for some reason (better link quality, ...) and sets the new ESSID, etc.

2. The daemon waits for the SIOCGIWAP 'up' netlink event.

3. SIOCGIWAP received: the daemon sends a single Mobility Header packet over a raw socket to its new router to signal that it has successfully moved. sendmsg() reports no error, but the packet never shows up.

- The interface already has an IPv6 address configured for the new network (created beforehand).
- There is a route between the node and the router.
- I set the socket option IPV6_RECVERR to get all errors, but none is reported.
- The "black hole" period lasts about 500 ms after the SIOCGIWAP event. Every packet sent during this period is lost.
- Before sending the packet, I checked the interface status (ioctl(SIOCGIFFLAGS)) and got a perfect IFF_UP|IFF_RUNNING.
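The two checks listed above can be sketched like this, assuming Linux with glibc headers; iface_up_and_running(), enable_recverr() and pop_queued_error() are hypothetical helper names. Reading the flags takes SIOCGIFFLAGS (SIOCSIFFLAGS sets them), and the error-queue helper pops one report queued for IPV6_RECVERR, if there is one.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <linux/errqueue.h>

/* Return 1 if ifname has both IFF_UP and IFF_RUNNING set, 0 if not,
 * -1 on error (e.g. no such interface). */
static int iface_up_and_running(const char *ifname)
{
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int rc;

    if (fd < 0)
        return -1;
    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    rc = ioctl(fd, SIOCGIFFLAGS, &ifr); /* GET the flags */
    close(fd);
    if (rc < 0)
        return -1;
    return (ifr.ifr_flags & (IFF_UP | IFF_RUNNING)) == (IFF_UP | IFF_RUNNING);
}

/* Ask for extended error reports on an IPv6 socket. */
static int enable_recverr(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IPV6, IPV6_RECVERR, &on, sizeof on);
}

/* Non-blocking read of one queued error report. Returns its errno value,
 * 0 if the error queue is empty, or -1 on failure. */
static int pop_queued_error(int fd)
{
    char data[256], cbuf[512];
    struct iovec iov = { data, sizeof data };
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof cbuf;
    if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == IPPROTO_IPV6 && cmsg->cmsg_type == IPV6_RECVERR) {
            const struct sock_extended_err *ee = (const void *)CMSG_DATA(cmsg);
            return (int)ee->ee_errno;
        }
    }
    return -1;
}
```

As reported above, both checks come back clean during the black hole: the flags read IFF_UP|IFF_RUNNING and no error is ever reported on the socket.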

What I've found in the kernel:
------------------------------

- The packet is discarded in the packet scheduler, in net/sched/sch_generic.c::noop_enqueue(), which returns NET_XMIT_CN.

- The error doesn't reach the application because net/ipv6/ip6_output.c::ip6_push_pending_frames() filters out this return code (using net_xmit_errno(err)).

- noop_enqueue() is used to enqueue the packet because link_watch_run_queue() has deactivated the device by calling dev_deactivate(). The device is reactivated about 500 ms later.

- According to net/sched/sch_api.c, NET_XMIT_CN means "probably this packet enqueued, but another one dropped". But it seems to me that this packet IS actually dropped in noop_enqueue() (kfree_skb()). Shouldn't the routine return NET_XMIT_DROP instead? The application would then get an error code while the device is deactivated and could retry later.
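The return-code question can be illustrated with a small user-space model (this is not kernel code). The NET_XMIT_* values and net_xmit_errno() mirror the 2.6-era definitions in include/linux/netdevice.h as I understand them; model_noop_enqueue() and model_push_pending_frames() are hypothetical names, and the filtering step is simplified to the path described above, where IPV6_RECVERR is set.

```c
#include <errno.h>

/* Assumed to match include/linux/netdevice.h of this kernel era. */
#define NET_XMIT_SUCCESS 0
#define NET_XMIT_DROP    1 /* skb dropped */
#define NET_XMIT_CN      2 /* congestion notification */

/* Same shape as the kernel macro: every code except NET_XMIT_CN
 * becomes -ENOBUFS; NET_XMIT_CN becomes 0, i.e. success. */
#define net_xmit_errno(e) ((e) != NET_XMIT_CN ? -ENOBUFS : 0)

/* Model of noop_enqueue(): the skb is freed in both cases (kfree_skb()
 * in the kernel); only the return code differs between the current
 * behaviour (proposed == 0) and the proposed one (proposed != 0). */
static int model_noop_enqueue(int proposed)
{
    return proposed ? NET_XMIT_DROP : NET_XMIT_CN;
}

/* Simplified model of the filtering at the end of
 * ip6_push_pending_frames(): positive NET_XMIT_* codes go through
 * net_xmit_errno(); the result is what sendmsg() would report. */
static int model_push_pending_frames(int xmit_rc)
{
    int err = xmit_rc;

    if (err > 0)
        err = net_xmit_errno(err);
    return err;
}
```

With the current return code the drop is reported as success (0); with NET_XMIT_DROP the same drop would surface to the sender as -ENOBUFS.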


My questions:
-------------

- Am I doing something obviously wrong? Is there another event I should expect before sending my packet? An event that signals more reliably that the link is up and running and associated with the new AP?

- Shouldn't we change the return code in noop_enqueue()?


Thanks a lot for your help,
Benjamin

--
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D

   http://www.bull.com