Hi *,
On Mon, Jul 10, 2000 at 06:59:46PM -0700, Guy Harris wrote:
[Linux kernel hiding the original packet size]
> I put in an "fprintf()" to log the value of "packet_len" and "caplen",
> and ran "tcpdump" without any "-s" flag, and with a "-w" flag so that
> the "fprintf()" messages wouldn't get mixed with packet printout
> messages, and it said:
>
> tcpdump: listening on eth0
> packet_len 60, caplen 60
> packet_len 82, caplen 68
> packet_len 86, caplen 68
[...]
> Hugh was running what I presume was a beta version of RH 6.2 (it had a
> 6.1.something version number), with a 2.2.15 kernel; what kernel are you
> using? Perhaps something broke between 2.2.14 and 2.2.15 (I'll check
> those two versions when I go home).
I think it's time to lift the lid on this problem :)
I read a big chunk of kernel code and tried to understand the data flow of
a packet through the kernel. Correct me if I am wrong...
Path of a packet through the kernel
-----------------------------------
1. Whenever a packet is sent or received on a network interface, the
   skbuff structure is cloned (in net/core/dev.c) and presented to
   all protocol hooks.
2. Since the packet socket code installs such a hook, the packet is
   delivered to packet_rcv (net/packet/af_packet.c), which uses
   sock_queue_rcv_skb to insert it into the receive queue of the
   packet socket.
3. sock_queue_rcv_skb checks whether there is enough space in the receive
   buffer of the socket, feeds the packet through any kernel filter
   attached to that socket, makes the socket the owner of the packet
   and adds it to the end of the receive queue.
4. At some point recvmsg is called on the socket and takes the packet
   from the receive queue.
What's going wrong here?
------------------------
The problem that hit libpcap is that we get exactly as many bytes as our
buffer can hold, yet MSG_TRUNC is never set on return, so tcpdump
generates a number of error messages (truncated ip). I checked the
packet_recvmsg function, the libc6 source and libpcap but was unable to
find the problem.
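For comparison, here is a minimal sketch of how truncation is normally
reported through MSG_TRUNC. It is not the packet socket path from the
message: it assumes an AF_UNIX datagram socketpair as a stand-in, and
truncation_reported is a made-up helper name.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send a `sent`-byte datagram over a socketpair and
 * read it into a `buflen`-byte buffer.
 * Returns 1 if recvmsg reported MSG_TRUNC, 0 if not, -1 on error. */
int truncation_reported(size_t sent, size_t buflen)
{
    char out[256] = {0};
    char in[256];
    int sv[2];
    int result = -1;
    struct iovec iov;
    struct msghdr msg;

    if (sent > sizeof out || buflen > sizeof in)
        return -1;
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    if (send(sv[0], out, sent, 0) == (ssize_t)sent) {
        iov.iov_base = in;
        iov.iov_len = buflen;
        memset(&msg, 0, sizeof msg);
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        /* msg_flags is filled in by the kernel on return. */
        if (recvmsg(sv[1], &msg, 0) >= 0)
            result = (msg.msg_flags & MSG_TRUNC) ? 1 : 0;
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Calling truncation_reported(82, 68) exercises the case where the datagram
is larger than the buffer; this is the flag libpcap was hoping to see on
the packet socket.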
Only after following the packet through the kernel did I notice that
the packet is trimmed when it is enqueued. How can the kernel know the
size of our receive buffer? Simple: it comes from the kernel
filter. The BPF code generated by tcpdump returns the snapshot size
if the packet should be accepted. Even tcpdump without a filter
expression generates the following BPF code:
# tcpdump -d
(000) ret #68
Nice. So the kernel code trims the packet and recvfrom has no idea what
the original packet size was. This is also why it sometimes worked: if
you have no kernel filter configured, the problem just disappears.
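The trimming rule can be sketched as a small model. This is not kernel
code: len_after_filter is a made-up name, and the function only restates
the behaviour described above (a zero return drops the packet, anything
else caps its length).

```c
/* Hypothetical model of the effect of a socket filter's return value
 * on the length of the packet that ends up in the receive queue. */
unsigned int len_after_filter(unsigned int packet_len, unsigned int filter_ret)
{
    if (filter_ret == 0)
        return 0;              /* filter rejected the packet */
    if (filter_ret < packet_len)
        return filter_ret;     /* packet is trimmed to the returned length */
    return packet_len;         /* packet already fits, left untouched */
}
```

With "ret #68" and the packet sizes from the quoted log as inputs, a
60-byte packet stays at 60 bytes while the 82- and 86-byte packets both
come out at 68, and nothing in the result records what the original
length was.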
How to fix
----------
I think the kernel is at fault here because there is no way to get
the original packet size after the packet has gone through the filter.
But it is not easy to fix at the kernel level. If we set MSG_TRUNC
whenever the packet was truncated by the filter and always return
skb->realsize (the original packet size), there is a case we can't
handle. Imagine the original packet had 1400 bytes, the filter
cuts that down to 300 bytes and we have a buffer of 500 bytes.
What's the right return value for recvfrom in that case?
We want to know the original packet size (1400), so recvfrom should
return that and set MSG_TRUNC. But then userspace does not know that
only 300 bytes of the buffer are valid. Of course we could copy 500
bytes anyway since the packet is still there, but is that correct
behaviour?
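To make the 1400/300/500 case concrete, here is a rough model of the two
behaviours. All names (recv_result, recv_current, recv_proposed) are made
up for illustration, the filter return value is assumed to be non-zero,
and nothing here is actual kernel code.

```c
#include <stddef.h>

/* Illustrative only: what a recvfrom caller would observe. */
struct recv_result {
    size_t copied;     /* bytes actually placed in the user buffer */
    size_t reported;   /* length returned to the caller */
    int    msg_trunc;  /* 1 if MSG_TRUNC would be set */
};

/* Behaviour as described in this message: the socket only ever sees
 * the packet after the filter has trimmed it. */
struct recv_result recv_current(size_t orig_len, size_t filter_ret, size_t buflen)
{
    size_t queued = orig_len < filter_ret ? orig_len : filter_ret;
    struct recv_result r;

    r.copied = queued < buflen ? queued : buflen;
    r.reported = r.copied;
    r.msg_trunc = queued > buflen;
    return r;
}

/* The variant under discussion: always report the original size and
 * set MSG_TRUNC whenever fewer bytes than that were delivered. */
struct recv_result recv_proposed(size_t orig_len, size_t filter_ret, size_t buflen)
{
    struct recv_result r = recv_current(orig_len, filter_ret, buflen);

    r.reported = orig_len;
    r.msg_trunc = r.copied < orig_len;
    return r;
}
```

For recv_proposed(1400, 300, 500) the caller is told 1400 bytes with
MSG_TRUNC set, but copied is only 300, and that number is not part of
what recvfrom hands back. That is the ambiguity.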
I plan to change the pcap-generated BPF code when installing the filter.
The easy solution is to just change any RET #xx codes to return a
negative value so that the kernel does not touch the packet. It's kind
of a hack though, and I do not like hacks ;)
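A rough sketch of what such a rewrite could look like, not a finished
patch. struct insn is a local stand-in with the classic BPF instruction
layout so that the example compiles without the pcap headers, and
keep_whole_packet is a made-up name. I am assuming that "ret #0" has to
stay as it is, since it means "reject the packet", and that -1 is stored
as the largest unsigned 32-bit value.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for a BPF instruction (code, jt, jf, k). */
struct insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

#define CLASS_MASK 0x07
#define CLASS_RET  0x06   /* BPF_RET instruction class */
#define RVAL_MASK  0x18
#define RVAL_K     0x00   /* BPF_K: return the constant in k */

/* Patch every "ret #k" with k != 0 so that it returns -1 instead,
 * which is larger than any packet and so causes no trimming.
 * Returns the number of instructions changed. */
size_t keep_whole_packet(struct insn *prog, size_t len)
{
    size_t patched = 0;

    for (size_t i = 0; i < len; i++) {
        int is_ret_k = (prog[i].code & CLASS_MASK) == CLASS_RET &&
                       (prog[i].code & RVAL_MASK) == RVAL_K;

        if (is_ret_k && prog[i].k != 0) {
            prog[i].k = UINT32_MAX;   /* -1 as an unsigned 32-bit value */
            patched++;
        }
    }
    return patched;
}
```

Applied to the one-instruction program printed by "tcpdump -d" above,
this turns "ret #68" into a return of -1. The capture code would
presumably then have to apply the snapshot length itself in user space.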
I would appreciate any comments.
Thanks
Torsten
--
Torsten Landschoff Bluehorn@IRC <[EMAIL PROTECTED]>
Debian Developer and Quality Assurance Committee Member