I'm going to sum up what I've discovered while investigating this
bug report.

Issue

ifconfig and iproute2 (sometimes) show different numbers for network
statistics (such as sent and received packets, bytes, etc.).

Cause

The statistics are stored as "unsigned long" in the kernel. Variables of
type "unsigned long" are either 32 or 64 bits wide depending on the
architecture. This means the problem only shows up with a 64-bit
kernel (no matter whether the userspace is 32 or 64 bits).
ifconfig gets its statistics from /proc/net/dev, where they are exposed
as _text_.
iproute2 gets its statistics from a binary "netlink" interface.
When exporting the statistics in binary form, the kernel and userspace
have to agree on how large an "unsigned long" is, and they may not:
you can run a 64-bit kernel (where the unsigned long would be
64 bits) with a 32-bit userland (where it would be 32 bits and would
overflow).
The kernel has a fixed-size unsigned 32-bit type called "u32" to avoid
this kind of problem.
For some reason (perhaps to avoid bloating 32-bit architectures?) the
u32 type was used in the netlink interface that iproute2 uses, instead
of a "u64", which would have enough room on both types of architecture
(but would be wasteful on 32-bit ones). As a result, the number that
iproute2 gets its hands on has always rolled over at 32 bits, even if
the kernel's unsigned long is 64 bits.
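
To illustrate the truncation described above, here is a minimal Python
sketch (not kernel code). It assumes a 64-bit kernel counter and mimics
the two export paths: printing the full value as text, and storing it in
a 32-bit binary field. The function names and the sample value are made
up for the example.

```python
import struct

U32_MASK = 0xFFFFFFFF  # largest value a 32-bit unsigned field can hold


def as_text(counter):
    """Mimic a text export such as /proc/net/dev: the full value is printed."""
    return str(counter)


def as_u32_binary(counter):
    """Mimic storing the counter in a 32-bit field: the high bits are dropped."""
    return struct.pack("=I", counter & U32_MASK)


# Hypothetical 64-bit byte counter that has passed 4 GiB by 1000 bytes.
kernel_counter = 2**32 + 1000

text_value = int(as_text(kernel_counter))
binary_value = struct.unpack("=I", as_u32_binary(kernel_counter))[0]

print(text_value)    # 4294968296
print(binary_value)  # 1000
```

Both readers look at the same counter, but the one reading the 32-bit
field sees a value that has already wrapped.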


Problem

There really isn't a problem. Any program using these statistics needs
to cope with rollovers, which will (eventually) happen for both 32-bit
and 64-bit statistics.
It is, however, unfortunate that ifconfig and iproute2 differ in their
statistics, which will be confusing for people not aware of the internals
behind this. The numbers from ifconfig and iproute2 can't be compared
(unless the ifconfig numbers are post-processed to roll over at 32 bits
as well).
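
As a rough sketch of what "coping with rollovers" and "post-processing"
could look like in a consumer of these numbers, assuming the counter
wraps at most once between two samples. The helper names and all values
below are invented for illustration.

```python
def counter_delta(previous, current, bits):
    """Difference between two samples of a counter that wraps at 2**bits.

    Only correct if the counter wrapped at most once between the samples.
    """
    return (current - previous) % (1 << bits)


def wrap32(value):
    """Reduce a 64-bit counter to what a 32-bit field would show."""
    return value % (1 << 32)


# A 32-bit counter that wrapped between two samples (made-up values).
print(counter_delta(4294967000, 200, 32))  # 496

# Comparing a /proc/net/dev-style value with a netlink-style value.
proc_value = 3 * 2**32 + 42   # hypothetical full 64-bit counter
netlink_value = 42            # the same counter as seen through a u32
print(wrap32(proc_value) == netlink_value)  # True
```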

Solution(s)

Even though this isn't really a bug, and ifconfig has supposedly been
deprecated on Linux for a long time, it would be preferable to handle it
anyway: ifconfig hasn't gone away and (unfortunately) most likely will
not go away in the near future, and handling it would spare people who
compare ifconfig and iproute2 output the confusion.
There are two ways of doing this, the easy way and the hard (proper?)
way.
The easy way is just to force the /proc/net/dev output to roll over at
32 bits as well, even though that isn't really necessary in itself. This
way both methods would roll over at 32 bits and the confusion wouldn't
occur. Unfortunately, changing even the smallest thing in /proc files has
been shown to break userland code in the past, and one could never
really be sure what effect this change would have on all the
applications that might use /proc/net/dev.
The hard way is to add a 64-bit (u64) variant of the statistics to the
kernel's netlink interface and add the ability in iproute2 to use it,
when the kernel provides it, instead of the old/current 32-bit interface.
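
On the userspace side, the "hard way" boils down to a
prefer-and-fall-back decision. The following Python sketch shows only
that decision. The dictionary stands in for an already-parsed link
message, and the keys "stats64" and "stats32" are placeholders for this
example, not real netlink attribute names.

```python
def pick_stats(attrs):
    """Prefer the 64-bit statistics when the kernel supplied them.

    `attrs` stands in for the parsed attributes of a link message; the
    keys "stats64" and "stats32" are placeholders for this example.
    Returns a (width_in_bits, stats) tuple.
    """
    if "stats64" in attrs:
        return 64, attrs["stats64"]
    return 32, attrs["stats32"]


# Made-up messages: an old kernel offers only the 32-bit variant,
# a new kernel offers both so that old tools keep working.
old_kernel = {"stats32": {"rx_bytes": 1000}}
new_kernel = {"stats32": {"rx_bytes": 1000},
              "stats64": {"rx_bytes": 2**32 + 1000}}

print(pick_stats(old_kernel))  # (32, {'rx_bytes': 1000})
print(pick_stats(new_kernel))  # (64, {'rx_bytes': 4294968296})
```

Keeping the 32-bit variant in the message is what would make such a
change backwards-compatible for existing tools.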

Both of these methods depend on modifications in the kernel!

Additionally, one could argue that the bug really is in the kernel and
that it's a feature request / wishlist item for iproute2 to support this
not-yet-existing kernel function. Or that this is just as much a bug in
ifconfig. Just because iproute2 and ifconfig report different numbers
doesn't mean it's the fault of iproute2. If you wanted to shift the blame
towards ifconfig, you could point out that it could even be "fixed"
in ifconfig without requiring any kernel modification (but then there are
probably many other programs that use the /proc/net/dev values and would
require the same "fix").
I've already submitted[1] a patch for "the simple solution" to the
(Linux) netdev mailing list, and no one cared to comment on or apply it,
so I guess they would prefer a nice backwards-compatible implementation
of "the hard solution", or more likely think that this is such a small
issue that it's not worth fixing. Possibly the /proc filesystem is going
to be cleaned up one day in the distant future to remove all the cruft
that is not process-related (and break all applications, like ifconfig,
that depend on these deprecated methods of gathering kernel information).

I suggest documenting the behaviour (unless this bug report counts as
good enough documentation) and lowering the severity to wishlist. Most
likely no one will care to fix this, since there's so little to gain.


[1]: See http://www.spinics.net/lists/netdev/msg35472.html or
http://marc.info/?l=linux-netdev&m=118415534518953


-- 
Regards,
Andreas Henriksson
