Baruch Siach <[email protected]> wrote on 2011/09/07 09:46:54:
>
> Hi Joakim,
>
> On Wed, Sep 07, 2011 at 09:05:36AM +0200, Joakim Tjernlund wrote:
> > Baruch Siach <[email protected]> wrote on 2011/09/07 06:29:32:
> > > Hi Joakim,
> > > On Mon, Sep 05, 2011 at 10:09:14PM +0200, Joakim Tjernlund wrote:
> > > > > From: Baruch Siach <[email protected]>
> > > > >
> > > > > Use a single IP checksum routine for ping, traceroute and udhcp.
> > > > >
> > > > > Signed-off-by: Baruch Siach <[email protected]>
> > > > > ---
> > > > >
> > > > > Changes from v1:
> > > > > Fix inet_cksum() with odd length on big endian systems
> > > > > Remove declaration of removed udhcp_checksum()
> > > >
> > > > This looks like it is from quagga. I cleaned that up/optimized it
> > > > some years ago into this(you might want to fix the types):
> > >
> > > What is the advantage of this implementation over the one I suggested
> > > (which I
> > > took from the traceroute code)?
> >
> > It is cleaner.
>
> Which part of it looks cleaner to you? The two routine look almost the same to
> me.
This part:
count = nbytes >> 1; /* div by 2 */
for(ptr--; count; --count)
sum += *++ptr;
if (nbytes & 1) /* Odd */
sum += *(u_char *)(++ptr); /* one byte only */
vs.
while (nleft > 1) {
sum += *w++;
nleft -= 2;
}
/* mop up an odd byte, if necessary */
if (nleft == 1) {
/* Make sure that the left-over byte is added correctly both
* with little and big endian hosts */
uint16_t tmp = 0;
*(uint8_t*)&tmp = *(uint8_t*)w;
sum += tmp;
}
>
> > Optimizes better(at least on RISC like archs).
>
> I get the following 'readelf -s' results for different toolchains with '-Os':
>
> x86_64 on current Debian testing (gcc 4.6.1):
> 8: 0000000000000000 78 FUNC GLOBAL DEFAULT 1 in_cksum
> 9: 000000000000004e 66 FUNC GLOBAL DEFAULT 1 inet_cksum
>
> ARM with the CodeSourcery 2010q1 toolchain (gcc 4.4.1):
> 12: 00000000 100 FUNC GLOBAL DEFAULT 1 in_cksum
> 14: 00000064 104 FUNC GLOBAL DEFAULT 1 inet_cksum
>
> PowerPC with the CodeSourcery 2010.09 toolchain (gcc 4.5.1):
> 8: 00000000 96 FUNC GLOBAL DEFAULT 1 in_cksum
> 9: 00000060 124 FUNC GLOBAL DEFAULT 1 inet_cksum
>
> I don't see a definite winner here.
I do. Both ARM and PPC is smaller(PPC is quite much smaller).
You could try change it to
count = nbytes >> 1; /* div by 2 */
for(; count; --count)
sum += *ptr++;
if (nbytes & 1) /* Odd */
sum += *(u_char *)ptr; /* one byte only */
and see if that makes it any better for x86. Problem is that gcc isn't very good
at optimizing code. However gcc should be better at x86 than any other arch.
Jocke
>
> > I am not sure if this is allowed on modern gcc's:
> > *(uint8_t*)&tmp = *(uint8_t*)w;
>
> gcc with '-Wall -Wextra' doesn't complain.
OK, but the extra casting doesn't look pretty.
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox