"Justin C. Darby" <[EMAIL PROTECTED]> writes:

> It shouldn't be too bad, actually, it's just something that came to mind.
> When you're processing ~500,000 packets per second, every cpu instruction
> counts. :)

Fair point! If you use the socketfilter patch, the new bpf program should let
through precisely the set of packets that this vblade instance should process.
(The previous program, only available in freebsd.c, wasn't quite so strict.) As
a result, you ought to be safe to comment out the whole check block:

                if (ntohs(p->type) != 0x88a2)
                        continue;
                if (p->flags & Resp)
                        continue;
                sh = ntohs(p->maj);
                if (sh != shelf && sh != (ushort)~0)
                        continue;
                if (p->min != slot && p->min != (uchar)~0)
                        continue;
                if (nmasks && !maskok(p->src))
                        continue;

to see if that makes any real performance difference. (However, if you do
this you definitely also want to make vblade fail rather than silently
continue if the SO_ATTACH_FILTER setsockopt returns non-zero in linux.c: if
the BPF fails to attach and you comment out the checks, you could end up
scribbling over your disks with the contents of other ethernet traffic!)

As a very crude experiment, I just compiled up a static vblade, ran it on a
loopback interface, and tried dding heavily from it with oprofile running. As
expected, almost all the work is done by the kernel, but looking at the symbol
report and annotated source, getlba seems to do a surprising amount of work and
could be an easy candidate for optimisation. Even just a simple-minded loop
unroll simultaneously takes getlba from 7.2% to 3.0% of total vblade CPU time
whilst reducing the code-length:

   long long
   getlba(uchar *p)
   {
  -       vlong v;
  -       int i;
  -
  -       v = 0;
  -       for (i = 0; i < 6; i++)
  -               v |= (vlong)(*p++) << i * 8;
  -       return v;
  +       return  (vlong) p[0]
  +            | ((vlong) p[1] << 8)
  +            | ((vlong) p[2] << 16)
  +            | ((vlong) p[3] << 24)
  +            | ((vlong) p[4] << 32)
  +            | ((vlong) p[5] << 40);
   }
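
One thing to watch with an unroll like this: in C, binary + binds more tightly
than <<, so each shift needs its own parentheses or the bytes end up in the
wrong places. Below is a small standalone sketch that compares the loop with a
fully parenthesised unrolled version. The names getlba_loop, getlba_unrolled
and getlba_selfcheck are made up for the comparison, and the typedefs are
assumed to match vblade's uchar and vlong.

```c
#include <assert.h>

typedef unsigned char uchar;	/* assumed to match vblade's typedef */
typedef long long vlong;	/* assumed to match vblade's typedef */

/* The original loop: assemble a 48-bit little-endian LBA from 6 bytes. */
vlong
getlba_loop(const uchar *p)
{
	vlong v;
	int i;

	v = 0;
	for (i = 0; i < 6; i++)
		v |= (vlong)(*p++) << i * 8;
	return v;
}

/* Unrolled version; every shift is parenthesised explicitly. */
vlong
getlba_unrolled(const uchar *p)
{
	return  (vlong)p[0]
	     | ((vlong)p[1] << 8)
	     | ((vlong)p[2] << 16)
	     | ((vlong)p[3] << 24)
	     | ((vlong)p[4] << 32)
	     | ((vlong)p[5] << 40);
}

/* Hypothetical self-check: both versions must agree on a sample LBA. */
void
getlba_selfcheck(void)
{
	static const uchar lba[6] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06 };

	assert(getlba_loop(lba) == getlba_unrolled(lba));
}
```

Using | rather than + makes no difference to the result here, since the
shifted bytes never overlap, but it mirrors the original loop more closely.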

Apart from getlba, the other high-cpu functions are less surprising:

  samples  %        symbol name
  8717     20.0170  atacmd
  8261     18.9699  memmove
  6222     14.2877  doaoe
  5642     12.9558  aoeata
  4195      9.6330  do_pread64
  3974      9.1256  aoe

However, there's probably scope for doing much more detailed profiling work
on a heavily loaded system.

Cheers,

Chris.

_______________________________________________
Aoetools-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/aoetools-discuss