On 24.04.2013 23:09, Luigi Rizzo wrote:
On Wed, Apr 24, 2013 at 08:46:01PM +0400, Alexander V. Chernikov wrote:
On 24.04.2013 20:23, Luigi Rizzo wrote:
...
vesrion) in the middle of the next week.
hmmm.... this is quite a large change, and from the description it
is a bit unclear to me how the "opcode rewriting" thing relates to
the use of strings vs index for name matching.
sorry, I havent't describe this explicitly.
Index matching is done via storing interface index in in p.glob field of
ipfw_insn_if instruction.
understood. the reasons why i did not use the index is that
one could specify a non-existing interface name, and also interfaces
can be renamed. If you want to use indexses, you should add
(perhaps you do, i haven't checked)
Yes, this is done (without 'good' renaming handling), but still.
hooks to the interface add/rename/delete code in order to
update the ruleset upon changes on the if list, and it
seemed to me a bad idea to add this dependency
(lockingwise, too).

Really, with 16-byte fixed size interface names, the match
is as simple as this:

     #if CAN_DO_FAST_MATCH && IFNAMSIZ == 16 /* archs with no align 
requirements */
        {
                uint64_t *a = (uint64_t *)ifp->if_xname;
                uint64_t *b = (uint64_t *)cmd->name;
                if (a[0] == b[0] && a[1] == b[1])
                        return 1;
        }
     #else
        if (strncmp(ifp->if_xname, cmd->name, IFNAMSIZ) == 0)
                return 1
     #endif

(assuming the names are zero-padded, which should be the case already).
Since you have the measurement infrastructure in place, perhaps
you have an easy way to try this patch and see how effective
it is in terms of performance.
I'll try this tomorrow, thanks.

Additionally, i wonder if there isn't a better way to replace strncmp
with some two 64-bit comparisons (the name is 16 bytes) by making
sure that the fields are zero-padded and suitably aligned.
At this point, on the machines you care about (able to sustain
1+ Mpps) the two comparison should have the same cost as
the index comparison, without the need to track update in the names.
Well, actually I'm thinking of the next 2 steps:
1) making kernel rule header more compact (20 bytes instead of 48) and
making it invisible for userland.
This involves rule counters to be stored separately (and possibly as
pcpu-based ones).
2) since ruleset is now nearly readonly and more or less compact we can
try to store it in
contiguous address space to optimize cache line usage.
certainly a worthwhile goal (also using gleb's new counters)
but i suspect that compacting rules are a second order effect.
I a bit skeptical they make a big difference on the in-kernel
version of ipfw. You might see some difference in the
My current numbers are ~5mpps of IPv4 forwarding with ipfw turned on (1 rule) for vlans over ixgbe, with 60% cpu usage (2xE5646).
For lagg with 2x ixgbe it is ~7mpps with the same 60% usage.
(And, say, 70% of CPU usage on our production is ipfw, despite low number of rules).
userspace version, which runs on top of netmap.
We are preparing to move forward in this direction (and thinking of 20-30mpps as our goal). (And I hope some changes of kernel-based version can migrate to userland one :))

cheers
luigi


_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-ipfw
To unsubscribe, send any mail to "[email protected]"

Reply via email to