Re: [ovs-dev] [PATCH v9 4/5] dpif-netdev: refactor generic implementation
Not a full review. See inline. Best regards, Ilya Maximets. On 08.05.2019 18:13, Harry van Haaren wrote: > This commit refactors the generic implementation. The > goal of this refactor is to simply the code to enable > "specialization" of the functions at compile time. > > Given compile-time optimizations, the compiler is able > to unroll loops, and create optimized code sequences due > to compile time knowledge of loop-trip counts. > > In order to enable these compiler optimizations, we must > refactor the code to pass the loop-trip counts to functions > as compile time constants. > > This patch allows the number of miniflow-bits set per "unit" > in the miniflow to be passed around as a function argument. > > Note that this patch does NOT yet take advantage of doing so, > this is only a refactor to enable it in the next patches. > > Signed-off-by: Harry van Haaren > > --- > > v9: > - Use count_1bits in favour of __builtin_popcount (Ilya) > - Use ALWAYS_INLINE instead of __attribute__ synatx (Ilya) > > v8: > - Rework block_cache and mf_masks to avoid variable-lenght array > due to compiler issues. Provisioning for worst case is not a > good solution due to magnitue of over-provisioning required. > - Rework netdev_flatten function removing unused parameter > --- > lib/dpif-netdev-lookup-generic.c | 239 --- > lib/dpif-netdev.c| 79 +- > lib/dpif-netdev.h| 20 ++- > 3 files changed, 283 insertions(+), 55 deletions(-) > > diff --git a/lib/dpif-netdev-lookup-generic.c > b/lib/dpif-netdev-lookup-generic.c > index 2e4003408..901d28ff0 100644 > --- a/lib/dpif-netdev-lookup-generic.c > +++ b/lib/dpif-netdev-lookup-generic.c > @@ -28,67 +28,204 @@ > #include "packets.h" > #include "pvector.h" > > -/* Returns a hash value for the bits of 'key' where there are 1-bits in > - * 'mask'. */ > -static inline uint32_t > -netdev_flow_key_hash_in_mask(const struct netdev_flow_key *key, > - const struct netdev_flow_key *mask) > +VLOG_DEFINE_THIS_MODULE(dpif_lookup_generic); > + > +/* netdev_flow_key_flatten_unit: > + * Given a packet, table and mf_masks, this function iterates over each bit > + * set in the subtable, and calculates the appropriate metadata to store in > the > + * blocks_scratch[]. > + * > + * The results of the blocks_scratch[] can be used for hashing, and later for > + * verification of if a rule matches the given packet. > + */ > +static inline void > +netdev_flow_key_flatten_unit(const uint64_t * restrict pkt_blocks, > + const uint64_t * restrict tbl_blocks, > + const uint64_t * restrict mf_masks, > + uint64_t * restrict blocks_scratch, > + const uint64_t pkt_mf_bits, > + const uint32_t count) > { > -const uint64_t *p = miniflow_get_values(&mask->mf); > -uint32_t hash = 0; > -uint64_t value; > +uint32_t i; > +for (i = 0; i < count; i++) { > +uint64_t mf_mask = mf_masks[i]; > +/* Calculate the block index for the packet metadata */ > +uint64_t idx_bits = mf_mask & pkt_mf_bits; > +const uint32_t pkt_idx = count_1bits(idx_bits); > > -NETDEV_FLOW_KEY_FOR_EACH_IN_FLOWMAP (value, key, mask->mf.map) { > -hash = hash_add64(hash, value & *p); > -p++; > +/* check if the packet has the subtable miniflow bit set. If yes, the > + * block at the above pkt_idx will be stored, otherwise it is masked > + * out to be zero. > + */ > +uint64_t pkt_has_mf_bit = (mf_mask + 1) & pkt_mf_bits; > +uint64_t no_bit = ((!pkt_has_mf_bit) > 0) - 1; > + > +/* mask packet block by table block, and mask to zero if packet > + * doesn't actually contain this block of metadata > + */ > +blocks_scratch[i] = pkt_blocks[pkt_idx] & tbl_blocks[i] & no_bit; > } > +} > + > +/* netdev_flow_key_flatten: > + * This function takes a packet, and subtable and writes an array of uint64_t > + * blocks. The blocks contain the metadata that the subtable matches on, in > + * the same order as the subtable, allowing linear iteration over the blocks. > + * > + * To calculate the blocks contents, the netdev_flow_key_flatten_unit > function > + * is called twice, once for each "unit" of the miniflow. This call can be > + * inlined by the compiler for performance. > + * > + * Note that the u0_count and u1_count variables can be compile-time > constants, > + * allowing the loop in the inlined flatten_unit() function to be > compile-time > + * unrolled, or possibly removed totally by unrolling by the loop iterations. > + * The compile time optimizations enabled by this design improves > performance. > + */ > +static inline void > +netdev_flow_key_flatten(const struct netdev_flow_key * restrict key, > +const struct netdev_flow_
[ovs-dev] [PATCH v9 4/5] dpif-netdev: refactor generic implementation
This commit refactors the generic implementation. The goal of this refactor is to simply the code to enable "specialization" of the functions at compile time. Given compile-time optimizations, the compiler is able to unroll loops, and create optimized code sequences due to compile time knowledge of loop-trip counts. In order to enable these compiler optimizations, we must refactor the code to pass the loop-trip counts to functions as compile time constants. This patch allows the number of miniflow-bits set per "unit" in the miniflow to be passed around as a function argument. Note that this patch does NOT yet take advantage of doing so, this is only a refactor to enable it in the next patches. Signed-off-by: Harry van Haaren --- v9: - Use count_1bits in favour of __builtin_popcount (Ilya) - Use ALWAYS_INLINE instead of __attribute__ synatx (Ilya) v8: - Rework block_cache and mf_masks to avoid variable-lenght array due to compiler issues. Provisioning for worst case is not a good solution due to magnitue of over-provisioning required. - Rework netdev_flatten function removing unused parameter --- lib/dpif-netdev-lookup-generic.c | 239 --- lib/dpif-netdev.c| 79 +- lib/dpif-netdev.h| 20 ++- 3 files changed, 283 insertions(+), 55 deletions(-) diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup-generic.c index 2e4003408..901d28ff0 100644 --- a/lib/dpif-netdev-lookup-generic.c +++ b/lib/dpif-netdev-lookup-generic.c @@ -28,67 +28,204 @@ #include "packets.h" #include "pvector.h" -/* Returns a hash value for the bits of 'key' where there are 1-bits in - * 'mask'. */ -static inline uint32_t -netdev_flow_key_hash_in_mask(const struct netdev_flow_key *key, - const struct netdev_flow_key *mask) +VLOG_DEFINE_THIS_MODULE(dpif_lookup_generic); + +/* netdev_flow_key_flatten_unit: + * Given a packet, table and mf_masks, this function iterates over each bit + * set in the subtable, and calculates the appropriate metadata to store in the + * blocks_scratch[]. + * + * The results of the blocks_scratch[] can be used for hashing, and later for + * verification of if a rule matches the given packet. + */ +static inline void +netdev_flow_key_flatten_unit(const uint64_t * restrict pkt_blocks, + const uint64_t * restrict tbl_blocks, + const uint64_t * restrict mf_masks, + uint64_t * restrict blocks_scratch, + const uint64_t pkt_mf_bits, + const uint32_t count) { -const uint64_t *p = miniflow_get_values(&mask->mf); -uint32_t hash = 0; -uint64_t value; +uint32_t i; +for (i = 0; i < count; i++) { +uint64_t mf_mask = mf_masks[i]; +/* Calculate the block index for the packet metadata */ +uint64_t idx_bits = mf_mask & pkt_mf_bits; +const uint32_t pkt_idx = count_1bits(idx_bits); -NETDEV_FLOW_KEY_FOR_EACH_IN_FLOWMAP (value, key, mask->mf.map) { -hash = hash_add64(hash, value & *p); -p++; +/* check if the packet has the subtable miniflow bit set. If yes, the + * block at the above pkt_idx will be stored, otherwise it is masked + * out to be zero. + */ +uint64_t pkt_has_mf_bit = (mf_mask + 1) & pkt_mf_bits; +uint64_t no_bit = ((!pkt_has_mf_bit) > 0) - 1; + +/* mask packet block by table block, and mask to zero if packet + * doesn't actually contain this block of metadata + */ +blocks_scratch[i] = pkt_blocks[pkt_idx] & tbl_blocks[i] & no_bit; } +} + +/* netdev_flow_key_flatten: + * This function takes a packet, and subtable and writes an array of uint64_t + * blocks. The blocks contain the metadata that the subtable matches on, in + * the same order as the subtable, allowing linear iteration over the blocks. + * + * To calculate the blocks contents, the netdev_flow_key_flatten_unit function + * is called twice, once for each "unit" of the miniflow. This call can be + * inlined by the compiler for performance. + * + * Note that the u0_count and u1_count variables can be compile-time constants, + * allowing the loop in the inlined flatten_unit() function to be compile-time + * unrolled, or possibly removed totally by unrolling by the loop iterations. + * The compile time optimizations enabled by this design improves performance. + */ +static inline void +netdev_flow_key_flatten(const struct netdev_flow_key * restrict key, +const struct netdev_flow_key * restrict mask, +const uint64_t * restrict mf_masks, +uint64_t * restrict blocks_scratch, +const uint32_t u0_count, +const uint32_t u1_count) +{ +/* load mask from subtable, mask with packet mf, popcount to get idx */ +const uint64_t *pkt_b