Re: [ovs-dev] [PATCH 2/3] ofproto-dpif: Improve dp_hash selection method for select groups

2018-04-11 Thread Jan Scheurich
Hi Ychen,

Thanks a lot for your tests of corner cases and suggested bug fixes. I will 
include fixes in the next version, possibly also unit test cases for those.

A bucket weight of zero should in my eyes imply no traffic to that bucket. I 
will check how to achieve that.

I will also look into your ofproto_group_unref question.

Regards, Jan


From: ychen [mailto:ychen103...@163.com]
Sent: Wednesday, 11 April, 2018 06:16
To: Jan Scheurich 
Cc: d...@openvswitch.org; Nitin Katiyar 
Subject: Re:[PATCH 2/3] ofproto-dpif: Improve dp_hash selection method for 
select groups

Hi, Jan:
When I test dp_hash with the new patch, vswitchd was killed by segment 
fault in some conditions.
1. add group with no buckets, then winner will be NULL
2. add buckets with weight with 0, then winner will also be NULL

I did little modify to the patch, will you help to check whether it is correct?

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 8f6070d..b3a9639 100755
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4773,6 +4773,8 @@ group_setup_dp_hash_table(struct group_dpif *group, 
size_t max_hash)
 webster[i].value = bucket->weight;
 i++;
 }
+//consider bucket weight equal to 0
+if (!min_weight) min_weight = 1;

 uint32_t min_slots = ceil(total_weight / min_weight);
 n_hash = MAX(16, 1L << log_2_ceil(min_slots));
@@ -4794,11 +4796,12 @@ group_setup_dp_hash_table(struct group_dpif *group, 
size_t max_hash)
 for (int hash = 0; hash < n_hash; hash++) {
 VLOG_DBG("Hash value: %d", hash);
 double max_val = 0.0;
-struct webster *winner;
+struct webster *winner = NULL;
 for (i = 0; i < n_buckets; i++) {
 VLOG_DBG("Webster[%d]: divisor=%d value=%.2f",
  i, webster[i].divisor, webster[i].value);
-if (webster[i].value > max_val) {
+// use >= in condition there is only one bucket with weight 0
+if (webster[i].value >= max_val) {
 max_val = webster[i].value;
 winner = [i];
 }
@@ -4827,7 +4830,8 @@ group_set_selection_method(struct group_dpif *group)
 group->selection_method = SEL_METHOD_DEFAULT;
 } else if (!strcmp(selection_method, "dp_hash")) {
 /* Try to use dp_hash if possible at all. */
-if (group_setup_dp_hash_table(group, 64)) {
+uint32_t n_buckets = group->up.n_buckets;
+if (n_buckets && group_setup_dp_hash_table(group, 64)) {
 group->selection_method = SEL_METHOD_DP_HASH;
 group->hash_alg = props->selection_method_param >> 32;
 if (group->hash_alg >= __OVS_HASH_MAX) {


Another question, I found in function xlate_default_select_group and 
xlate_hash_fields_select_group,
when group_best_live_bucket is NULL, it will call ofproto_group_unref,
why dp_hash function no need to call it when there is no best bucket 
found?(exp: group with no buckets)



At 2018-03-21 02:16:17, "Jan Scheurich" 
> wrote:

>The current implementation of the "dp_hash" selection method suffers

>from two deficiences: 1. The hash mask and hence the number of dp_hash

>values is just large enough to cover the number of group buckets, but

>does not consider the case that buckets have different weights. 2. The

>xlate-time selection of best bucket from the masked dp_hash value often

>results in bucket load distributions that are quite different from the

>bucket weights because the number of available masked dp_hash values

>is too small (2-6 bits compared to 32 bits of a full hash in the default

>hash selection method).

>

>This commit provides a more accurate implementation of the dp_hash

>select group by applying the well known Webster method for distributing

>a small number of "seats" fairly over the weighted "parties"

>(see https://en.wikipedia.org/wiki/Webster/Sainte-Lagu%C3%AB_method).

>The dp_hash mask is autmatically chosen large enough to provide good

>enough accuracy even with widely differing weights.

>

>This distribution happens at group modification time and the resulting

>table is stored with the group-dpif struct. At xlation time, we use the

>masked dp_hash values as index to look up the assigned bucket.

>

>If the bucket should not be live, we do a circular search over the

>mapping table until we find the first live bucket. As the buckets in

>the table are by construction in pseudo-random order with a frequency

>according to their weight, this method maintains correct distribution

>even if one or more buckets are non-live.

>

>Xlation is further simplified by storing some derived select group state

>at group construction in struct group-dpif in a form better suited for

>xlation purposes.

>

>Signed-off-by: Jan Scheurich 
>>


Re: [ovs-dev] [PATCH 2/3] ofproto-dpif: Improve dp_hash selection method for select groups

2018-04-10 Thread ychen
Hi, Jan:
When I test dp_hash with the new patch, vswitchd was killed by segment 
fault in some conditions.
1. add group with no buckets, then winner will be NULL
2. add buckets with weight with 0, then winner will also be NULL


I did little modify to the patch, will you help to check whether it is correct?


diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 8f6070d..b3a9639 100755
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4773,6 +4773,8 @@ group_setup_dp_hash_table(struct group_dpif *group, 
size_t max_hash)
 webster[i].value = bucket->weight;
 i++;
 }
+//consider bucket weight equal to 0
+if (!min_weight) min_weight = 1;


 uint32_t min_slots = ceil(total_weight / min_weight);
 n_hash = MAX(16, 1L << log_2_ceil(min_slots));
@@ -4794,11 +4796,12 @@ group_setup_dp_hash_table(struct group_dpif *group, 
size_t max_hash)
 for (int hash = 0; hash < n_hash; hash++) {
 VLOG_DBG("Hash value: %d", hash);
 double max_val = 0.0;
-struct webster *winner;
+struct webster *winner = NULL;
 for (i = 0; i < n_buckets; i++) {
 VLOG_DBG("Webster[%d]: divisor=%d value=%.2f",
  i, webster[i].divisor, webster[i].value);
-if (webster[i].value > max_val) {
+// use >= in condition there is only one bucket with weight 0
+if (webster[i].value >= max_val) {
 max_val = webster[i].value;
 winner = [i];
 }
@@ -4827,7 +4830,8 @@ group_set_selection_method(struct group_dpif *group)
 group->selection_method = SEL_METHOD_DEFAULT;
 } else if (!strcmp(selection_method, "dp_hash")) {
 /* Try to use dp_hash if possible at all. */
-if (group_setup_dp_hash_table(group, 64)) {
+uint32_t n_buckets = group->up.n_buckets;
+if (n_buckets && group_setup_dp_hash_table(group, 64)) {
 group->selection_method = SEL_METHOD_DP_HASH;
 group->hash_alg = props->selection_method_param >> 32;
 if (group->hash_alg >= __OVS_HASH_MAX) {




Another question, I found in function xlate_default_select_group and 
xlate_hash_fields_select_group,
when group_best_live_bucket is NULL, it will call ofproto_group_unref,
why dp_hash function no need to call it when there is no best bucket 
found?(exp: group with no buckets)





At 2018-03-21 02:16:17, "Jan Scheurich"  wrote:
>The current implementation of the "dp_hash" selection method suffers
>from two deficiences: 1. The hash mask and hence the number of dp_hash
>values is just large enough to cover the number of group buckets, but
>does not consider the case that buckets have different weights. 2. The
>xlate-time selection of best bucket from the masked dp_hash value often
>results in bucket load distributions that are quite different from the
>bucket weights because the number of available masked dp_hash values
>is too small (2-6 bits compared to 32 bits of a full hash in the default
>hash selection method).
>
>This commit provides a more accurate implementation of the dp_hash
>select group by applying the well known Webster method for distributing
>a small number of "seats" fairly over the weighted "parties"
>(see https://en.wikipedia.org/wiki/Webster/Sainte-Lagu%C3%AB_method).
>The dp_hash mask is autmatically chosen large enough to provide good
>enough accuracy even with widely differing weights.
>
>This distribution happens at group modification time and the resulting
>table is stored with the group-dpif struct. At xlation time, we use the
>masked dp_hash values as index to look up the assigned bucket.
>
>If the bucket should not be live, we do a circular search over the
>mapping table until we find the first live bucket. As the buckets in
>the table are by construction in pseudo-random order with a frequency
>according to their weight, this method maintains correct distribution
>even if one or more buckets are non-live.
>
>Xlation is further simplified by storing some derived select group state
>at group construction in struct group-dpif in a form better suited for
>xlation purposes.
>
>Signed-off-by: Jan Scheurich 
>Signed-off-by: Nitin Katiyar 
>Co-authored-by: Nitin Katiyar 
>Signed-off-by: Jan Scheurich 
>---
> include/openvswitch/ofp-group.h |   1 +
> ofproto/ofproto-dpif-xlate.c|  70 
> ofproto/ofproto-dpif.c  | 142 
> ofproto/ofproto-dpif.h  |  13 
> 4 files changed, 200 insertions(+), 26 deletions(-)
>
>diff --git a/include/openvswitch/ofp-group.h b/include/openvswitch/ofp-group.h
>index 8d893a5..af4033d 100644
>--- a/include/openvswitch/ofp-group.h
>+++ b/include/openvswitch/ofp-group.h
>@@ -47,6 +47,7 @@ struct bucket_counter {
> /* Bucket for use in groups. */
> 

[ovs-dev] [PATCH 2/3] ofproto-dpif: Improve dp_hash selection method for select groups

2018-03-20 Thread Jan Scheurich
The current implementation of the "dp_hash" selection method suffers
from two deficiences: 1. The hash mask and hence the number of dp_hash
values is just large enough to cover the number of group buckets, but
does not consider the case that buckets have different weights. 2. The
xlate-time selection of best bucket from the masked dp_hash value often
results in bucket load distributions that are quite different from the
bucket weights because the number of available masked dp_hash values
is too small (2-6 bits compared to 32 bits of a full hash in the default
hash selection method).

This commit provides a more accurate implementation of the dp_hash
select group by applying the well known Webster method for distributing
a small number of "seats" fairly over the weighted "parties"
(see https://en.wikipedia.org/wiki/Webster/Sainte-Lagu%C3%AB_method).
The dp_hash mask is autmatically chosen large enough to provide good
enough accuracy even with widely differing weights.

This distribution happens at group modification time and the resulting
table is stored with the group-dpif struct. At xlation time, we use the
masked dp_hash values as index to look up the assigned bucket.

If the bucket should not be live, we do a circular search over the
mapping table until we find the first live bucket. As the buckets in
the table are by construction in pseudo-random order with a frequency
according to their weight, this method maintains correct distribution
even if one or more buckets are non-live.

Xlation is further simplified by storing some derived select group state
at group construction in struct group-dpif in a form better suited for
xlation purposes.

Signed-off-by: Jan Scheurich 
Signed-off-by: Nitin Katiyar 
Co-authored-by: Nitin Katiyar 
Signed-off-by: Jan Scheurich 
---
 include/openvswitch/ofp-group.h |   1 +
 ofproto/ofproto-dpif-xlate.c|  70 
 ofproto/ofproto-dpif.c  | 142 
 ofproto/ofproto-dpif.h  |  13 
 4 files changed, 200 insertions(+), 26 deletions(-)

diff --git a/include/openvswitch/ofp-group.h b/include/openvswitch/ofp-group.h
index 8d893a5..af4033d 100644
--- a/include/openvswitch/ofp-group.h
+++ b/include/openvswitch/ofp-group.h
@@ -47,6 +47,7 @@ struct bucket_counter {
 /* Bucket for use in groups. */
 struct ofputil_bucket {
 struct ovs_list list_node;
+uint16_t aux;   /* Padding. Also used for temporary data. */
 uint16_t weight;/* Relative weight, for "select" groups. */
 ofp_port_t watch_port;  /* Port whose state affects whether this bucket
  * is live. Only required for fast failover
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index bc6429c..021a4d8 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -4232,35 +4232,51 @@ xlate_hash_fields_select_group(struct xlate_ctx *ctx, 
struct group_dpif *group,
 }
 }
 
+
+static struct ofputil_bucket *
+group_dp_hash_best_bucket(struct xlate_ctx *ctx,
+  const struct group_dpif *group,
+  uint32_t dp_hash)
+{
+struct ofputil_bucket *bucket, *best_bucket = NULL;
+uint32_t n_hash = group->hash_mask + 1;
+
+uint32_t hash = dp_hash &= group->hash_mask;
+ctx->wc->masks.dp_hash |= group->hash_mask;
+
+/* Starting from the original masked dp_hash value iterate over the
+ * hash mapping table to find the first live bucket. As the buckets
+ * are quasi-randomly spread over the hash values, this maintains
+ * a distribution according to bucket weights even when some buckets
+ * are non-live. */
+for (int i = 0; i < n_hash; i++) {
+bucket = group->hash_map[(hash + i) % n_hash];
+if (bucket_is_alive(ctx, bucket, 0)) {
+best_bucket = bucket;
+break;
+}
+}
+
+return best_bucket;
+}
+
 static void
 xlate_dp_hash_select_group(struct xlate_ctx *ctx, struct group_dpif *group,
bool is_last_action)
 {
-struct ofputil_bucket *bucket;
-
 /* dp_hash value 0 is special since it means that the dp_hash has not been
  * computed, as all computed dp_hash values are non-zero.  Therefore
  * compare to zero can be used to decide if the dp_hash value is valid
  * without masking the dp_hash field. */
 if (!ctx->xin->flow.dp_hash) {
-uint64_t param = group->up.props.selection_method_param;
-
-ctx_trigger_recirculate_with_hash(ctx, param >> 32, (uint32_t)param);
+ctx_trigger_recirculate_with_hash(ctx, group->hash_alg,
+  group->hash_basis);
 } else {
-uint32_t n_buckets = group->up.n_buckets;
-if (n_buckets) {
-/* Minimal mask to cover the number of buckets. */
-