https://patchwork.ozlabs.org/patch/1052369/

On 3/6/19 8:59 AM, Mark Michelson wrote:
OK, I think I see what the problem is now.

The issue is with the order that ovn_ports are visited in join_logical_ports. If a logical switch port is connected to a logical router port, then if the logical switch port is visited first, everything works fine. If the logical router port is visited first, then it can result in duplicate addresses being assigned. As you pointed out, the hash function on a big endian system will be different, so ports are being visited in a different order on that system. However, this could just as easily occur on a little endian system as well, depending on the values being computed by the hash function.

I will submit a fix as soon as possible.
Thanks,
Mark

On 3/6/19 8:45 AM, Mark Michelson wrote:
Hi Ben,

Thanks for reaching out. Right now it's not obvious why big endian architecture would cause duplicate IPs to be assigned. My initial thought was that the wrong byte ordering was being used when adding IP addresses to IPAM, but it appears that host byte ordering is being used consistently when performing the calculations.

James, is that the only IPAM-related test failure you are seeing? If so, that can help narrow where I need to look to fix this. Thanks.

On 3/5/19 9:04 PM, Ben Pfaff wrote:
[adding some folks who've worked on IPAM lately]

On Tue, Mar 05, 2019 at 10:26:58AM +0000, James Page wrote:
Test 2768 is consistently failing on s390x in Ubuntu development.

Note that this is the only big-endian architecture we build for so I would
suspect something endian-y as the root cause.

Yes, it's related to big-endian, in particular related to the OVS hash
function, which produces different results depending on endianness.
I can reproduce it on amd64 by changing the hash function:

----------------------------------------------------------------------
diff --git a/lib/hash.h b/lib/hash.h
index eb3776500abe..c7fc6430c71d 100644
--- a/lib/hash.h
+++ b/lib/hash.h
@@ -1,5 +1,5 @@
  /*
- * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2014, 2016 Nicira, Inc.
+ * Copyright (c) 2008, 2009, 2010, 2012, 2013, 2014, 2016, 2019 Nicira, Inc.
   *
   * Licensed under the Apache License, Version 2.0 (the "License");
   * you may not use this file except in compliance with the License.
@@ -87,6 +87,7 @@ static inline uint32_t mhash_finish(uint32_t hash)
      hash ^= hash >> 13;
      hash *= 0xc2b2ae35;
      hash ^= hash >> 16;
+    hash = ~hash;
      return hash;
  }
@@ -184,7 +185,7 @@ static inline uint32_t hash_finish(uint64_t hash, uint64_t final)
      /* The finishing multiplier 0x805204f3 has been experimentally
       * derived to pass the testsuite hash tests. */
      hash = _mm_crc32_u64(hash, final) * 0x805204f3;
-    return hash ^ (uint32_t)hash >> 16; /* Increase entropy in LSBs. */
+    return ~(hash ^ (uint32_t)hash >> 16); /* Increase entropy in LSBs. */
  }
  /* Returns the hash of the 'n' 32-bit words at 'p_', starting from 'basis'.
----------------------------------------------------------------------

The actual failure might be due to a more serious bug though.  When I
apply the following patch:

----------------------------------------------------------------------
diff --git a/tests/ovn.at b/tests/ovn.at
index 2af225a678e1..a53204f544e9 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -12273,6 +12273,8 @@ for i in 2 3 4; do
      ovn-nbctl --wait=sb lsp-set-addresses swp$i "02:00:00:00:00:0$i dynamic"       cidr=$(ovn-nbctl get logical_switch_port swp$i dynamic_addresses |cut -f2 -d' '|cut -f1 -d\")       ovn-nbctl lrp-add ro$i rop$i 02:00:00:00:00:0$i $cidr/24 -- set logical_switch_port swp$i type=router options:router-port=rop$i addresses=router;
+    printf "\nadded swp$i\n"
+    ovn-nbctl list logical_switch_port
      AT_CHECK_UNQUOTED([ovn-nbctl get logical_router_port rop$i networks], [0], [[["192.168.1.$i/24"]]
  ])
  done
----------------------------------------------------------------------

the test then reports that ports swp2 and swp3 are both being assigned
dynamic address 192.168.1.2 (instead of 192.168.1.2 and 192.168.1.3,
respectively), as you can see from the "dynamic_addresses" lines
following "added swp3":

----------------------------------------------------------------------
2772. ovn.at:12264: testing ovn -- ipam router ports ...
creating ovn-sb database
creating ovn-nb database
starting ovn-northd
starting backup ovn-northd

added swp2
_uuid               : d78add30-9d8e-4d49-bea6-8f4bc947ab3b
addresses           : [router]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : "02:00:00:00:00:02 192.168.1.2"
enabled             : []
external_ids        : {}
name                : "swp2"
options             : {router-port="rop2"}
parent_name         : []
port_security       : []
tag                 : []
tag_request         : []
type                : router
up                  : true
../../tests/ovn.at:12278: ovn-nbctl get logical_router_port rop$i networks

added swp3
_uuid               : d78add30-9d8e-4d49-bea6-8f4bc947ab3b
addresses           : [router]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : "02:00:00:00:00:02 192.168.1.2"
enabled             : []
external_ids        : {}
name                : "swp2"
options             : {router-port="rop2"}
parent_name         : []
port_security       : []
tag                 : []
tag_request         : []
type                : router
up                  : true

_uuid               : 0542f623-43a8-407d-adbe-0927999c78b1
addresses           : [router]
dhcpv4_options      : []
dhcpv6_options      : []
dynamic_addresses   : "02:00:00:00:00:03 192.168.1.2"
enabled             : []
external_ids        : {}
name                : "swp3"
options             : {router-port="rop3"}
parent_name         : []
port_security       : []
tag                 : []
tag_request         : []
type                : router
up                  : true
../../tests/ovn.at:12278: ovn-nbctl get logical_router_port rop$i networks
--- -   2019-03-05 17:58:01.066306125 -0800
+++ /home/blp/nicira/ovs/_build/tests/testsuite.dir/at-groups/2772/stdout 2019-03-05 17:58:01.060040470 -0800
@@ -1,2 +1,2 @@
-["192.168.1.3/24"]
+["192.168.1.2/24"]
----------------------------------------------------------------------

Mark and Lorenzo, this seems like something you should look into?

Thanks,

Ben.




_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to