[ovs-discuss] the raft_is_connected state of a raft server stays as false and cannot recover

2020-08-13 Thread Yun Zhou
Hi,

We need an expert's view on a problem we see now and then: an
ovsdb-server node in a 3-node raft cluster keeps printing the
"raft_is_connected: false" message, and the "connected" state in its _Server DB
stays false.

According to the ovsdb-server(5) manpage, this means the server is not in
contact with a majority of its cluster.

Apart from its "connected" state, as far as we can see this server is in the
follower state and works fine, and the connections between it and the other two
servers appear healthy as well.

Below is a snapshot of its raft structure at the time of the problem. Note that
its candidate_retrying field stays true.

Hopefully the information provided helps to figure out what goes wrong here.
Unfortunately we don't have a solid way to reproduce it:

(gdb) print *(struct raft *)0xa872c0
$19 = {
  hmap_node = {
    hash = 2911123117,
    next = 0x0
  },
  log = 0xa83690,
  cid = {
    parts = {2699238234, 2258650653, 3035282424, 813064186}
  },
  sid = {
    parts = {1071328836, 400573240, 2626104521, 1746414343}
  },
  local_address = 0xa874e0 "tcp:10.8.51.55:6643",
  local_nickname = 0xa876d0 "3fdb",
  name = 0xa876b0 "OVN_Northbound",
  servers = {
    buckets = 0xad4bc0,
    one = 0x0,
    mask = 3,
    n = 3
  },
  election_timer = 1000,
  election_timer_new = 0,
  term = 3,
  vote = {
    parts = {1071328836, 400573240, 2626104521, 1746414343}
  },
  synced_term = 3,
  synced_vote = {
    parts = {1071328836, 400573240, 2626104521, 1746414343}
  },
  entries = 0xbf0fe0,
  log_start = 2,
  log_end = 312,
  log_synced = 311,
  allocated_log = 512,
  snap = {
    term = 1,
    data = 0xaafb10,
    eid = {
      parts = {1838862864, 1569866528, 2969429118, 3021055395}
    },
    servers = 0xaafa70,
    election_timer = 1000
  },
  role = RAFT_FOLLOWER,
  commit_index = 311,
  last_applied = 311,
  leader_sid = {
    parts = {642765114, 43797788, 2533161504, 3088745929}
  },
  election_base = 6043283367,
  election_timeout = 6043284593,
  joining = false,
  remote_addresses = {
    map = {
      buckets = 0xa87410,
      one = 0xa879c0,
      mask = 0,
      n = 1
    }
  },
  join_timeout = 6037634820,
  leaving = false,
  left = false,
  leave_timeout = 0,
  failed = false,
  waiters = {
    prev = 0xa87448,
    next = 0xa87448
  },
  listener = 0xaafad0,
  listen_backoff = -9223372036854775808,
  conns = {
    prev = 0xbcd660,
    next = 0xaafc20
  },
  add_servers = {
    buckets = 0xa87480,
    one = 0x0,
    mask = 0,
    n = 0
  },
  remove_server = 0x0,
  commands = {
    buckets = 0xa874a8,
    one = 0x0,
    mask = 0,
    n = 0
  },
  ping_timeout = 6043283700,
  n_votes = 1,
  candidate_retrying = true,
  had_leader = false,
  ever_had_leader = true
}
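
One way to read the snapshot above is sketched below. It assumes the
"connected" state is a plain conjunction of the boolean flags visible in the
dump; that is an assumption about how ovsdb-server derives it, not something
confirmed in this thread, and the class and function names are hypothetical.
Under that assumption, candidate_retrying would be the only flag holding the
state at false:

```python
from dataclasses import dataclass


@dataclass
class RaftFlags:
    """Subset of the boolean fields copied from the gdb dump."""
    candidate_retrying: bool
    joining: bool
    leaving: bool
    left: bool
    failed: bool
    ever_had_leader: bool


# ASSUMPTION: the value each flag must have for the server to count as
# connected. This mirrors my reading of the problem, not a confirmed API.
REQUIRED = {
    "candidate_retrying": False,
    "joining": False,
    "leaving": False,
    "left": False,
    "failed": False,
    "ever_had_leader": True,
}


def blocking_flags(flags):
    """Return the names of the flags that keep the state at false."""
    return [name for name, want in REQUIRED.items()
            if getattr(flags, name) != want]


def is_connected(flags):
    """Connected only when no flag is blocking (assumed predicate)."""
    return not blocking_flags(flags)


# Values taken from the snapshot above.
snapshot = RaftFlags(candidate_retrying=True, joining=False, leaving=False,
                     left=False, failed=False, ever_had_leader=True)

print(is_connected(snapshot))
print(blocking_flags(snapshot))
```

If the assumption holds, the question narrows to why candidate_retrying is
never cleared once the server has settled back into the follower role.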

Thanks
- Yun
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] not-equal in ACL

2020-08-13 Thread Tony Liu
> -----Original Message-----
> From: discuss On Behalf Of Tony Liu
> Sent: Monday, August 10, 2020 10:41 AM
> To: Numan Siddique 
> Cc: ovs-discuss@openvswitch.org
> Subject: [ovs-discuss] [OVN] not-equal in ACL
> 
> Hi Numan,
> 
> Create a new thread here to follow up ACL questions.
> 
> > > > I think this is a big problem here. We should not use "!=" in
> > > > logical flows, although OVN allows it.
> > >
> > > Is this a generic recommendation or for certain cases?
> > > Is it OK to add an ACL with "!=", like below?
> > >
> > > ovn-nbctl acl-add 12b1681c-b3e7-4ec9-b324-e780d9dfdc0d from-lport 1005 \
> > >     'ip4.dst == 192.168.0.0/16 && inport != "d93619c3-dab9-4f6d-8261-4211f6937fd1"' drop
> >
> >
> > This is a generic recommendation. The above ACL would also result in
> > many OF flows.
> >
> > To handle cases like the above, you can add a couple of ACLs like the ones
> > below: a high-priority ACL to allow the desired inport and a low-priority
> > ACL to drop all the traffic.
> >
> > ovn-nbctl acl-add 12b1681c-b3e7-4ec9-b324-e780d9dfdc0d from-lport 1006 \
> >     'ip4.dst == 192.168.0.0/16 && inport == "d93619c3-dab9-4f6d-8261-4211f6937fd1"' allow
> > ovn-nbctl acl-add 12b1681c-b3e7-4ec9-b324-e780d9dfdc0d from-lport 1005 \
> >     'ip4.dst == 192.168.0.0/16' drop
> 
> In my case, two LSes connect to one LR that has external access.
> There are 3 ports on each LS:
> * vm_port
> * gw_port (connects to the LR)
> * svc_port (localport for DHCP and metadata)
> 
> What I want is to disable the connection between the two LSes while
> allowing external access for them.
> 
> Option #1, create one ACL for each VM on each LS.
> 
> acl-add $ls from-lport 1005 'ip4.dst == 192.168.0.0/16 && inport == "$vm_port"' drop
> 
> This works fine for me, but the ACL has to be per VM.
> 
> Option #2, create one ACL to exclude gw_port and svc_port.
> 
> acl-add $ls from-lport 1005 'ip4.dst == 192.168.0.0/16 && inport != "$gw_port" && inport != "svc_port"' drop
> 
> As you mentioned, this is not recommended, because it will result in many
> OF flows. I actually tried it, but I don't see any OF flows created for
> that ACL. Is there any rule in ovn-controller that prevents translating
> such an ACL into OF flows?
> 
> Option #3, as you suggested, I tried 2 ACLs.
> 
> acl-add $ls from-lport 1006 'ip4.dst == 192.168.0.0/16 && (inport == "$gw_port" || inport == "svc_port")' allow
> acl-add $ls from-lport 1005 'ip4.dst == 192.168.0.0/16' drop
> 
> On compute node, I see the "drop" OF flow only, not the "allow" flow.
> Am I missing anything here?
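
The two-ACL approach in Option #3 relies on priority ordering. Below is a toy
model of that idea, not OVN code. It assumes that the highest-priority
matching ACL decides the verdict and that traffic matching no ACL is allowed;
both are assumptions of this sketch, and the port names are placeholders.

```python
from ipaddress import ip_address, ip_network

INTERNAL = ip_network("192.168.0.0/16")
# Placeholder names standing in for the real gw and svc port names.
EXEMPT_PORTS = {"gw_port", "svc_port"}

# Each ACL is (priority, predicate, action), mirroring Option #3.
ACLS = [
    (1006,
     lambda p: ip_address(p["dst"]) in INTERNAL and p["inport"] in EXEMPT_PORTS,
     "allow"),
    (1005,
     lambda p: ip_address(p["dst"]) in INTERNAL,
     "drop"),
]


def evaluate(acls, packet, default="allow"):
    """Return the action of the highest-priority ACL that matches.

    ASSUMPTION: unmatched traffic falls through to `default`.
    """
    for _priority, predicate, action in sorted(
            acls, key=lambda acl: acl[0], reverse=True):
        if predicate(packet):
            return action
    return default


# VM traffic towards the other LS is dropped by the 1005 ACL.
print(evaluate(ACLS, {"inport": "vm_port", "dst": "192.168.2.10"}))
# Traffic entering from the gateway port hits the 1006 allow first.
print(evaluate(ACLS, {"inport": "gw_port", "dst": "192.168.2.10"}))
# External destinations match neither ACL.
print(evaluate(ACLS, {"inport": "vm_port", "dst": "8.8.8.8"}))
```

The point of the design is that one low-priority drop covers every VM port,
so no per-VM ACL and no "!=" match is needed.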

Hi Numan,

This works! The '$' was missing from "svc_port"!

Thanks for the advice!

Tony
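
A missing '$' like this is easy to overlook, because the match string is still
syntactically valid and simply names a port that does not exist. A minimal
sketch of a pre-flight check, assuming the match is built from a template
before being handed to ovn-nbctl; the helper name and the port names are
hypothetical:

```python
import re
from string import Template

# Hypothetical port names; real ones come from the northbound database.
PORTS = {"gw_port": "ls1-gw", "svc_port": "ls1-svc"}


def unknown_inports(match, known_ports):
    """Return quoted inport names in `match` that are not known ports."""
    names = re.findall(r'inport\s*[!=]=\s*"([^"]*)"', match)
    return [name for name in names if name not in known_ports]


# The match as first written: "svc_port" lacks the '$', so it stays literal.
broken = Template(
    'ip4.dst == 192.168.0.0/16 && '
    '(inport == "$gw_port" || inport == "svc_port")'
).substitute(PORTS)

# The corrected match: both placeholders are substituted.
fixed = Template(
    'ip4.dst == 192.168.0.0/16 && '
    '(inport == "$gw_port" || inport == "$svc_port")'
).substitute(PORTS)

known = set(PORTS.values())
print(unknown_inports(broken, known))
print(unknown_inports(fixed, known))
```

The first call reports the literal "svc_port" as unknown, while the second
returns an empty list.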
> 
> 
> Thanks!
> 
> Tony
> 


Re: [ovs-discuss] [OVN] not-equal in ACL

2020-08-13 Thread Numan Siddique
On Thu, Aug 13, 2020 at 4:08 AM Tony Liu  wrote:
>
> > -Original Message-
> > From: Numan Siddique 
> > Sent: Wednesday, August 12, 2020 10:25 AM
> > To: Tony Liu 
> > Cc: ovs-discuss@openvswitch.org
> > Subject: Re: [ovs-discuss] [OVN] not-equal in ACL
> >
> > On Wed, Aug 12, 2020 at 10:41 PM Tony Liu 
> > wrote:
> > >
> > > > -Original Message-
> > > > From: Numan Siddique 
> > > > Sent: Wednesday, August 12, 2020 2:17 AM
> > > > To: Tony Liu 
> > > > Cc: ovs-discuss@openvswitch.org
> > > > Subject: Re: [ovs-discuss] [OVN] not-equal in ACL
> > > >
> > > > On Mon, Aug 10, 2020 at 11:11 PM Tony Liu 
> > > > wrote:
> > > > >
> > > > > Hi Numan,
> > > > >
> > > > > Create a new thread here to follow up ACL questions.
> > > > >
> > > > > > > > > I think this is a big problem here. We should not use "!="
> > > > > > > > > in logical flows, although OVN allows it.
> > > > > > >
> > > > > > > Is this a generic recommendation or for certain cases?
> > > > > > > Is it OK to add an ACL with "!=", like below?
> > > > > > >
> > > > > > > > ovn-nbctl acl-add 12b1681c-b3e7-4ec9-b324-e780d9dfdc0d from-lport 1005 \
> > > > > > > >     'ip4.dst == 192.168.0.0/16 && inport != "d93619c3-dab9-4f6d-8261-4211f6937fd1"' drop
> > > > > >
> > > > > >
> > > > > > This is a generic recommendation. The above ACL would also
> > > > > > result in many OF flows.
> > > > > >
> > > > > > > To handle cases like the above, you can add a couple of ACLs
> > > > > > > like the ones below: a high-priority ACL to allow the desired
> > > > > > > inport and a low-priority ACL to drop all the traffic.
> > > > > > >
> > > > > > > ovn-nbctl acl-add 12b1681c-b3e7-4ec9-b324-e780d9dfdc0d from-lport 1006 \
> > > > > > >     'ip4.dst == 192.168.0.0/16 && inport == "d93619c3-dab9-4f6d-8261-4211f6937fd1"' allow
> > > > > > > ovn-nbctl acl-add 12b1681c-b3e7-4ec9-b324-e780d9dfdc0d from-lport 1005 \
> > > > > > >     'ip4.dst == 192.168.0.0/16' drop
> > > > >
> > > > > > In my case, two LSes connect to one LR that has external access.
> > > > > > There are 3 ports on each LS:
> > > > > > * vm_port
> > > > > > * gw_port (connects to the LR)
> > > > > > * svc_port (localport for DHCP and metadata)
> > > > > >
> > > > > > What I want is to disable the connection between the two LSes
> > > > > > while allowing external access for them.
> > > > >
> > > > > Option #1, create one ACL for each VM on each LS.
> > > > > 
> > > > > > acl-add $ls from-lport 1005 'ip4.dst == 192.168.0.0/16 && inport == "$vm_port"' drop
> > > > > 
> > > > > This works fine for me, but the ACL has to be per VM.
> > > > >
> > > > > Option #2, create one ACL to exclude gw_port and svc_port.
> > > > > 
> > > > > > acl-add $ls from-lport 1005 'ip4.dst == 192.168.0.0/16 && inport != "$gw_port" && inport != "svc_port"' drop
> > > > > 
> > > > > > As you mentioned, this is not recommended, because it will result
> > > > > > in many OF flows. I actually tried it, but I don't see any OF flows
> > > > > > created for that ACL. Is there any rule in ovn-controller that
> > > > > > prevents translating such an ACL into OF flows?
> > > > >
> > > > > Option #3, as you suggested, I tried 2 ACLs.
> > > > > 
> > > > > > acl-add $ls from-lport 1006 'ip4.dst == 192.168.0.0/16 && (inport == "$gw_port" || inport == "svc_port")' allow
> > > > > > acl-add $ls from-lport 1005 'ip4.dst == 192.168.0.0/16' drop
> > > > > >
> > > > > > On compute node, I see the "drop" OF flow only, not the "allow" flow.
> > > > > > Am I missing anything here?
> > > > >
> > > >
> > > > If there is a logical flow like "inport == port1 && .", the
> > > > ovn-controller that binds this logical port converts this logical
> > > > flow to an OF rule.
> > > > Other ovn-controllers ignore this logical flow. I think that's what
> > > > is happening in your case.
> > >
> > > I don't quite get it. Are you saying, ovn-controller on compute node
> > > ignores the rule because those ports are not all bound on that
> > > chassis? The gw_port and svc_port are not bound to any chassis by any
> > > ovn-controller.
> >
> >
> > I don't know what you mean by "it's not bound to any chassis". Are these
> > lports part of the logical switch or the logical router?
>
> In NB, they are LS ports in logical_switch_port table.
> GW port is "router" port to connect to LR.
> SVC port is "localport" port to provide DHCP and metadata service.
>
> In SB, GW port is "patch" port and SVC port is "localport" port.
> Those two ports are not bound to any specific chassis.
> "chassis" column in port_binding table is empty.
> They exist on every chassis that has a VM launched on that LS.
> VM port is bound to specific chassis where the VM is launched.
>
> I didn't quite get your comments. My guess is that, ovn-controller
> on a chassis ignores the rule with GW port and SVC port, because
> they are not bound to that chassis. Is that true?
>
> If that is true, I'd say it's