Thanks, Aliasgar. I am still facing the same issue.
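For comparison, the SB cluster on my side is started with the same ovn-ctl commands I posted earlier in this thread (node IPs are from my test setup; note that the RAFT port 6644 is separate from the 6642 DB port):

# Node 1
sudo /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=192.168.121.91 --db-sb-port=6642 \
    --db-sb-create-insecure-remote=yes \
    --db-sb-cluster-local-addr=tcp:192.168.121.91:6644 start_sb_ovsdb

# Node 2
sudo /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=192.168.121.87 --db-sb-port=6642 \
    --db-sb-create-insecure-remote=yes \
    --db-sb-cluster-local-addr=tcp:192.168.121.87:6644 \
    --db-sb-cluster-remote-addr=tcp:192.168.121.91:6644 start_sb_ovsdb

# Node 3
sudo /usr/share/openvswitch/scripts/ovn-ctl --db-sb-addr=192.168.121.78 --db-sb-port=6642 \
    --db-sb-create-insecure-remote=yes \
    --db-sb-cluster-local-addr=tcp:192.168.121.78:6644 \
    --db-sb-cluster-remote-addr=tcp:192.168.121.91:6644 start_sb_ovsdb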
Can you also share the (ovn-ctl) commands you used to start/join the ovsdb-server clusters in your nodes ? Thanks Numan On Tue, Mar 27, 2018 at 11:04 PM, aginwala <aginw...@asu.edu> wrote: > Hu Numan: > > You need to use --db as you are now running db in cluster, you can access > data from any of the three dbs. > > So if the leader crashes, it re-elects from the other two. Below is the > e.g. command: > > # export remote="tcp:192.168.220.103:6641,tcp:192.168.220.102:6641,tcp: > 192.168.220.101:6641" > # kill -9 3985 > # ovn-nbctl --db=$remote show > switch 1d86ab4e-c8bf-4747-a716-8832a285d58c (ls1) > # ovn-nbctl --db=$remote ls-del ls1 > > > > > > > > Hope it helps! > > Regards, > > > On Tue, Mar 27, 2018 at 10:01 AM, Numan Siddique <nusid...@redhat.com> > wrote: > >> Hi Aliasgar, >> >> In your setup, if you kill the leader what is the behaviour ? Are you >> still able to create or delete any resources ? Is a new leader elected ? >> >> In my setup, the command "ovn-nbctl ls-add" for example blocks until I >> restart the ovsdb-server in node 1. And I don't see any other ovsdb-server >> becoming leader. May be I have configured wrongly. >> Could you please test this scenario if not yet please and let me know >> your observations if possible. >> >> Thanks >> Numan >> >> >> On Thu, Mar 22, 2018 at 12:28 PM, Han Zhou <zhou...@gmail.com> wrote: >> >>> Sounds good. >>> >>> Just checked the patch, by default the C IDL has "leader_only" as true, >>> which ensures that connection is to leader only. This is the case for >>> northd. So the lock works for northd hot active-standby purpose if all the >>> ovsdb endpoints of a cluster are specified to northd, since all northds are >>> connecting to the same DB, the leader. >>> >>> For neutron networking-ovn, this may not work yet, since I didn't see >>> such logic in the python IDL in current patch series. It would be good if >>> we add similar logic for python IDL. (@ben/numan, correct me if I am wrong) >>> >>> >>> On Wed, Mar 21, 2018 at 6:49 PM, aginwala <aginw...@asu.edu> wrote: >>> >>>> Hi : >>>> >>>> Just sorted out the correct settings and northd also works in ha in >>>> raft. >>>> >>>> There were 2 issues in the setup: >>>> 1. I had started nb db without --db-nb-create-insecure-remote >>>> 2. I also started northd locally on all 3 without remote which is like >>>> all three northd trying to lock the ovsdb locally. >>>> >>>> Hence, the duplicate logs were populated in the southbound datapath due >>>> to multiple northd trying to write the local copy. >>>> >>>> So, I now start nb db with --db-nb-create-insecure-remote and northd on >>>> all 3 nodes using below command: >>>> >>>> ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db="tcp: >>>> 10.169.125.152:6641,tcp:10.169.125.131:6641,tcp:10.148.181.162:6641" >>>> --ovnsb-db="tcp:10.169.125.152:6642,tcp:10.169.125.131:6642,tcp: >>>> 10.148.181.162:6642" --no-chdir >>>> --log-file=/var/log/openvswitch/ovn-northd.log >>>> --pidfile=/var/run/openvswitch/ovn-northd.pid --detach --monitor >>>> >>>> >>>> #At start, northd went active on the leader node and standby on other >>>> two nodes. >>>> >>>> #After old leader crashed and new leader got elected, northd goes >>>> active on any of the remaining 2 nodes as per sample logs below from >>>> non-leader node: >>>> 2018-03-22T00:20:30.732Z|00023|ovn_northd|INFO|ovn-northd lock lost. >>>> This ovn-northd instance is now on standby. >>>> 2018-03-22T00:20:30.743Z|00024|ovn_northd|INFO|ovn-northd lock >>>> acquired. 
This ovn-northd instance is now active. >>>> >>>> # Also ovn-controller works similar way if leader goes down and >>>> connects to any of the remaining 2 nodes: >>>> 2018-03-22T01:21:56.250Z|00029|ovsdb_idl|INFO|tcp:10.148.181.162:6642: >>>> clustered database server is disconnected from cluster; trying another >>>> server >>>> 2018-03-22T01:21:56.250Z|00030|reconnect|INFO|tcp:10.148.181.162:6642: >>>> connection attempt timed out >>>> 2018-03-22T01:21:56.250Z|00031|reconnect|INFO|tcp:10.148.181.162:6642: >>>> waiting 4 seconds before reconnect >>>> 2018-03-22T01:23:52.417Z|00043|reconnect|INFO|tcp:10.148.181.162:6642: >>>> connected >>>> >>>> >>>> >>>> Above settings will also work if we put all the nodes behind the vip >>>> and updates the ovn configs to use vips. So we don't need pacemaker >>>> explicitly for northd HA :). >>>> >>>> Since the setup is complete now, I will populate the same in scale test >>>> env and see how it behaves. >>>> >>>> @Numan: We can try the same with networking-ovn integration and see if >>>> we find anything weird there too. Not sure if you have any exclusive >>>> findings for this case. >>>> >>>> Let me know if something else is missed here. >>>> >>>> >>>> >>>> >>>> Regards, >>>> >>>> On Wed, Mar 21, 2018 at 2:50 PM, Han Zhou <zhou...@gmail.com> wrote: >>>> >>>>> Ali, sorry if I misunderstand what you are saying, but pacemaker here >>>>> is for northd HA. pacemaker itself won't point to any ovsdb cluster node. >>>>> All northds can point to a LB VIP for the ovsdb cluster, so if a member of >>>>> ovsdb cluster is down it won't have impact to northd. >>>>> >>>>> Without clustering support of the ovsdb lock, I think this is what we >>>>> have now for northd HA. Please suggest if anyone has any other idea. >>>>> Thanks >>>>> :) >>>>> >>>>> On Wed, Mar 21, 2018 at 1:12 PM, aginwala <aginw...@asu.edu> wrote: >>>>> >>>>>> :) The only thing is while using pacemaker, if the node that >>>>>> pacemaker if pointing to is down, all the active/standby northd nodes >>>>>> have >>>>>> to be updated to new node from the cluster. But will dig in more to see >>>>>> what else I can find. >>>>>> >>>>>> @Ben: Any suggestions further? >>>>>> >>>>>> >>>>>> Regards, >>>>>> >>>>>> On Wed, Mar 21, 2018 at 10:22 AM, Han Zhou <zhou...@gmail.com> wrote: >>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, Mar 21, 2018 at 9:49 AM, aginwala <aginw...@asu.edu> wrote: >>>>>>> >>>>>>>> Thanks Numan: >>>>>>>> >>>>>>>> Yup agree with the locking part. For now; yes I am running northd >>>>>>>> on one node. I might right a script to monitor northd in cluster so >>>>>>>> that >>>>>>>> if the node where it's running goes down, script can spin up northd on >>>>>>>> one >>>>>>>> other active nodes as a dirty hack. >>>>>>>> >>>>>>>> The "dirty hack" is pacemaker :) >>>>>>> >>>>>>> >>>>>>>> Sure, will await for the inputs from Ben too on this and see how >>>>>>>> complex would it be to roll out this feature. >>>>>>>> >>>>>>>> >>>>>>>> Regards, >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Mar 21, 2018 at 5:43 AM, Numan Siddique < >>>>>>>> nusid...@redhat.com> wrote: >>>>>>>> >>>>>>>>> Hi Aliasgar, >>>>>>>>> >>>>>>>>> ovsdb-server maintains locks per each connection and not across >>>>>>>>> the db. A workaround for you now would be to configure all the >>>>>>>>> ovn-northd >>>>>>>>> instances to connect to one ovsdb-server if you want to have >>>>>>>>> active/standy. >>>>>>>>> >>>>>>>>> Probably Ben can answer if there is a plan to support ovsdb locks >>>>>>>>> across the db. 
We also need this support in networking-ovn as it also >>>>>>>>> uses >>>>>>>>> ovsdb locks. >>>>>>>>> >>>>>>>>> Thanks >>>>>>>>> Numan >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Mar 21, 2018 at 1:40 PM, aginwala <aginw...@asu.edu> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Numan: >>>>>>>>>> >>>>>>>>>> Just figured out that ovn-northd is running as active on all 3 >>>>>>>>>> nodes instead of one active instance as I continued to test further >>>>>>>>>> which >>>>>>>>>> results in db errors as per logs. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> # on node 3, I run ovn-nbctl ls-add ls2 ; it populates below logs >>>>>>>>>> in ovn-north >>>>>>>>>> 2018-03-21T06:01:59.442Z|00007|ovsdb_idl|WARN|transaction error: >>>>>>>>>> {"details":"Transaction causes multiple rows in \"Datapath_Binding\" >>>>>>>>>> table >>>>>>>>>> to have identical values (1) for index on column \"tunnel_key\". >>>>>>>>>> First >>>>>>>>>> row, with UUID 8c5d9342-2b90-4229-8ea1-001a733a915c, was >>>>>>>>>> inserted by this transaction. Second row, with UUID >>>>>>>>>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683, existed in the database >>>>>>>>>> before this transaction and was not modified by the >>>>>>>>>> transaction.","error":"constraint violation"} >>>>>>>>>> >>>>>>>>>> In southbound datapath list, 2 duplicate records gets created for >>>>>>>>>> same switch. >>>>>>>>>> >>>>>>>>>> # ovn-sbctl list Datapath >>>>>>>>>> _uuid : b270ae30-3458-445f-95d2-b14e8ebddd01 >>>>>>>>>> external_ids : >>>>>>>>>> {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", >>>>>>>>>> name="ls2"} >>>>>>>>>> tunnel_key : 2 >>>>>>>>>> >>>>>>>>>> _uuid : 8e06f919-4cc7-4ffc-9a79-20ce6663b683 >>>>>>>>>> external_ids : >>>>>>>>>> {logical-switch="4d6674e3-ff9f-4f38-b050-0fa9bec9e34d", >>>>>>>>>> name="ls2"} >>>>>>>>>> tunnel_key : 1 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> # on nodes 1 and 2 where northd is running, it gives below error: >>>>>>>>>> 2018-03-21T06:01:59.437Z|00008|ovsdb_idl|WARN|transaction error: >>>>>>>>>> {"details":"cannot delete Datapath_Binding row >>>>>>>>>> 8e06f919-4cc7-4ffc-9a79-20ce6663b683 because of 17 remaining >>>>>>>>>> reference(s)","error":"referential integrity violation"} >>>>>>>>>> >>>>>>>>>> As per commit message, for northd I re-tried setting >>>>>>>>>> --ovnnb-db="tcp:10.169.125.152:6641,tcp:10.169.125.131:6641,tcp: >>>>>>>>>> 10.148.181.162:6641" and --ovnsb-db="tcp:10.169.125.152:6642 >>>>>>>>>> ,tcp:10.169.125.131:6642,tcp:10.148.181.162:6642" and it did not >>>>>>>>>> help either. >>>>>>>>>> >>>>>>>>>> There is no issue if I keep running only one instance of northd >>>>>>>>>> on any of these 3 nodes. Hence, wanted to know is there >>>>>>>>>> something else missing here to make only one northd instance as >>>>>>>>>> active and >>>>>>>>>> rest as standby? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> >>>>>>>>>> On Thu, Mar 15, 2018 at 3:09 AM, Numan Siddique < >>>>>>>>>> nusid...@redhat.com> wrote: >>>>>>>>>> >>>>>>>>>>> That's great >>>>>>>>>>> >>>>>>>>>>> Numan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Mar 15, 2018 at 2:57 AM, aginwala <aginw...@asu.edu> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Numan: >>>>>>>>>>>> >>>>>>>>>>>> I tried on new nodes (kernel : 4.4.0-104-generic , Ubuntu >>>>>>>>>>>> 16.04)with fresh installation and it worked super fine for both >>>>>>>>>>>> sb and nb dbs. Seems like some kernel issue on the previous >>>>>>>>>>>> nodes when I re-installed raft patch as I was running different >>>>>>>>>>>> ovs version >>>>>>>>>>>> on those nodes before. 
>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> For 2 HVs, I now set ovn-remote="tcp:10.169.125.152:6642, tcp: >>>>>>>>>>>> 10.169.125.131:6642, tcp:10.148.181.162:6642" and started >>>>>>>>>>>> controller and it works super fine. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Did some failover testing by rebooting/killing the leader ( >>>>>>>>>>>> 10.169.125.152) and bringing it back up and it works as >>>>>>>>>>>> expected. Nothing weird noted so far. >>>>>>>>>>>> >>>>>>>>>>>> # check-cluster gives below data one of the node( >>>>>>>>>>>> 10.148.181.162) post leader failure >>>>>>>>>>>> >>>>>>>>>>>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db >>>>>>>>>>>> ovsdb-tool: leader /etc/openvswitch/ovnsb_db.db for term 2 has >>>>>>>>>>>> log entries only up to index 18446744073709551615, but index 9 was >>>>>>>>>>>> committed in a previous term (e.g. by /etc/openvswitch/ovnsb_db.db) >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> For check-cluster, are we planning to add more output showing >>>>>>>>>>>> which node is active(leader), etc in upcoming versions ? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks a ton for helping sort this out. I think the patch >>>>>>>>>>>> looks good to be merged post addressing of the comments by Justin >>>>>>>>>>>> along >>>>>>>>>>>> with the man page details for ovsdb-tool. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I will do some more crash testing for the cluster along with >>>>>>>>>>>> the scale test and keep you posted if something unexpected is >>>>>>>>>>>> noted. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Mar 13, 2018 at 11:07 PM, Numan Siddique < >>>>>>>>>>>> nusid...@redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Mar 14, 2018 at 7:51 AM, aginwala <aginw...@asu.edu> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Sure. >>>>>>>>>>>>>> >>>>>>>>>>>>>> To add on , I also ran for nb db too using different port and >>>>>>>>>>>>>> Node2 crashes with same error : >>>>>>>>>>>>>> # Node 2 >>>>>>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl >>>>>>>>>>>>>> --db-nb-addr=10.99.152.138 --db-nb-port=6641 >>>>>>>>>>>>>> --db-nb-cluster-remote-addr="t >>>>>>>>>>>>>> cp:10.99.152.148:6645" --db-nb-cluster-local-addr="tcp: >>>>>>>>>>>>>> 10.99.152.138:6645" start_nb_ovsdb >>>>>>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnnb_db.db: >>>>>>>>>>>>>> cannot identify file type >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> Hi Aliasgar, >>>>>>>>>>>>> >>>>>>>>>>>>> It worked for me. Can you delete the old db files in >>>>>>>>>>>>> /etc/openvswitch/ and try running the commands again ? >>>>>>>>>>>>> >>>>>>>>>>>>> Below are the commands I ran in my setup. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Node 1 >>>>>>>>>>>>> ------- >>>>>>>>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl >>>>>>>>>>>>> --db-sb-addr=192.168.121.91 --db-sb-port=6642 >>>>>>>>>>>>> --db-sb-create-insecure-remote=yes >>>>>>>>>>>>> --db-sb-cluster-local-addr=tcp:192.168.121.91:6644 >>>>>>>>>>>>> start_sb_ovsdb >>>>>>>>>>>>> >>>>>>>>>>>>> Node 2 >>>>>>>>>>>>> --------- >>>>>>>>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl >>>>>>>>>>>>> --db-sb-addr=192.168.121.87 --db-sb-port=6642 >>>>>>>>>>>>> --db-sb-create-insecure-remote=yes >>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:192.168.121.87:6644" >>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644" >>>>>>>>>>>>> start_sb_ovsdb >>>>>>>>>>>>> >>>>>>>>>>>>> Node 3 >>>>>>>>>>>>> --------- >>>>>>>>>>>>> sudo /usr/share/openvswitch/scripts/ovn-ctl >>>>>>>>>>>>> --db-sb-addr=192.168.121.78 --db-sb-port=6642 >>>>>>>>>>>>> --db-sb-create-insecure-remote=yes >>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:192.168.121.78:6644" >>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:192.168.121.91:6644" >>>>>>>>>>>>> start_sb_ovsdb >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> Numan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 9:40 AM, Numan Siddique < >>>>>>>>>>>>>> nusid...@redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 9:46 PM, aginwala <aginw...@asu.edu> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks Numan for the response. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> There is no command start_cluster_sb_ovsdb in the source >>>>>>>>>>>>>>>> code too. Is that in a separate commit somewhere? Hence, I >>>>>>>>>>>>>>>> used start_sb_ovsdb >>>>>>>>>>>>>>>> which I think would not be a right choice? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sorry, I meant start_sb_ovsdb. Strange that it didn't work >>>>>>>>>>>>>>> for you. Let me try it out again and update this thread. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>> Numan >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> # Node1 came up as expected. >>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642 >>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes >>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.148:6644" >>>>>>>>>>>>>>>> start_sb_ovsdb. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> # verifying its a clustered db with ovsdb-tool >>>>>>>>>>>>>>>> db-local-address /etc/openvswitch/ovnsb_db.db >>>>>>>>>>>>>>>> tcp:10.99.152.148:6644 >>>>>>>>>>>>>>>> # ovn-sbctl show works fine and chassis are being populated >>>>>>>>>>>>>>>> correctly. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> #Node 2 fails with error: >>>>>>>>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl >>>>>>>>>>>>>>>> --db-sb-addr=10.99.152.138 --db-sb-port=6642 >>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes >>>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" >>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" >>>>>>>>>>>>>>>> start_sb_ovsdb >>>>>>>>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: >>>>>>>>>>>>>>>> cannot identify file type >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> # So i did start the sb db the usual way using start_ovsdb >>>>>>>>>>>>>>>> to just get the db file created and killed the sb pid and >>>>>>>>>>>>>>>> re-ran the >>>>>>>>>>>>>>>> command which gave actual error where it complains for >>>>>>>>>>>>>>>> join-cluster command >>>>>>>>>>>>>>>> that is being called internally >>>>>>>>>>>>>>>> /usr/share/openvswitch/scripts/ovn-ctl >>>>>>>>>>>>>>>> --db-sb-addr=10.99.152.138 --db-sb-port=6642 >>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes >>>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" >>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" >>>>>>>>>>>>>>>> start_sb_ovsdb >>>>>>>>>>>>>>>> ovsdb-tool: /etc/openvswitch/ovnsb_db.db: not a clustered >>>>>>>>>>>>>>>> database >>>>>>>>>>>>>>>> * Backing up database to /etc/openvswitch/ovnsb_db.db.b >>>>>>>>>>>>>>>> ackup1.15.0-70426956 >>>>>>>>>>>>>>>> ovsdb-tool: 'join-cluster' command requires at least 4 >>>>>>>>>>>>>>>> arguments >>>>>>>>>>>>>>>> * Creating cluster database /etc/openvswitch/ovnsb_db.db >>>>>>>>>>>>>>>> from existing one >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> # based on above error I killed the sb db pid again and >>>>>>>>>>>>>>>> try to create a local cluster on node then re-ran the join >>>>>>>>>>>>>>>> operation as >>>>>>>>>>>>>>>> per the source code function. >>>>>>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db >>>>>>>>>>>>>>>> OVN_Southbound tcp:10.99.152.138:6644 tcp: >>>>>>>>>>>>>>>> 10.99.152.148:6644 which still complains >>>>>>>>>>>>>>>> ovsdb-tool: I/O error: /etc/openvswitch/ovnsb_db.db: create >>>>>>>>>>>>>>>> failed (File exists) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> # Node 3: I did not try as I am assuming the same failure >>>>>>>>>>>>>>>> as node 2 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Let me know may know further. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 3:08 AM, Numan Siddique < >>>>>>>>>>>>>>>> nusid...@redhat.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Aliasgar, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Mar 13, 2018 at 7:11 AM, aginwala < >>>>>>>>>>>>>>>>> aginw...@asu.edu> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Ben/Noman: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I am trying to setup 3 node southbound db cluster using >>>>>>>>>>>>>>>>>> raft10 <https://patchwork.ozlabs.org/patch/854298/> in >>>>>>>>>>>>>>>>>> review. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> # Node 1 create-cluster >>>>>>>>>>>>>>>>>> ovsdb-tool create-cluster /etc/openvswitch/ovnsb_db.db >>>>>>>>>>>>>>>>>> /root/ovs-reviews/ovn/ovn-sb.ovsschema tcp: >>>>>>>>>>>>>>>>>> 10.99.152.148:6642 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> A different port is used for RAFT. So you have to choose >>>>>>>>>>>>>>>>> another port like 6644 for example. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> # Node 2 >>>>>>>>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db >>>>>>>>>>>>>>>>>> OVN_Southbound tcp:10.99.152.138:6642 tcp: >>>>>>>>>>>>>>>>>> 10.99.152.148:6642 --cid 5dfcb678-bb1d-4377-b02d-a380ed >>>>>>>>>>>>>>>>>> ec2982 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> #Node 3 >>>>>>>>>>>>>>>>>> ovsdb-tool join-cluster /etc/openvswitch/ovnsb_db.db >>>>>>>>>>>>>>>>>> OVN_Southbound tcp:10.99.152.101:6642 tcp: >>>>>>>>>>>>>>>>>> 10.99.152.138:6642 tcp:10.99.152.148:6642 --cid >>>>>>>>>>>>>>>>>> 5dfcb678-bb1d-4377-b02d-a380edec2982 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> # ovn remote is set to all 3 nodes >>>>>>>>>>>>>>>>>> external_ids:ovn-remote="tcp:10.99.152.148:6642, tcp: >>>>>>>>>>>>>>>>>> 10.99.152.138:6642, tcp:10.99.152.101:6642" >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> # Starting sb db on node 1 using below command on node 1: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ovsdb-server --detach --monitor -vconsole:off -vraft >>>>>>>>>>>>>>>>>> -vjsonrpc --log-file=/var/log/openvswitch/ovsdb-server-sb.log >>>>>>>>>>>>>>>>>> --pidfile=/var/run/openvswitch/ovnsb_db.pid >>>>>>>>>>>>>>>>>> --remote=db:OVN_Southbound,SB_Global,connections >>>>>>>>>>>>>>>>>> --unixctl=ovnsb_db.ctl >>>>>>>>>>>>>>>>>> --private-key=db:OVN_Southbound,SSL,private_key >>>>>>>>>>>>>>>>>> --certificate=db:OVN_Southbound,SSL,certificate >>>>>>>>>>>>>>>>>> --ca-cert=db:OVN_Southbound,SSL,ca_cert >>>>>>>>>>>>>>>>>> --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols >>>>>>>>>>>>>>>>>> --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers >>>>>>>>>>>>>>>>>> --remote=punix:/var/run/openvswitch/ovnsb_db.sock >>>>>>>>>>>>>>>>>> /etc/openvswitch/ovnsb_db.db >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> # check-cluster is returning nothing >>>>>>>>>>>>>>>>>> ovsdb-tool check-cluster /etc/openvswitch/ovnsb_db.db >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> # ovsdb-server-sb.log below shows the leader is elected >>>>>>>>>>>>>>>>>> with only one server and there are rbac related debug logs >>>>>>>>>>>>>>>>>> with rpc replies >>>>>>>>>>>>>>>>>> and empty params with no errors >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 2018-03-13T01:12:02Z|00002|raft|DBG|server 63d1 added to >>>>>>>>>>>>>>>>>> configuration >>>>>>>>>>>>>>>>>> 2018-03-13T01:12:02Z|00003|raft|INFO|term 6: starting >>>>>>>>>>>>>>>>>> election >>>>>>>>>>>>>>>>>> 2018-03-13T01:12:02Z|00004|raft|INFO|term 6: elected >>>>>>>>>>>>>>>>>> leader by 1+ of 1 servers >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Now Starting the ovsdb-server on the other clusters fails >>>>>>>>>>>>>>>>>> saying >>>>>>>>>>>>>>>>>> ovsdb-server: ovsdb error: /etc/openvswitch/ovnsb_db.db: >>>>>>>>>>>>>>>>>> cannot identify file type >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Also noticed that man ovsdb-tool is missing cluster >>>>>>>>>>>>>>>>>> details. Might want to address it in the same patch or >>>>>>>>>>>>>>>>>> different. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Please advise to what is missing here for running >>>>>>>>>>>>>>>>>> ovn-sbctl show as this command hangs. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I think you can use the ovn-ctl command >>>>>>>>>>>>>>>>> "start_cluster_sb_ovsdb" for your testing (atleast for now) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> For your setup, I think you can start the cluster as >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> # Node 1 >>>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.148 --db-sb-port=6642 >>>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes >>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.148:6644" >>>>>>>>>>>>>>>>> start_cluster_sb_ovsdb >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> # Node 2 >>>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.138 --db-sb-port=6642 >>>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes >>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.138:6644" >>>>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" >>>>>>>>>>>>>>>>> start_cluster_sb_ovsdb >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> # Node 3 >>>>>>>>>>>>>>>>> ovn-ctl --db-sb-addr=10.99.152.101 --db-sb-port=6642 >>>>>>>>>>>>>>>>> --db-sb-create-insecure-remote=yes >>>>>>>>>>>>>>>>> --db-sb-cluster-local-addr="tcp:10.99.152.101:6644" >>>>>>>>>>>>>>>>> --db-sb-cluster-remote-addr="tcp:10.99.152.148:6644" >>>>>>>>>>>>>>>>> start_cluster_sb_ovsdb >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Let me know how it goes. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>>>> Numan >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>>>>>>> discuss mailing list >>>>>>>>>>>>>>>>>> disc...@openvswitch.org >>>>>>>>>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> discuss mailing list >>>>>>>> disc...@openvswitch.org >>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >