Re: [ovs-discuss] Error installation

2020-08-03 Thread Gregory Rose
You must be using an old version of openvswitch.  Please update to the 
latest release.
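
For example (just a sketch, adjust for your setup): on Ubuntu 20.04 the
packaged openvswitch-switch userspace works with the in-tree openvswitch
module that ships with the 5.4 kernel, so the out-of-tree module build that
produces this error can be avoided entirely:

    # Option 1: use the distribution package (assumes an apt-based install).
    apt-get install openvswitch-switch

    # Option 2: build a current release from source without the
    # out-of-tree kernel module (i.e. do not pass --with-linux),
    # relying on the kernel's own openvswitch module instead.
    ./boot.sh
    ./configure
    make && make install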


- Greg

On 7/29/2020 6:21 PM, JORGE ELIECER GOMEZ GOMEZ wrote:

Hello

I have a problem installing Open vSwitch on Ubuntu 20.04.1 LTS
(GNU/Linux 5.4.0-26-generic x86_64):

error: Linux kernel in /lib/modules/5.4.0-26-generic/build is version
5.4.30, but version newer than 4.3.x is not supported (please refer to the
FAQ for advice) 

What do I have to do to fix this error?

Thanks,

Jorge


___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss



[ovs-discuss] [OVN] constraint violation error from Neutron OVN ML2 driver

2020-08-03 Thread Tony Liu
Hi,

Any clues about this error? It is reproducible, though not consistent.
There are 3 Neutron nodes and 3 OVN DB nodes (RAFT cluster).
It happened when connecting a network to a router via the OpenStack CLI.
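
For reference, the two Logical_Switch_Port rows that collide on the same
name can presumably be inspected directly in the NB DB with something like
the following (a sketch, assuming the standard ovn-nbctl "find" syntax and
the name value from the error below):

    ovn-nbctl --columns=_uuid,name,external_ids \
        find Logical_Switch_Port name=6ee12e8e-f8e5-46e2-84b7-58b9dc7d9253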

===
2020-08-03 12:30:17.054 22 ERROR ovsdbapp.backend.ovs_idl.transaction 
[req-acf33a39-f8b5-4b9f-91d7-0100f1e7c189 - - - - -] OVSDB Error: 
{"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table 
to have identical values (\"6ee12e8e-f8e5-46e2-84b7-58b9dc7d9253\") for index 
on column \"name\".  First row, with UUID b5c36d61-ad55-45bf-90f6-70f6649251c3, 
existed in the database before this transaction and was not modified by the 
transaction.  Second row, with UUID 5498876d-55f6-4364-8db7-f6d591dc9ba9, was 
inserted by this transaction.","error":"constraint violation"}
2020-08-03 12:30:17.055 22 ERROR ovsdbapp.backend.ovs_idl.transaction 
[req-8b5c1d93-ba88-4259-ab7e-08ee6c57a884 fb4212bf04404c15a19208ca920c1b1a 
3e9209736c7146bead16e02b0679f3a1 - default default] Traceback (most recent call 
last):
  File 
"/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 
122, in run
txn.results.put(txn.do_commit())
  File 
"/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", 
line 118, in do_commit
raise RuntimeError(msg)
RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in 
\"Logical_Switch_Port\" table to have identical values 
(\"6ee12e8e-f8e5-46e2-84b7-58b9dc7d9253\") for index on column \"name\".  First 
row, with UUID b5c36d61-ad55-45bf-90f6-70f6649251c3, existed in the database 
before this transaction and was not modified by the transaction.  Second row, 
with UUID 5498876d-55f6-4364-8db7-f6d591dc9ba9, was inserted by this 
transaction.","error":"constraint violation"}

2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers 
[req-8b5c1d93-ba88-4259-ab7e-08ee6c57a884 fb4212bf04404c15a19208ca920c1b1a 
3e9209736c7146bead16e02b0679f3a1 - default default] Mechanism driver 'ovn' 
failed in create_port_postcommit: RuntimeError: OVSDB Error: 
{"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table 
to have identical values (\"6ee12e8e-f8e5-46e2-84b7-58b9dc7d9253\") for index 
on column \"name\".  First row, with UUID b5c36d61-ad55-45bf-90f6-70f6649251c3, 
existed in the database before this transaction and was not modified by the 
transaction.  Second row, with UUID 5498876d-55f6-4364-8db7-f6d591dc9ba9, was 
inserted by this transaction.","error":"constraint violation"}
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers Traceback (most 
recent call last):
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python3.6/site-packages/neutron/plugins/ml2/managers.py", line 477, 
in _call_on_drivers
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers 
getattr(driver.obj, method_name)(context)
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py",
 line 544, in create_port_postcommit
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers 
self._ovn_client.create_port(context._plugin_context, port)
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py",
 line 437, in create_port
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers 
self._qos_driver.create_port(txn, port)
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers next(self.gen)
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py",
 line 184, in transaction
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers yield t
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers next(self.gen)
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 119, in transaction
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers del 
self._nested_txns_map[cur_thread_id]
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python3.6/site-packages/ovsdbapp/api.py", line 69, in __exit__
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers self.result = 
self.commit()
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers   File 
"/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", 
line 62, in commit
2020-08-03 12:30:17.056 22 ERROR neutron.plugins.ml2.managers raise 
res

[ovs-discuss] Double free in recent kernels after memleak fix

2020-08-03 Thread Johan Knöös via discuss
Hi Open vSwitch contributors,

We have found that openvswitch is double-freeing memory. The issue was
not present in kernel version 5.5.17, but it is present in 5.6.14 and
newer kernels.

After reverting the RCU commits below for debugging, enabling
slub_debug, lockdep, and KASAN, we see the warnings at the end of this
email in the kernel log (the last one shows the double-free). When I
revert 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1 ("net: openvswitch:
fix possible memleak on destroy flow-table"), the symptoms disappear.
While I have a reliable way to reproduce the issue, I unfortunately
don't yet have a process that's amenable to sharing. Please take a
look.

189a6883dcf7 rcu: Remove kfree_call_rcu_nobatch()
77a40f97030b rcu: Remove kfree_rcu() special casing and lazy-callback handling
e99637becb2e rcu: Add support for debug_objects debugging for kfree_rcu()
0392bebebf26 rcu: Add multiple in-flight batches of kfree_rcu() work
569d767087ef rcu: Make kfree_rcu() use a non-atomic ->monitor_todo
a35d16905efc rcu: Add basic support for kfree_rcu() batching
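
The debug setup described above amounts to roughly the following; this is
a sketch under assumptions (the exact config options and boot parameters
may differ from what was actually used):

    # Build the 5.6.14+ test kernel with KASAN and lockdep enabled.
    scripts/config --enable KASAN --enable PROVE_LOCKING
    # Boot with SLUB debugging, e.g. slub_debug=FZP on the kernel command line.
    # For comparison, revert the memleak fix that makes the symptoms disappear:
    git revert 50b0e61b32ee890a75b4377d5fbe770a86d6a4c1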

Thanks,
Johan Knöös

Traces:

[ cut here ]
WARNING: CPU: 30 PID: 0 at net/openvswitch/flow_table.c:272
table_instance_flow_free+0x2fd/0x340 [openvswitch]
Modules linked in: ...
CPU: 30 PID: 0 Comm: swapper/30 Tainted: GE 5.6.14+ #18
Hardware name: ...
RIP: 0010:table_instance_flow_free+0x2fd/0x340 [openvswitch]
Code: c1 fa 1f 48 c1 e8 20 29 d0 41 39 c7 0f 8f 95 fe ff ff 48 83 c4
10 48 89 ef d1 fe 5b 5d 41 5c 41 5d 41 5e 41 5f e9 33 fb ff ff <0f> 0b
e9 59 fe ff ff 0f 0b e8 65 f1 fe ff 85 c0 0f 85 9b fe ff ff
RSP: 0018:888c3e589da8 EFLAGS: 00010246
RAX:  RBX: 889f954ee580 RCX: dc00
RDX: 0007 RSI: 0003 RDI: 0246
RBP: 888c295150a0 R08: 9297f341 R09: 
R10:  R11:  R12: 889f1ed55000
R13: 888b72efa020 R14: 888c24209480 R15: 888b731bb6f8
FS:  () GS:888c3e58() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0733feb8a700 CR3: 000ba726e004 CR4: 003606e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:

table_instance_destroy+0xf9/0x1b0 [openvswitch]
? new_vport+0xb0/0xb0 [openvswitch]
destroy_dp_rcu+0x12/0x50 [openvswitch]
rcu_core+0x34d/0x9b0
? rcu_all_qs+0x90/0x90
? rcu_read_lock_sched_held+0xa5/0xc0
? rcu_read_lock_bh_held+0xc0/0xc0
? run_rebalance_domains+0x11d/0x140
__do_softirq+0x128/0x55c
irq_exit+0x101/0x110
smp_apic_timer_interrupt+0xfd/0x2f0
apic_timer_interrupt+0xf/0x20

RIP: 0010:cpuidle_enter_state+0xda/0x5d0
Code: 80 7c 24 10 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 be 04
00 00 31 ff e8 c2 1a 7a ff e8 9d 4d 84 ff fb 66 0f 1f 44 00 00 <45> 85
ed 0f 88 b6 03 00 00 4d 63 f5 4b 8d 04 76 4e 8d 3c f5 00 00
RSP: 0018:888103f07d58 EFLAGS: 0246 ORIG_RAX: ff13
RAX:  RBX: 888c3e5c1800 RCX: dc00
RDX: 0007 RSI: 0006 RDI: 888103ec88d4
RBP: 945a3940 R08: 92982042 R09: 
R10:  R11:  R12: 0002
R13: 0002 R14: 00d0 R15: 945a3a10
? lockdep_hardirqs_on+0x182/0x260
? cpuidle_enter_state+0xd3/0x5d0
cpuidle_enter+0x3c/0x60
do_idle+0x36a/0x450
? arch_cpu_idle_exit+0x40/0x40
cpu_startup_entry+0x19/0x20
start_secondary+0x21f/0x290
? set_cpu_sibling_map+0xcb0/0xcb0
secondary_startup_64+0xa4/0xb0
irq event stamp: 1626911
hardirqs last  enabled at (1626910): [] __call_rcu+0x1b7/0x3b0
hardirqs last disabled at (1626911): []
trace_hardirqs_off_thunk+0x1a/0x1c
softirqs last  enabled at (1626882): [] irq_enter+0x75/0x80
softirqs last disabled at (1626883): [] irq_exit+0x101/0x110
---[ end trace 8dc48dec48bb79c0 ]---


---


=
WARNING: suspicious RCU usage
5.6.14+ #18 Tainted: GW   E
-
net/openvswitch/flow_table.c:239 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
1 lock held by swapper/30/0:
#0: 94315e00 (rcu_callback){}, at: rcu_core+0x395/0x9b0

stack backtrace:
CPU: 30 PID: 0 Comm: swapper/30 Tainted: GW   E 5.6.14+ #18
Hardware name: ...
Call Trace:

dump_stack+0xb8/0x110
table_instance_flow_free+0x332/0x340 [openvswitch]
table_instance_destroy+0xf9/0x1b0 [openvswitch]
? new_vport+0xb0/0xb0 [openvswitch]
destroy_dp_rcu+0x12/0x50 [openvswitch]
rcu_core+0x34d/0x9b0
? rcu_all_qs+0x90/0x90
? rcu_read_lock_sched_held+0xa5/0xc0
? rcu_read_lock_bh_held+0xc0/0xc0
? run_rebalance_domains+0x11d/0x140
__do_softirq+0x128/0x55c
irq_exit+0x101/0x110
smp_apic_timer_interrupt+0xfd/0x2f0
apic_timer_i

[ovs-discuss] [OVN] no response to inactivity probe

2020-08-03 Thread Tony Liu
Hi,

Neutron OVN ML2 driver was disconnected by ovn-nb-db. There are many error
messages from ovn-nb-db leader.

2020-08-04T02:31:39.751Z|03138|reconnect|ERR|tcp:10.6.20.81:58620: no response 
to inactivity probe after 5 seconds, disconnecting
2020-08-04T02:31:42.484Z|03139|reconnect|ERR|tcp:10.6.20.81:58300: no response 
to inactivity probe after 5 seconds, disconnecting
2020-08-04T02:31:49.858Z|03140|reconnect|ERR|tcp:10.6.20.81:59582: no response 
to inactivity probe after 5 seconds, disconnecting
2020-08-04T02:31:53.057Z|03141|reconnect|ERR|tcp:10.6.20.83:42626: no response 
to inactivity probe after 5 seconds, disconnecting
2020-08-04T02:31:53.058Z|03142|reconnect|ERR|tcp:10.6.20.82:45412: no response 
to inactivity probe after 5 seconds, disconnecting
2020-08-04T02:31:54.067Z|03143|reconnect|ERR|tcp:10.6.20.81:59416: no response 
to inactivity probe after 5 seconds, disconnecting
2020-08-04T02:31:54.809Z|03144|reconnect|ERR|tcp:10.6.20.81:60004: no response 
to inactivity probe after 5 seconds, disconnecting


Could anyone share some details on how this inactivity probe works?
From the OVN ML2 driver log, I see it connected to the leader, then the
connection was closed by the leader after 5 or 6 seconds. Is this probe
one-way or two-way?
Neither side is busy or taking many CPU cycles. Not sure how this could
happen. Any thoughts?
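
For reference, the 5-second value above presumably comes from the
inactivity_probe column of the NB database's Connection table; the current
setting can be checked with something like this (a sketch, assuming the
standard ovsdb ctl syntax):

    ovn-nbctl list Connection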


Thanks!

Tony



___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-03 Thread Tony Liu
A health check (5 sec interval) taking 30%-100% CPU is definitely not
acceptable, if that's really the case. There must be some blocking (not
yielding the CPU) in the code that is not supposed to be there.

Could you point me to the code for this health check?
Is it single-threaded? Does it use any event library?


Thanks!

Tony

> -Original Message-
> From: Han Zhou 
> Sent: Saturday, August 1, 2020 9:11 PM
> To: Tony Liu 
> Cc: ovs-discuss ; ovs-dev  d...@openvswitch.org>
> Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
> configuration update
> 
> 
> 
> On Fri, Jul 31, 2020 at 4:14 PM Tony Liu  wrote:
> 
>   Hi,
> 
>   I see the active ovn-northd takes much CPU (30% - 100%) when there
> is no
>   configuration from OpenStack, nothing happening on all chassis
> nodes either.
> 
>   Is this expected? What is it busy with?
> 
> 
> 
> 
> Yes, this is expected. It is due to the OVSDB probe between ovn-northd
> and NB/SB OVSDB servers, which is used to detect the OVSDB connection
> failure.
> Usually this is not a concern (unlike the probe with a large number of
> ovn-controller clients), because ovn-northd is a centralized component
> and the CPU cost when there is no configuration change doesn't matter
> that much. However, if it is a concern, the probe interval (default 5
> sec) can be changed.
> If you change, remember to change on both server side and client side.
> For client side (ovn-northd), it is configured in the NB DB's NB_Global
> table's options:northd_probe_interval. See man page of ovn-nb(5).
> For server side (NB and SB), it is configured in the NB and SB DB's
> Connection table's inactivity_probe column.
> 
> Thanks,
> Han
> 
> 
> 
>   2020-07-31T23:08:09.511Z|04267|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (68% CPU usage)
>   2020-07-31T23:08:09.512Z|04268|jsonrpc|DBG|tcp:10.6.20.84:6641: received request, method="echo", params=[], id="echo"
>   2020-07-31T23:08:09.512Z|04269|jsonrpc|DBG|tcp:10.6.20.84:6641: send reply, result=[], id="echo"
>   2020-07-31T23:08:12.777Z|04270|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (34% CPU usage)
>   2020-07-31T23:08:12.777Z|04271|reconnect|DBG|tcp:10.6.20.85:6642: idle 5002 ms, sending inactivity probe
>   2020-07-31T23:08:12.777Z|04272|reconnect|DBG|tcp:10.6.20.85:6642: entering IDLE
>   2020-07-31T23:08:12.777Z|04273|jsonrpc|DBG|tcp:10.6.20.85:6642: send request, method="echo", params=[], id="echo"
>   2020-07-31T23:08:12.777Z|04274|jsonrpc|DBG|tcp:10.6.20.85:6642: received request, method="echo", params=[], id="echo"
>   2020-07-31T23:08:12.777Z|04275|reconnect|DBG|tcp:10.6.20.85:6642: entering ACTIVE
>   2020-07-31T23:08:12.777Z|04276|jsonrpc|DBG|tcp:10.6.20.85:6642: send reply, result=[], id="echo"
>   2020-07-31T23:08:13.635Z|04277|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (34% CPU usage)
>   2020-07-31T23:08:13.635Z|04278|jsonrpc|DBG|tcp:10.6.20.85:6642: received reply, result=[], id="echo"
>   2020-07-31T23:08:14.480Z|04279|hmap|DBG|Dropped 129 log messages in last 5 seconds (most recently, 0 seconds ago) due to excessive rate
>   2020-07-31T23:08:14.480Z|04280|hmap|DBG|lib/shash.c:112: 2 buckets with 6+ nodes, including 2 buckets with 6 nodes (32 nodes total across 32 buckets)
>   2020-07-31T23:08:14.513Z|04281|poll_loop|DBG|wakeup due to 27-ms timeout at lib/reconnect.c:643 (34% CPU usage)
>   2020-07-31T23:08:14.513Z|04282|reconnect|DBG|tcp:10.6.20.84:6641: idle 5001 ms, sending inactivity probe
>   2020-07-31T23:08:14.513Z|04283|reconnect|DBG|tcp:10.6.20.84:6641: entering IDLE
>   2020-07-31T23:08:14.513Z|04284|jsonrpc|DBG|tcp:10.6.20.84:6641: send request, method="echo", params=[], id="echo"
>   2020-07-31T23:08:15.370Z|04285|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (34% CPU usage)
>   2020-07-31T23:08:15.370Z|04286|jsonrpc|DBG|tcp:10.6.20.84:6641: received request, method="echo", params=[], id="echo"
>   2020-07-31T23:08:15.370Z|04287|reconnect|DBG|tcp:10.6.20.84:6641: entering ACTIVE
>

Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-03 Thread Han Zhou
Sorry that I didn't make it clear enough. The OVSDB probe itself doesn't
take much CPU, but the probe wakes up the ovn-northd main loop, which
recomputes everything; that is why you see the CPU spike.
This will be solved by incremental processing, where only the delta is
processed; in the case of probe handling there is no configuration change,
so the delta is zero.
For now, please follow the steps to adjust the probe interval if the CPU
usage of ovn-northd (when there is no configuration change) is a concern
for you. But please remember that this has no impact on the real CPU usage
for handling configuration changes.
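
As a rough sketch of that adjustment (assuming the standard
ovn-nbctl/ovn-sbctl "set" syntax, values in milliseconds, and a single row
in each Connection table; check with "list Connection" first):

    # Client side: ovn-northd probe towards NB/SB (NB_Global options).
    ovn-nbctl set NB_Global . options:northd_probe_interval=60000

    # Server side: NB and SB ovsdb-server inactivity probe (Connection table).
    ovn-nbctl set Connection . inactivity_probe=60000
    ovn-sbctl set Connection . inactivity_probe=60000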

Thanks,
Han

On Mon, Aug 3, 2020 at 8:11 PM Tony Liu  wrote:

> Health check (5 sec internal) taking 30%-100% CPU is definitely not
> acceptable,
> if that's really the case. There must be some blocking (and not yielding
> CPU)
> in coding, which is not supposed to be there.
>
> Could you point me to the coding for such health check?
> Is it single thread? Does it use any event library?
>
>
> Thanks!
>
> Tony
>
> > -Original Message-
> > From: Han Zhou 
> > Sent: Saturday, August 1, 2020 9:11 PM
> > To: Tony Liu 
> > Cc: ovs-discuss ; ovs-dev  > d...@openvswitch.org>
> > Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
> > configuration update
> >
> >
> >
> > On Fri, Jul 31, 2020 at 4:14 PM Tony Liu  wrote:
> >
> >   Hi,
> >
> >   I see the active ovn-northd takes much CPU (30% - 100%) when there
> > is no
> >   configuration from OpenStack, nothing happening on all chassis
> > nodes either.
> >
> >   Is this expected? What is it busy with?
> >
> >
> >
> >
> > Yes, this is expected. It is due to the OVSDB probe between ovn-northd
> > and NB/SB OVSDB servers, which is used to detect the OVSDB connection
> > failure.
> > Usually this is not a concern (unlike the probe with a large number of
> > ovn-controller clients), because ovn-northd is a centralized component
> > and the CPU cost when there is no configuration change doesn't matter
> > that much. However, if it is a concern, the probe interval (default 5
> > sec) can be changed.
> > If you change, remember to change on both server side and client side.
> > For client side (ovn-northd), it is configured in the NB DB's NB_Global
> > table's options:northd_probe_interval. See man page of ovn-nb(5).
> > For server side (NB and SB), it is configured in the NB and SB DB's
> > Connection table's inactivity_probe column.
> >
> > Thanks,
> > Han
> >
> >
> >
> >   2020-07-31T23:08:09.511Z|04267|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (68% CPU usage)
> >   2020-07-31T23:08:09.512Z|04268|jsonrpc|DBG|tcp:10.6.20.84:6641: received request, method="echo", params=[], id="echo"
> >   2020-07-31T23:08:09.512Z|04269|jsonrpc|DBG|tcp:10.6.20.84:6641: send reply, result=[], id="echo"
> >   2020-07-31T23:08:12.777Z|04270|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (34% CPU usage)
> >   2020-07-31T23:08:12.777Z|04271|reconnect|DBG|tcp:10.6.20.85:6642: idle 5002 ms, sending inactivity probe
> >   2020-07-31T23:08:12.777Z|04272|reconnect|DBG|tcp:10.6.20.85:6642: entering IDLE
> >   2020-07-31T23:08:12.777Z|04273|jsonrpc|DBG|tcp:10.6.20.85:6642: send request, method="echo", params=[], id="echo"
> >   2020-07-31T23:08:12.777Z|04274|jsonrpc|DBG|tcp:10.6.20.85:6642: received request, method="echo", params=[], id="echo"
> >   2020-07-31T23:08:12.777Z|04275|reconnect|DBG|tcp:10.6.20.85:6642: entering ACTIVE
> >   2020-07-31T23:08:12.777Z|04276|jsonrpc|DBG|tcp:10.6.20.85:6642: send reply, result=[], id="echo"
> >   2020-07-31T23:08:13.635Z|04277|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (34% CPU usage)
> >   2020-07-31T23:08:13.635Z|04278|jsonrpc|DBG|tcp:10.6.20.85:6642: received reply, result=[], id="echo"
> >   2020-07-31T23:08:14.480Z|04279|hmap|DBG|Dropped 129 log messages in last 5 seconds (most recently, 0 seconds ago) due to excessive rate
> >   2020-07-31T23:08:14.480Z|04280|hmap|DBG|lib/shash.c:112: 2 buckets with 6+ nodes, including 2 buckets with 6 nodes (32 nodes total across 32 buckets)
> >   2020-07-31T23:08:14.513Z|04281|poll_loop|DBG|wakeup due to 27-ms timeout at lib/reconnect.c:643 (34% CPU usage)
> >   2020-07-31T23:08:14.513Z|04282|reconnect|DBG|tcp:10.

Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no configuration update

2020-08-03 Thread Tony Liu
The probe triggers a recompute?
There is a probe every 5 seconds, so without any connection up/down or
failover, ovn-northd will recompute everything every 5 seconds, no matter
what? Really?

Anyway, I will increase the probe interval for now and see if that helps.


Thanks!

Tony

> -Original Message-
> From: Han Zhou 
> Sent: Monday, August 3, 2020 8:22 PM
> To: Tony Liu 
> Cc: Han Zhou ; ovs-discuss ;
> ovs-dev 
> Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
> configuration update
> 
> Sorry that I didn't make it clear enough. The OVSDB probe itself doesn't
> take much CPU, but the probe awakes ovn-northd main loop, which
> recompute everything, which is why you see CPU spike.
> It will be solved by incremental-processing, when only delta is
> processed, and in case of probe handling, there is no change in
> configuration, so the delta is zero.
> For now, please follow the steps to adjust probe interval, if the CPU of
> ovn-northd (when there is no configuration change) is a concern for you.
> But please remember that this has no impact to the real CPU usage for
> handling configuration changes.
> 
> 
> Thanks,
> Han
> 
> 
> On Mon, Aug 3, 2020 at 8:11 PM Tony Liu  wrote:
> 
>   Health check (5 sec internal) taking 30%-100% CPU is definitely not
> acceptable,
>   if that's really the case. There must be some blocking (and not
> yielding CPU)
>   in coding, which is not supposed to be there.
> 
>   Could you point me to the coding for such health check?
>   Is it single thread? Does it use any event library?
> 
> 
>   Thanks!
> 
>   Tony
> 
>   > -Original Message-
>   > From: Han Zhou 
>   > Sent: Saturday, August 1, 2020 9:11 PM
>   > To: Tony Liu 
>   > Cc: ovs-discuss ; ovs-dev 
>   > Subject: Re: [ovs-discuss] [OVN] ovn-northd takes much CPU when no
>   > configuration update
>   >
>   >
>   >
>   > On Fri, Jul 31, 2020 at 4:14 PM Tony Liu  wrote:
>   >
>   >
>   >   Hi,
>   >
>   >   I see the active ovn-northd takes much CPU (30% - 100%)
> when there
>   > is no
>   >   configuration from OpenStack, nothing happening on all
> chassis
>   > nodes either.
>   >
>   >   Is this expected? What is it busy with?
>   >
>   >
>   >
>   >
>   > Yes, this is expected. It is due to the OVSDB probe between ovn-
> northd
>   > and NB/SB OVSDB servers, which is used to detect the OVSDB
> connection
>   > failure.
>   > Usually this is not a concern (unlike the probe with a large
> number of
>   > ovn-controller clients), because ovn-northd is a centralized
> component
>   > and the CPU cost when there is no configuration change doesn't
> matter
>   > that much. However, if it is a concern, the probe interval
> (default 5
>   > sec) can be changed.
>   > If you change, remember to change on both server side and client
> side.
>   > For client side (ovn-northd), it is configured in the NB DB's
> NB_Global
>   > table's options:northd_probe_interval. See man page of ovn-nb(5).
>   > For server side (NB and SB), it is configured in the NB and SB
> DB's
>   > Connection table's inactivity_probe column.
>   >
>   > Thanks,
>   > Han
>   >
>   >
>   >
>   >   2020-07-31T23:08:09.511Z|04267|poll_loop|DBG|wakeup due to [POLLIN] on fd 8 (10.6.20.84:44358<->10.6.20.84:6641) at lib/stream-fd.c:157 (68% CPU usage)
>   >   2020-07-31T23:08:09.512Z|04268|jsonrpc|DBG|tcp:10.6.20.84:6641: received request, method="echo", params=[], id="echo"
>   >   2020-07-31T23:08:09.512Z|04269|jsonrpc|DBG|tcp:10.6.20.84:6641: send reply, result=[], id="echo"
>   >   2020-07-31T23:08:12.777Z|04270|poll_loop|DBG|wakeup due to [POLLIN] on fd 9 (10.6.20.84:49158<->10.6.20.85:6642) at lib/stream-fd.c:157 (34% CPU usage)
>   >   2020-07-31T23:08:12.777Z|04271|reconnect|DBG|tcp:10.6.20.85:6642: idle 5002 ms, sending inactivity probe
>   >   2020-07-31T23:08:12.777Z|04272|reconnect|DBG|tcp:10.6.20.85:6642:

Re: [ovs-discuss] [OVN] no response to inactivity probe

2020-08-03 Thread Tony Liu
In my deployment, there are 13 Neutron server processes on each Neutron
node. I see 12 of them (monitor, maintenance, RPC, API) connect to both
ovn-nb-db and ovn-sb-db. With 3 Neutron server nodes, that's 36 OVSDB
clients. Is that many clients OK?
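
For reference, a quick way to count the DB client connections on the NB/SB
nodes (a sketch with plain Linux tooling, assuming the default 6641/6642
ports):

    ss -tnp | grep -c ':6641 '    # established NB DB connections
    ss -tnp | grep -c ':6642 '    # established SB DB connections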

Any suggestions on how to figure out which side isn't responding to the
probe, if it's bi-directional? I don't see any activity in the logs other
than connect/drop and reconnect...

BTW, please let me know if this is not the right place to discuss the
Neutron OVN ML2 driver.


Thanks!

Tony

> -Original Message-
> From: dev  On Behalf Of Tony Liu
> Sent: Monday, August 3, 2020 7:45 PM
> To: ovs-discuss ; ovs-dev  d...@openvswitch.org>
> Subject: [ovs-dev] [OVN] no response to inactivity probe
> 
> Hi,
> 
> Neutron OVN ML2 driver was disconnected by ovn-nb-db. There are many
> error messages from ovn-nb-db leader.
> 
> 2020-08-04T02:31:39.751Z|03138|reconnect|ERR|tcp:10.6.20.81:58620: no
> response to inactivity probe after 5 seconds, disconnecting
> 2020-08-04T02:31:42.484Z|03139|reconnect|ERR|tcp:10.6.20.81:58300: no
> response to inactivity probe after 5 seconds, disconnecting
> 2020-08-04T02:31:49.858Z|03140|reconnect|ERR|tcp:10.6.20.81:59582: no
> response to inactivity probe after 5 seconds, disconnecting
> 2020-08-04T02:31:53.057Z|03141|reconnect|ERR|tcp:10.6.20.83:42626: no
> response to inactivity probe after 5 seconds, disconnecting
> 2020-08-04T02:31:53.058Z|03142|reconnect|ERR|tcp:10.6.20.82:45412: no
> response to inactivity probe after 5 seconds, disconnecting
> 2020-08-04T02:31:54.067Z|03143|reconnect|ERR|tcp:10.6.20.81:59416: no
> response to inactivity probe after 5 seconds, disconnecting
> 2020-08-04T02:31:54.809Z|03144|reconnect|ERR|tcp:10.6.20.81:60004: no
> response to inactivity probe after 5 seconds, disconnecting 
> 
> Could anyone share a bit details how this inactivity probe works?
> From OVN ML2 driver log, I see it connected to the leader, then the
> connection was closed by leader after 5 or 6 seconds. Is this probe one-
> way or two-ways?
> Both sides are not busy, not taking much CPU cycles. Not sure how this
> could happen. Any thoughts?
> 
> 
> Thanks!
> 
> Tony
> 
> 
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss