Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03

2024-07-01 Thread Dumitru Ceara via discuss
On 7/1/24 14:31, Шагов Георгий wrote:
> Hello Dumitru, Numan, Han
> 

Hi George,

> I have implemented a handler for the SB_datapath_binding node. Could you do
> me a favor and take a look before I submit a patch? (The patch is attached
> to the issue.)
> GitHub issue: https://github.com/ovn-org/ovn/issues/249
> The scenario in which the problem reproduces is very simple: we just create
> about 50 routers with 50 subnets for each router and one port for every
> subnet. This takes about 140 minutes, and then we can see recomputes in the
> INFO log.
> Honestly, this patch didn't help: after applying it we still observe
> 140 minutes.

I guess we need to better understand what the test does.  I quickly
tried creating 50 routers with 50 ports each in an ovn sandbox based on
current main and I don't really see issues.

Would it be possible to share a script that provisions the NB database
in a similar way as your test?

Or, if no sensitive data is stored in the databases, could you please
link to the NB and SB databases (attaching them to the issue is fine too).
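
For illustration, a provisioning script for the scenario in the quoted text
(about 50 routers, 50 subnets per router, one port per subnet) could look
roughly like the sketch below. It is not the reporter's actual test: it
assumes each subnet is a logical switch attached to its router, and the
object names, MAC addresses and 10.R.S.0/24 addressing are all made up. It
only generates ovn-nbctl command lines and does not run them.

```python
import shlex


def nb_commands(routers=50, subnets=50):
    """Yield ovn-nbctl argument lists; all names and addresses are made up."""
    for r in range(routers):
        lr = f"lr{r}"
        yield ["ovn-nbctl", "lr-add", lr]
        for s in range(subnets):
            ls = f"ls{r}-{s}"
            lrp = f"lrp-{ls}"
            uplink = f"{ls}-to-{lr}"
            vif = f"{ls}-p0"
            mac_prefix = f"00:00:{r:02x}:{s:02x}:00"
            # One logical switch per subnet, addressed as 10.<r>.<s>.0/24.
            yield ["ovn-nbctl", "ls-add", ls]
            yield ["ovn-nbctl", "lrp-add", lr, lrp,
                   f"{mac_prefix}:01", f"10.{r}.{s}.1/24"]
            # Switch-side port that attaches the subnet to its router.
            yield ["ovn-nbctl", "lsp-add", ls, uplink,
                   "--", "lsp-set-type", uplink, "router",
                   "--", "lsp-set-addresses", uplink, "router",
                   "--", "lsp-set-options", uplink, f"router-port={lrp}"]
            # The single workload port of the subnet.
            yield ["ovn-nbctl", "lsp-add", ls, vif,
                   "--", "lsp-set-addresses", vif,
                   f"{mac_prefix}:02 10.{r}.{s}.2"]


def as_shell_script(routers=50, subnets=50):
    """Render the commands as one shell-quoted line per command."""
    return "\n".join(shlex.join(cmd) for cmd in nb_commands(routers, subnets))
```

Writing the output of `as_shell_script()` to a file gives a script that can
be replayed against a sandbox; each router contributes one command plus four
per subnet.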

Regards,
Dumitru

> Still, I would be happy to hear your recommendations; maybe the patch can
> be improved.
> 
> Yours truly, George

Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03

2024-07-01 Thread Шагов Георгий via discuss
Hello Dumitru, Numan, Han

I have implemented a handler for the SB_datapath_binding node. Could you do me 
a favor and take a look before I submit a patch? (The patch is attached to the 
issue.)
GitHub issue: https://github.com/ovn-org/ovn/issues/249
The scenario in which the problem reproduces is very simple: we just create 
about 50 routers with 50 subnets for each router and one port for every subnet. 
This takes about 140 minutes, and then we can see recomputes in the INFO log.
Honestly, this patch didn't help: after applying it we still observe 
140 minutes.
Still, I would be happy to hear your recommendations; maybe the patch can be 
improved.

Yours truly, George




Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03

2024-05-17 Thread Dumitru Ceara via discuss
On 5/17/24 14:20, Шагов Георгий wrote:
> First of all, Numan, Dumitru, Han, I really appreciate your replies; I will 
> do my best to describe the case in detail.
> 

No problem!

> NS> Does your deployment create/delete logical switches/routers frequently
> 
> A separate service, namely OpenStack Nova, creates a port every time a user 
> creates a new VM. This does not happen too frequently, yet when it does we 
> get a huge transaction flow at northd, and the transaction going from SBDB 
> to NBDB can be delayed. Not every transaction is delayed by 52 seconds; I 
> think about 25% are, which is still considerable, of course.
> 
> HZ> Yes, it is better to understand why in this deployment the recompute took 
> so long (52s). Is it simply too large scale, or is it because of some 
> uncommon configuration that we don't handle efficiently and can be optimized 
> to improve recompute performance
> 
> A very good question.
> Yes, our custom installation has a peculiarity: we have implemented L3 
> connectivity to the external provider network, and this customization was 
> done at the ovn-controller level. It was a business requirement and it works 
> fine, but as a drawback we got the problem with huge Logical_DP_Groups that 
> I described earlier (Link1).
> 
> DC> But the best way to tell is to have a way to reproduce this, e.g., NB/SB
> databases and the NB/SB jsonrpc update that caused the recompute
> 
> Yes, we are working on this. At the moment we have an ovn-heater 
> installation that uses a vanilla (non-custom) OVN build to simulate the 
> case we have in production.

Kind of related, if you have time it would be awesome if you could share
your ovn-heater configuration file.  Maybe I can use it internally in
our scale test lab and report results upstream.

> Could you please tell me the best way to submit the dumps of both the NB 
> and SB DBs?
> I can open an issue on the OVN GitHub: https://github.com/ovn-org/ovn/issues
> 

That works for me but please link the issue here.  Most development
happens on-list and github issues don't get as much attention.

> Thank you
> Yours truly, George
> 

Best regards,
Dumitru


Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03

2024-05-17 Thread Шагов Георгий via discuss
First of all, Numan, Dumitru, Han, I really appreciate your replies; I will do 
my best to describe the case in detail.

NS> Does your deployment create/delete logical switches/routers frequently

A separate service, namely OpenStack Nova, creates a port every time a user 
creates a new VM. This does not happen too frequently, yet when it does we get 
a huge transaction flow at northd, and the transaction going from SBDB to NBDB 
can be delayed. Not every transaction is delayed by 52 seconds; I think about 
25% are, which is still considerable, of course.

HZ> Yes, it is better to understand why in this deployment the recompute took 
so long (52s). Is it simply too large scale, or is it because of some uncommon 
configuration that we don't handle efficiently and can be optimized to improve 
recompute performance

A very good question.
Yes, our custom installation has a peculiarity: we have implemented L3 
connectivity to the external provider network, and this customization was done 
at the ovn-controller level. It was a business requirement and it works fine, 
but as a drawback we got the problem with huge Logical_DP_Groups that I 
described earlier (Link1).

DC> But the best way to tell is to have a way to reproduce this, e.g., NB/SB
databases and the NB/SB jsonrpc update that caused the recompute

Yes, we are working on this. At the moment we have an ovn-heater installation 
that uses a vanilla (non-custom) OVN build to simulate the case we have in 
production.
Could you please tell me the best way to submit the dumps of both the NB and 
SB DBs?
I can open an issue on the OVN GitHub: https://github.com/ovn-org/ovn/issues

Thank you
Yours truly, George



Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03

2024-05-15 Thread Han Zhou via discuss
On Tue, May 14, 2024 at 12:31 AM Dumitru Ceara  wrote:
>
> In general, especially if it fixes a scalability issue like this one,
> it's probably fine.  In practice it depends a bit on how much complexity
> this would add to the code.
>
I agree with the general statement.

> But the best way to tell is to have a way to reproduce this, e.g., NB/SB
> databases and the NB/SB jsonrpc update that caused the recompute.
>

Yes, it is better to understand why the recompute took so long (52s) in this
deployment. Is it simply that the scale is too large, or is it because of
some uncommon configuration that we don't handle efficiently and that could
be optimized to improve recompute performance?

Otherwise, even if we implement incremental processing (I-P) for datapaths,
another input change can still trigger a recompute and cause the same
latency. It is just not sustainable to maintain more and more I-P in northd.

> Regards,
> Dumitru
>
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03

2024-05-14 Thread Dumitru Ceara via discuss
On 5/8/24 18:01, Numan Siddique wrote:
> On Wed, May 8, 2024 at 8:42 AM Шагов Георгий via discuss <
> ovs-discuss@openvswitch.org> wrote:
>> As you can see, there is a significant delay of 52 seconds.
>>

This is huge indeed!

> You're welcome to work on it and submit patches to add a handler for
> SB_datapath_binding.
> 
> @Dumitru Ceara  @Han Zhou  if you've any
> reservations on adding more handlers please do comment here.
> 

In general, especially if it fixes a scalability issue like this one,
it's probably fine.  In practice it depends a bit on how much complexity
this would add to the code.

But the best way to tell is to have a way to reproduce this, e.g., NB/SB
databases and the NB/SB jsonrpc update that caused the recompute.
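
As a rough sketch of how such a reproducer could be collected: the snippet
below builds the command lines for snapshotting both databases with
`ovsdb-client backup` and for raising ovn-northd logging to debug with
`ovn-appctl`. The socket paths and output file names are assumptions and
differ between installations; `capture_commands` and `run_capture` are
hypothetical helper names.

```python
import subprocess


def capture_commands(nb_sock="unix:/var/run/ovn/ovnnb_db.sock",
                     sb_sock="unix:/var/run/ovn/ovnsb_db.sock"):
    """Return (argv, output_file) pairs; socket paths are assumed defaults."""
    return [
        # Standalone snapshots of the NB and SB databases.
        (["ovsdb-client", "backup", nb_sock, "OVN_Northbound"],
         "ovnnb_db.backup"),
        (["ovsdb-client", "backup", sb_sock, "OVN_Southbound"],
         "ovnsb_db.backup"),
        # Raise ovn-northd logging to debug level (all modules).
        (["ovn-appctl", "-t", "ovn-northd", "vlog/set", "dbg"], None),
    ]


def run_capture(commands):
    """Run each command, redirecting stdout to a file where one is given."""
    for argv, outfile in commands:
        if outfile is None:
            subprocess.run(argv, check=True)
        else:
            with open(outfile, "wb") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Calling `run_capture(capture_commands())` on the host running the databases
would leave two backup files next to the script; debug logging is verbose and
is best switched back afterwards.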

Regards,
Dumitru



Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03

2024-05-08 Thread Numan Siddique via discuss
On Wed, May 8, 2024 at 8:42 AM Шагов Георгий via discuss <
ovs-discuss@openvswitch.org> wrote:

> Hello everyone
>
> In some respects this can be considered a continuation of this thread
> (Link1), yet it is a different issue.
>
> After upgrading from OVN 22.03 to OVN 24.03 we did see a 3-4x
> performance improvement.
>
> Yet we still observe high CPU load for the northd process; digging
> deeper into the logs we found:
>

Thanks for reporting this issue.


> 2024-05-07T08:36:46.505Z|18503|poll_loop|INFO|wakeup due to [POLLIN] on fd
> 15 (10.34.22.66:60716<->10.34.22.66:6642) at lib/stream-fd.c:157 (94% CPU
> usage)
> 2024-05-07T08:37:38.857Z|18504|inc_proc_eng|INFO|node: northd, recompute
> (missing handler for input SB_datapath_binding) took 52313ms
> 2024-05-07T08:37:48.335Z|18505|inc_proc_eng|INFO|node: lflow, recompute
> (failed handler for input northd) took 7759ms
> 2024-05-07T08:37:48.718Z|18506|timeval|WARN|Unreasonably long 62213ms
> poll interval (56201ms user, 2900ms system)
>
> As you can see, there is a significant delay of 52 seconds.
>
> Correct me if I am wrong, but as I understand it, 'missing handler for'
> practically means that there is no inc-engine handler for some node (in
> this example: SB_datapath_binding).
>

That's correct.

> Before plunging into development it would be great to align with the
> community's position:
>
> - Why is there no handler for this node?
>
>
Our approach has been to add a handler for an input change only if the
change is frequent or if it can be handled easily.
We have also skipped adding handlers when they would increase code
complexity.  Having said that, I think we are open to adding more handlers
if it makes sense or if it results in scale improvements.

Right now we fall back to a full recompute of the northd engine for any
changes to a logical switch or logical router.
Does your deployment create/delete logical switches/routers frequently?
Is it possible to enable OVN debug logs and share them?  I'm curious to know
what the changes to the SB datapath binding are.
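
To make the fallback behaviour concrete, here is a toy model of an engine
node with optional per-input handlers. It is only a sketch in Python and not
OVN's actual C implementation; the class, the input name `ports_input` and
the change format are invented for illustration. An input without a handler
always forces a full recompute, and a handler that cannot process a change
reports failure, which also forces one.

```python
class Node:
    """Toy engine node: recompute() rebuilds everything, handlers are optional."""

    def __init__(self, name, recompute):
        self.name = name
        self.recompute = recompute
        self.handlers = {}  # input name -> handler, or None if not implemented

    def add_input(self, input_name, handler=None):
        self.handlers[input_name] = handler

    def run(self, changed_input, change):
        """Process one change and describe how it was processed."""
        handler = self.handlers[changed_input]
        if handler is None:
            self.recompute()
            return f"recompute (missing handler for input {changed_input})"
        if not handler(change):
            # The handler exists but could not process this particular change.
            self.recompute()
            return f"recompute (failed handler for input {changed_input})"
        return "handled incrementally"


state = {"recomputes": 0, "ports": set()}


def full_recompute():
    state["recomputes"] += 1


def port_handler(change):
    # In this toy only simple additions are handled incrementally.
    if change.get("op") != "add":
        return False
    state["ports"].add(change["name"])
    return True


northd = Node("northd", full_recompute)
northd.add_input("ports_input", port_handler)  # hypothetical input name
northd.add_input("SB_datapath_binding")        # no handler registered
```

With this setup, `northd.run("ports_input", {"op": "add", "name": "p1"})` is
handled incrementally, while any change on `SB_datapath_binding` triggers
`full_recompute()`, mirroring the "missing handler" message in the log above.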

Feel free to share your OVN NB and SB DBs if you're ok with it.  I can
deploy those DBs and see why recompute is so expensive.



> - Is there a particular reason for this, or did the peculiarities of our
>   installation just expose this issue?
>
My guess is that your installation is frequently creating, deleting or
modifying logical switches or routers.


> - Do you think it is worth implementing that handler
>   (SB_datapath_binding)?
>
I'm fine with adding a handler if it helps with scale.  In our use cases, we
don't frequently create/delete logical switches and routers,
and hence it is OK to fall back to full recomputes for such changes.


> Any ideas are highly appreciated.
>

You're welcome to work on it and submit patches to add a handler for
SB_datapath_binding.

@Dumitru Ceara  @Han Zhou  if you've any
reservations on adding more handlers please do comment here.

Thanks
Numan




> Yours truly, George
>
> - Link1:
>   https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/053035.html


[ovs-discuss] NorthD inc-engine Handlers; OVN 24.03

2024-05-08 Thread Шагов Георгий via discuss
Hello everyone

In some respects this can be considered a continuation of this thread 
(Link1), yet it is a different issue.
After upgrading from OVN 22.03 to OVN 24.03 we did see a 3-4x performance 
improvement.
Yet we still observe high CPU load for the northd process; digging deeper 
into the logs we found:

2024-05-07T08:36:46.505Z|18503|poll_loop|INFO|wakeup due to [POLLIN] on fd 15 
(10.34.22.66:60716<->10.34.22.66:6642) at lib/stream-fd.c:157 (94% CPU usage)
2024-05-07T08:37:38.857Z|18504|inc_proc_eng|INFO|node: northd, recompute 
(missing handler for input SB_datapath_binding) took 52313ms
2024-05-07T08:37:48.335Z|18505|inc_proc_eng|INFO|node: lflow, recompute (failed 
handler for input northd) took 7759ms
2024-05-07T08:37:48.718Z|18506|timeval|WARN|Unreasonably long 62213ms poll 
interval (56201ms user, 2900ms system)
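
A log excerpt like the one above can be summarized with a few lines of
Python. This is a minimal sketch that assumes the `inc_proc_eng` message
format shown in the excerpt and rejoins the lines that the mail client
wrapped; the function name `recomputes` is made up.

```python
import re

# Matches the body of an inc_proc_eng recompute message as shown above.
RECOMPUTE_RE = re.compile(
    r"node: (?P<node>\w+), recompute \((?P<reason>.+?)\) took (?P<ms>\d+)ms"
)


def recomputes(log_text):
    """Return (node, reason, milliseconds) for each recompute message."""
    # Long log lines were wrapped; append any line that does not start
    # with a timestamp to the previous line before matching.
    joined = []
    for line in log_text.splitlines():
        if re.match(r"\d{4}-\d{2}-\d{2}T", line) or not joined:
            joined.append(line.strip())
        else:
            joined[-1] += " " + line.strip()
    found = []
    for line in joined:
        m = RECOMPUTE_RE.search(line)
        if m:
            found.append((m["node"], m["reason"], int(m["ms"])))
    return found


SAMPLE = """\
2024-05-07T08:37:38.857Z|18504|inc_proc_eng|INFO|node: northd, recompute
(missing handler for input SB_datapath_binding) took 52313ms
2024-05-07T08:37:48.335Z|18505|inc_proc_eng|INFO|node: lflow, recompute (failed
handler for input northd) took 7759ms
2024-05-07T08:37:48.718Z|18506|timeval|WARN|Unreasonably long 62213ms poll
interval (56201ms user, 2900ms system)
"""

total_ms = sum(ms for _, _, ms in recomputes(SAMPLE))
```

For this excerpt the two recomputes add up to 52313 + 7759 = 60072 ms, which
accounts for all but about 2.1 s of the 62213 ms poll interval.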

As you can see, there is a significant delay of 52 seconds.
Correct me if I am wrong, but as I understand it, ‘missing handler for’ 
practically means that there is no inc-engine handler for some node (in this 
example: SB_datapath_binding).
Before plunging into development it would be great to align with the 
community's position:

  *   Why is there no handler for this node?
  *   Is there a particular reason for this, or did the peculiarities of our 
installation just expose this issue?
  *   Do you think it is worth implementing that handler 
(SB_datapath_binding)?

Any ideas are highly appreciated.

Yours truly, George



  *   Link1: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/053035.html

CONFIDENTIALITY NOTICE: This email and any files attached to it are 
confidential. If you are not the intended recipient you are notified that 
using, copying, distributing or taking any action in reliance on the contents 
of this information is strictly prohibited. If you have received this email in 
error please notify the sender and delete this email.