Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03
On 7/1/24 14:31, Шагов Георгий wrote:
> Hello Dumitru, Numan, Han
>

Hi George,

> I have implemented a handler for the SB_datapath_binding node. Could you do
> me a favor and take a look before I submit a patch (it is attached to the
> issue)?
> OVN issue: https://github.com/ovn-org/ovn/issues/249
> The scenario that reproduces the problem is very simple: we create about
> 50 routers with 50 subnets for each router and one port for every subnet.
> This takes about 140 minutes, and then we can see recomputes in the INFO log.
> Honestly, this patch didn't help: after applying it we still observe
> 140 minutes.

I guess we need to better understand what the test does. I quickly tried
creating 50 routers with 50 ports each in an ovn sandbox based on current
main and I don't really see issues.

Would it be possible to share a script that provisions the NB database in
a similar way as your test? Or, if no sensitive data is stored in the
databases, could you please link to the NB and SB databases (attaching
them to the issue is fine too).

Regards,
Dumitru

> Yet, I would be happy to see your recommendations; maybe the patch could be
> improved.
>
> Yours truly, George
Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03
Hello Dumitru, Numan, Han

I have implemented a handler for the SB_datapath_binding node. Could you do
me a favor and take a look before I submit a patch (it is attached to the
issue)?

OVN issue: https://github.com/ovn-org/ovn/issues/249

The scenario that reproduces the problem is very simple: we create about
50 routers with 50 subnets for each router and one port for every subnet.
This takes about 140 minutes, and then we can see recomputes in the INFO log.
Honestly, this patch didn't help: after applying it we still observe
140 minutes.

Yet, I would be happy to see your recommendations; maybe the patch could be
improved.

Yours truly, George
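The scenario described in this message (about 50 routers, 50 subnets per
router, one port per subnet) could be provisioned with something like the
sketch below. It is not the script used in the actual test, which was not
posted to the list. It assumes one logical switch per subnet, and all names,
MAC addresses and the 10.<router>.<subnet>.0/24 addressing plan are
hypothetical. It only prints ovn-nbctl commands, so it can be inspected
before anything is run against an NB database.

```python
"""Print ovn-nbctl commands for N routers with M subnets each (a sketch)."""


def provision_commands(routers=50, subnets_per_router=50):
    """Return ovn-nbctl command lines; nothing is executed here."""
    cmds = []
    for r in range(routers):
        lr = f"lr{r}"
        cmds.append(f"ovn-nbctl lr-add {lr}")
        for s in range(subnets_per_router):
            ls = f"ls{r}-{s}"          # one logical switch per subnet (assumption)
            lrp = f"lrp-{r}-{s}"       # router port facing that switch
            peer = f"{ls}-to-{lr}"     # switch port peering with the router port
            gw_mac = f"00:00:00:{r:02x}:{s:02x}:01"   # hypothetical MACs
            vm_mac = f"00:00:00:{r:02x}:{s:02x}:02"
            cmds.append(f"ovn-nbctl ls-add {ls}")
            cmds.append(f"ovn-nbctl lrp-add {lr} {lrp} {gw_mac} 10.{r}.{s}.1/24")
            cmds.append(
                f"ovn-nbctl lsp-add {ls} {peer} -- "
                f"lsp-set-type {peer} router -- "
                f"lsp-set-addresses {peer} router -- "
                f"lsp-set-options {peer} router-port={lrp}"
            )
            # The "one port for every subnet" from the scenario.
            cmds.append(
                f"ovn-nbctl lsp-add {ls} {ls}-vm -- "
                f'lsp-set-addresses {ls}-vm "{vm_mac} 10.{r}.{s}.10"'
            )
    return cmds


if __name__ == "__main__":
    print("\n".join(provision_commands()))
```

With the defaults this emits one lr-add per router plus four commands per
subnet. Issuing them one by one, as a client like Nova would, is what
produces the stream of separate NB transactions that northd has to process.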
Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03
On 5/17/24 14:20, Шагов Георгий wrote:
> First of all, Numan, Dumitru, Han, I really appreciate your replies. I will
> do my best to describe the case in detail.
>

No problem!

> NS> Does your deployment create/delete logical switches/routers frequently
>
> It is a separate service, OpenStack Nova, that creates a port every time a
> user creates a new VM. This does not happen too frequently, but when it
> does we get a huge transaction flow at NorthD, and a transaction going from
> SBDB to NBDB can be delayed. Not every transaction is delayed by 52
> seconds; I think about 25% are, which is still considerable, of course.
>
> HZ> Yes, it is better to understand why in this deployment the recompute took
> so long (52s). Is it simply too large scale, or is it because of some
> uncommon configuration that we don't handle efficiently and can be optimized
> to improve recompute performance
>
> A very good question.
> Yes, our custom installation has a peculiarity: we have implemented L3
> connectivity to the external provider network, and this customization was
> done at the ovn-controller level. It was a business requirement and it works
> fine, but as a drawback we got the problem with huge Logical_DP_Groups that
> I described earlier: Link1
>
> DC> But the best way to tell is to have a way to reproduce this, e.g., NB/SB
> databases and the NB/SB jsonrpc update that caused the recompute
>
> Yes, we are working on this. At the moment we have an ovn-heater
> installation that uses a vanilla (non-custom) OVN and simulates the case we
> have in production.

Kind of related, if you have time it would be awesome if you could share
your ovn-heater configuration file. Maybe I can use it internally in
our scale test lab and report results upstream.

> Could you please tell me the best way to submit the dumps of both the NB
> and SB DBs?
> I can open an issue on the OVN GitHub: https://github.com/ovn-org/ovn/issues
>

That works for me but please link the issue here. Most development
happens on-list and github issues don't get as much attention.

> Thank you
> Yours truly, George
>

Best regards,
Dumitru
Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03
First of all, Numan, Dumitru, Han, I really appreciate your replies. I will
do my best to describe the case in detail.

NS> Does your deployment create/delete logical switches/routers frequently

It is a separate service, OpenStack Nova, that creates a port every time a
user creates a new VM. This does not happen too frequently, but when it does
we get a huge transaction flow at NorthD, and a transaction going from SBDB
to NBDB can be delayed. Not every transaction is delayed by 52 seconds; I
think about 25% are, which is still considerable, of course.

HZ> Yes, it is better to understand why in this deployment the recompute took
so long (52s). Is it simply too large scale, or is it because of some
uncommon configuration that we don't handle efficiently and can be optimized
to improve recompute performance

A very good question.
Yes, our custom installation has a peculiarity: we have implemented L3
connectivity to the external provider network, and this customization was
done at the ovn-controller level. It was a business requirement and it works
fine, but as a drawback we got the problem with huge Logical_DP_Groups that
I described earlier: Link1

DC> But the best way to tell is to have a way to reproduce this, e.g., NB/SB
databases and the NB/SB jsonrpc update that caused the recompute

Yes, we are working on this. At the moment we have an ovn-heater
installation that uses a vanilla (non-custom) OVN and simulates the case we
have in production.

Could you please tell me the best way to submit the dumps of both the NB and
SB DBs?
I can open an issue on the OVN GitHub: https://github.com/ovn-org/ovn/issues

Thank you
Yours truly, George
Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03
On Tue, May 14, 2024 at 12:31 AM Dumitru Ceara wrote:
>
> On 5/8/24 18:01, Numan Siddique wrote:
> > You're welcome to work on it and submit patches to add a handler for
> > SB_datapath_binding.
> >
> > @Dumitru Ceara @Han Zhou if you've any
> > reservations on adding more handlers please do comment here.
> >
>
> In general, especially if it fixes a scalability issue like this one,
> it's probably fine. In practice it depends a bit on how much complexity
> this would add to the code.
>

I agree with the general statement.

> But the best way to tell is to have a way to reproduce this, e.g., NB/SB
> databases and the NB/SB jsonrpc update that caused the recompute.
>

Yes, it is better to understand why in this deployment the recompute took
so long (52s). Is it simply too large scale, or is it because of some
uncommon configuration that we don't handle efficiently and can be
optimized to improve recompute performance? Otherwise, even if we can
implement incremental processing (I-P) for datapaths, there can be just
another input change that triggers recompute and causes the same latency.
It is just not sustainable to maintain more and more I-P in northd.

> Regards,
> Dumitru
>

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
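The trade-off discussed in this message can be pictured with a toy model of
an incremental-processing node. The sketch below is not OVN's C
implementation or its API. The class, the method names and the input name
"ports_input" are all hypothetical; only SB_datapath_binding and the
"missing handler" / "failed handler" wording come from the logs in this
thread. The idea it shows: every input that lacks a handler, or whose handler
gives up, falls back to the full recompute, so adding one handler only helps
for that one input.

```python
class Node:
    """One node of a toy incremental-processing graph (not OVN's real API)."""

    def __init__(self, name, recompute):
        self.name = name
        self.recompute = recompute      # full rebuild: always correct, but slow
        self.handlers = {}              # input name -> incremental handler or None
        self.stats = {"recompute": 0, "incremental": 0}

    def add_input(self, input_name, handler=None):
        # handler=None models "missing handler for input <input_name>".
        self.handlers[input_name] = handler

    def on_change(self, input_name, change):
        handler = self.handlers[input_name]
        # A handler returns True when it applied the change incrementally and
        # False when it could not ("failed handler"), which forces a recompute.
        if handler is not None and handler(change):
            self.stats["incremental"] += 1
            return "incremental"
        self.recompute()
        self.stats["recompute"] += 1
        return "recompute"


state = {"ports": set(), "rebuilds": 0}


def rebuild_everything():
    state["rebuilds"] += 1              # stands in for the expensive full pass


def handle_port_change(change):
    state["ports"].add(change)          # cheap, touches only the changed item
    return True


northd = Node("northd", rebuild_everything)
northd.add_input("ports_input", handle_port_change)   # hypothetical input name
northd.add_input("SB_datapath_binding")               # no handler registered

first = northd.on_change("ports_input", "lsp1")
second = northd.on_change("SB_datapath_binding", "dp1")
print(first, second, northd.stats)
```

In this model the cost of a change to SB_datapath_binding is whatever
rebuild_everything costs, which is why making the recompute itself cheaper
benefits every input at once.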
Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03
On 5/8/24 18:01, Numan Siddique wrote:
> On Wed, May 8, 2024 at 8:42 AM Шагов Георгий via discuss <
> ovs-discuss@openvswitch.org> wrote:
>
>> 2024-05-07T08:37:38.857Z|18504|inc_proc_eng|INFO|node: northd, recompute (missing handler for input SB_datapath_binding) took 52313ms
>> 2024-05-07T08:37:48.335Z|18505|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input northd) took 7759ms
>> 2024-05-07T08:37:48.718Z|18506|timeval|WARN|Unreasonably long 62213ms poll interval (56201ms user, 2900ms system)
>>
>> As you can see, there is a significant delay of 52 seconds.
>>

This is huge indeed!

> You're welcome to work on it and submit patches to add a handler for
> SB_datapath_binding.
>
> @Dumitru Ceara @Han Zhou if you've any
> reservations on adding more handlers please do comment here.
>

In general, especially if it fixes a scalability issue like this one,
it's probably fine. In practice it depends a bit on how much complexity
this would add to the code.

But the best way to tell is to have a way to reproduce this, e.g., NB/SB
databases and the NB/SB jsonrpc update that caused the recompute.

Regards,
Dumitru
Re: [ovs-discuss] NorthD inc-engine Handlers; OVN 24.03
On Wed, May 8, 2024 at 8:42 AM Шагов Георгий via discuss <
ovs-discuss@openvswitch.org> wrote:

> Hello everyone
>
> In some respects this might be considered a continuation of this thread
> (Link1), yet it is different.
>
> After upgrading from OVN 22.03 to OVN 24.03, we have indeed seen a 3-4x
> increase in performance.
>
> And yet we still observe high CPU load for the NorthD process; looking
> deeper into the logs we have found:
>

Thanks for reporting this issue.

> 2024-05-07T08:36:46.505Z|18503|poll_loop|INFO|wakeup due to [POLLIN] on fd 15 (10.34.22.66:60716<->10.34.22.66:6642) at lib/stream-fd.c:157 (94% CPU usage)
> 2024-05-07T08:37:38.857Z|18504|inc_proc_eng|INFO|node: northd, recompute (missing handler for input SB_datapath_binding) took 52313ms
> 2024-05-07T08:37:48.335Z|18505|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input northd) took 7759ms
> 2024-05-07T08:37:48.718Z|18506|timeval|WARN|Unreasonably long 62213ms poll interval (56201ms user, 2900ms system)
>
> As you can see, there is a significant delay of 52 seconds.
>
> Please correct me if I am wrong, but in my understanding 'missing handler
> for' means that the inc-engine handler for some input node is absent (in
> this sample: SB_datapath_binding).
>

That's correct.

> Before plunging into development, it would be great to align with the
> community's position:
>
> - Why is there no handler for this node?
>

Our approach has been to add a handler for an input change only if the
change is frequent or if it can be easily handled. We have also skipped
adding handlers when they would increase the code complexity. Having said
that, I think we are open to adding more handlers if it makes sense or if
it results in scale improvements.

Right now we fall back to a full recompute of the northd engine for any
change to a logical switch or logical router.
Does your deployment create/delete logical switches/routers frequently?
Is it possible to enable ovn debug logs and share them? I'm curious to
know what the changes to SB datapath binding are.

Feel free to share your OVN NB and SB DBs if you're ok with it. I can
deploy those DBs and see why recompute is so expensive.

> - Is there any particular reason for this, or did the peculiarity of our
>   installation just highlight this issue?
>

My guess is that your installation is frequently creating, deleting or
modifying logical switches or routers.

> - Do you think it makes sense to implement that handler
>   (SB_datapath_binding)?
>

I'm fine with adding a handler if it helps at scale. In our use cases, we
don't frequently create/delete logical switches and routers, and hence it
is ok to fall back to full recomputes for such changes.

> Any ideas are highly appreciated.
>

You're welcome to work on it and submit patches to add a handler for
SB_datapath_binding.

@Dumitru Ceara @Han Zhou if you've any reservations on adding more
handlers please do comment here.

Thanks
Numan

> Yours truly, George
>
> - Link1:
>   https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/053035.html
[ovs-discuss] NorthD inc-engine Handlers; OVN 24.03
Hello everyone

In some respects this might be considered a continuation of this thread
(Link1), yet it is different.

After upgrading from OVN 22.03 to OVN 24.03, we have indeed seen a 3-4x
increase in performance.

And yet we still observe high CPU load for the NorthD process; looking
deeper into the logs we have found:

2024-05-07T08:36:46.505Z|18503|poll_loop|INFO|wakeup due to [POLLIN] on fd 15 (10.34.22.66:60716<->10.34.22.66:6642) at lib/stream-fd.c:157 (94% CPU usage)
2024-05-07T08:37:38.857Z|18504|inc_proc_eng|INFO|node: northd, recompute (missing handler for input SB_datapath_binding) took 52313ms
2024-05-07T08:37:48.335Z|18505|inc_proc_eng|INFO|node: lflow, recompute (failed handler for input northd) took 7759ms
2024-05-07T08:37:48.718Z|18506|timeval|WARN|Unreasonably long 62213ms poll interval (56201ms user, 2900ms system)

As you can see, there is a significant delay of 52 seconds.

Please correct me if I am wrong, but in my understanding 'missing handler
for' means that the inc-engine handler for some input node is absent (in
this sample: SB_datapath_binding).

Before plunging into development, it would be great to align with the
community's position:

* Why is there no handler for this node?
* Is there any particular reason for this, or did the peculiarity of our
  installation just highlight this issue?
* Do you think it makes sense to implement that handler
  (SB_datapath_binding)?

Any ideas are highly appreciated.

Yours truly, George

* Link1: https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/053035.html
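To see how often such recomputes happen and which inputs trigger them, the
inc_proc_eng lines can be pulled out of the ovn-northd log with a short
script. The sketch below is only an illustration: the regular expression is
derived from the four log lines quoted in this message, not from the OVN
source, so other variants of the message may not match, and it assumes each
log record sits on a single line.

```python
import re

# Pattern inferred from the sample lines only (an assumption, see above).
RECOMPUTE_RE = re.compile(
    r"inc_proc_eng\|INFO\|node: (?P<node>\w+), recompute "
    r"\((?P<reason>missing|failed) handler for input (?P<input>\w+)\) "
    r"took (?P<ms>\d+)ms"
)


def recomputes(lines):
    """Yield (node, reason, input, milliseconds) for each recompute record."""
    for line in lines:
        m = RECOMPUTE_RE.search(line)
        if m:
            yield m["node"], m["reason"], m["input"], int(m["ms"])


SAMPLE = [
    "2024-05-07T08:36:46.505Z|18503|poll_loop|INFO|wakeup due to [POLLIN] on fd 15 "
    "(10.34.22.66:60716<->10.34.22.66:6642) at lib/stream-fd.c:157 (94% CPU usage)",
    "2024-05-07T08:37:38.857Z|18504|inc_proc_eng|INFO|node: northd, recompute "
    "(missing handler for input SB_datapath_binding) took 52313ms",
    "2024-05-07T08:37:48.335Z|18505|inc_proc_eng|INFO|node: lflow, recompute "
    "(failed handler for input northd) took 7759ms",
    "2024-05-07T08:37:48.718Z|18506|timeval|WARN|Unreasonably long 62213ms "
    "poll interval (56201ms user, 2900ms system)",
]

found = list(recomputes(SAMPLE))
total_ms = sum(ms for _, _, _, ms in found)
for node, reason, input_name, ms in found:
    print(f"{node}: {reason} handler for {input_name}, {ms} ms")
print(f"total recompute time: {total_ms} ms")
```

On the sample, the two recomputes add up to 60072 ms, which accounts for
most of the 62213 ms poll interval reported in the last line.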