Re: Extended logging for rebalance performance analysis

Ivan Rakov Mon, 29 Jun 2020 01:52:04 -0700

+1 to Alex G.

From my experience, the most interesting cases with Ignite rebalancing
happen precisely in production. Given that we already have
detailed rebalancing logging, adding info about rebalance performance looks
like a reasonable improvement. With the new logs we'll be able to detect and
investigate situations where rebalance is slow due to uneven supplier
distribution or network issues.
The option to disable the feature at runtime shouldn't be used often, but it
will keep us on the safe side in case something goes wrong.
The format described in
https://issues.apache.org/jira/browse/IGNITE-12080 looks
good to me.


On Tue, Jun 23, 2020 at 7:01 PM ткаленко кирилл <tkalkir...@yandex.ru>
wrote:

> Hello, Alexey!
>
> Currently there is no way to disable / enable it, but it seems that the
> logs will not be overloaded, since Alexei Scherbakov's proposal seems
> reasonable and compact. Of course, enabling / disabling statistics
> collection could be added, for example via JMX.
>
> 23.06.2020, 18:47, "Alexey Goncharuk" <alexey.goncha...@gmail.com>:
> > Hello Maxim, folks,
> >
> > Wed, May 6, 2020 at 21:01, Maxim Muzafarov <mmu...@apache.org>:
> >
> >>  We won't do performance analysis on the production environment. Each
> >>  time we need performance analysis it will be done on a test
> >>  environment with verbose logging enabled. Thus I suggest moving these
> >>  changes to a separate `profiling` module and extend the logging much
> >>  more without any size limitations. The same as these [2] [3]
> >>  activities do.
> >
> > I strongly disagree with this statement. I am not sure who is meant here
> > by 'we', but I see a strong momentum in increasing observability tooling
> > that helps people to understand what exactly happens in the production
> > environment [1]. Not everybody can afford two identical environments for
> > testing. We should make sure users have enough information to understand
> > the root cause after the incident happened, and not force them to
> > reproduce it, let alone make them add another module to the classpath and
> > restart the nodes.
> > I think having this functionality in the core module with the ability to
> > disable/enable it is the right approach. Having the information printed
> > to log is ok, having it in an event that can be sent to a
> > monitoring/tracing subsystem is even better.
> >
> > Kirill, can we enable and disable this feature at runtime to avoid the
> > very same node restarts?
> >
> > [1] https://www.honeycomb.io/blog/yes-i-test-in-production-and-so-do-you/
>
