Hello, Igniters.

We discussed this feature privately with Alexander and Nikita.
Here are the results we want to share with the community:

0. In the end, both the performance statistics tool and tracing should use the 
same API.
1. We should improve the Tracing API so that it can be used for gathering 
information about all operations without a significant performance drop.

I propose to go as follows:

1. Merge the current PR as is after the final review. My intention is to give 
users a tool that can be used in a real-world production environment.
2. Improve the Tracing API.
3. Combine both tools under the same API.
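
For item 2, one of the costs Nikita lists in the quoted message below is that the Tracing API supports only String-based tags and values. Here is a minimal sketch of that difference, assuming a hypothetical record of start time, duration and cache id. The class and method names are made up for illustration and are not Ignite or OpenCensus APIs. It only shows the shape of the two approaches and makes no claim about the size of the overhead.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not Ignite code: two ways to record one operation. */
final class TagEncodingSketch {
    /** String-based style: every primitive is converted to a String tag value. */
    static Map<String, String> stringTags(long startTime, long durationNanos, int cacheId) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("startTime", Long.toString(startTime));
        tags.put("duration", Long.toString(durationNanos));
        tags.put("cacheId", Integer.toString(cacheId));
        return tags;
    }

    /** Binary style: primitives are written as-is into a fixed-size record. */
    static ByteBuffer binaryRecord(long startTime, long durationNanos, int cacheId) {
        ByteBuffer buf = ByteBuffer.allocate(2 * Long.BYTES + Integer.BYTES);
        buf.putLong(startTime).putLong(durationNanos).putInt(cacheId);
        buf.flip(); // Switch the buffer from writing to reading.
        return buf;
    }

    public static void main(String[] args) {
        Map<String, String> tags = stringTags(1_607_000_000_000L, 42_000L, 7);
        // Reading a value back requires parsing the String again.
        long parsed = Long.parseLong(tags.get("duration"));

        ByteBuffer rec = binaryRecord(1_607_000_000_000L, 42_000L, 7);
        System.out.println("String tags: " + tags + ", parsed duration: " + parsed);
        System.out.println("Binary record size in bytes: " + rec.remaining());
    }
}
```

The String-based variant allocates a map entry and a String per value and needs parsing on the reading side, while the binary variant writes the primitives directly. A typed tag API would let both tools avoid the conversion.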

> On Dec 14, 2020, at 10:42, Alexander Lapin <lapin1...@gmail.com> wrote:
> 
> Hello Igniters,
> 
>> The tracing causes a performance drop of 52% [4] and cannot be
>> used for collecting statistics about all queries in production
>> deployments. The performance drop of the profiling tool is less than
>> 2% and it can be used in production. I have benchmarked the tracing
>> and got the results:
>> 
>> -2% with OpenCensusTracingSpi configured and all scopes disabled
>> -52% for TX scope (IgnitePutTxBenchmark)
>> -58% for SQL scope (IgniteSqlQueryBenchmark)
>> 
>> Such a performance drop is too significant to use the tracing in
>> production.
>> 
> We've rerun the tracing benchmarks on more realistic scenarios and got a
> 10-15% performance drop with sampling-rate 1 (all transactions were
> traced). "More realistic" means that we don't test tracing performance
> while the system is in an overdraft state, but add a small throttling
> delay (1 millisecond) between operations, transactions in our case.
> 
> *IgnitePutTxBenchmark* (chart legend):
> 
> Green: Case 1: NoopTracingSpi
> Blue: Case 2: OpenCensusTracingSpi (disabled)
> Red: Case 3: OpenCensusTracingSpi, --scope TX --sampling-rate 0.1
> Violet: Case 4: OpenCensusTracingSpi, --scope TX --sampling-rate 1
> Black: Case 5: *ControlCenter* + OpenCensusTracingSpi, --scope TX --sampling-rate 0.1
> Yellow: Case 6: ControlCenter + OpenCensusTracingSpi, --scope TX --sampling-rate
> 
>> I have considered the possibility of reusing the tracing API. If
>> statistics collection is implemented with the TracingSpi, we
>> get a performance drop due to:
>> - Transferring tracing context over the network.
>> - Using ThreadLocal for spans
>> - Converting primitives and objects to string and vice versa. (API
>> supports only String-based tags and values)
>> - Generating span objects
>> 
> @Nikita Amelchev Could you please share numbers?
> 
> Best regards,
> Alexander
> 
> Mon, Dec 7, 2020 at 17:24, Nikolay Izhikov <nizhi...@apache.org>:
> 
>> Hello, Nikita.
>> 
>> Makes sense.
>> 
>> I will take a look.
>> 
>>> On Dec 7, 2020, at 15:29, Nikita Amelchev <nsamelc...@gmail.com> wrote:
>>> 
>>> Hello, Igniters.
>>> 
>>> I have implemented the profiling tool [1, 2]. It writes duration and
>>> other parameters of user operations (scan, SQL query, transactions,
>>> tasks, jobs, CQ, etc) to a local file. This info can be used in
>>> various cases. The main goal is to build the performance report to
>>> analyze the count and duration of user queries [3].
>>> 
>>> We already have the tracing with similar functionality, but I think
>>> Ignite should have both tools: tracing and profiling.
>>> 
>>> The tracing causes a performance drop of 52% [4] and cannot be
>>> used for collecting statistics about all queries in production
>>> deployments. The performance drop of the profiling tool is less than
>>> 2% and it can be used in production. I have benchmarked the tracing
>>> and got the results:
>>> 
>>> -2% with OpenCensusTracingSpi configured and all scopes disabled
>>> -52% for TX scope (IgnitePutTxBenchmark)
>>> -58% for SQL scope (IgniteSqlQueryBenchmark)
>>> 
>>> Such a performance drop is too significant to use the tracing in
>>> production.
>>> 
>>> I have considered the possibility of reusing the tracing API. If
>>> statistics collection is implemented with the TracingSpi, we
>>> get a performance drop due to:
>>> - Transferring tracing context over the network.
>>> - Using ThreadLocal for spans
>>> - Converting primitives and objects to string and vice versa. (API
>>> supports only String-based tags and values)
>>> - Generating span objects
>>> 
>>> I have benchmarked both implementations on yardstick's
>>> IgniteGetBenchmark. Transferring the tracing context over the network
>>> was disabled. The results:
>>> - Tracing API implementation - 8% performance drop.
>>> - Proposed implementation - 2% performance drop.
>>> 
>>> I think this is a significant drop and the implementation that reuses
>>> the tracing API should not be used. The cluster profiling should have as
>>> little performance drop as possible to be usable in production. The
>>> tracing will be used for detailed investigation.
>>> 
>>> WDYT?
>>> 
>>> The tool is ready to be reviewed [3, 5].
>>> 
>>> [1] https://issues.apache.org/jira/browse/IGNITE-12666
>>> [2] https://cwiki.apache.org/confluence/display/IGNITE/Cluster+performance+profiling+tool
>>> [3] https://github.com/apache/ignite-extensions/pull/16
>>> [4] https://issues.apache.org/jira/secure/attachment/13016636/Tracing%20benchmarks.docx
>>> [5] https://github.com/apache/ignite/pull/7693
>>> 
>>> Wed, Jun 24, 2020 at 23:31, Saikat Maitra <saikat.mai...@gmail.com>:
>>>> 
>>>> Hi Nikita,
>>>> 
>>>> The changes in this PR look good.
>>>> 
>>>> https://github.com/apache/ignite-extensions/pull/16
>>>> 
>>>> Regards,
>>>> Saikat
>>>> 
>>>> On Mon, Jun 22, 2020 at 12:03 PM Nikolay Izhikov <nizhi...@apache.org>
>>>> wrote:
>>>> 
>>>>> Hello, Igniters.
>>>>> 
>>>>> I think that inside Ignite core we should name this feature
>>>>> «performance statistics».
>>>>> We already have «cache statistics».
>>>>> Data that is collected by performance statistics can be used not only
>>>>> for profiling but also to solve other tasks.
>>>>> 
>>>>> 
>>>>>> On June 22, 2020, at 14:00, Nikita Amelchev <nsamelc...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi, guys.
>>>>>> 
>>>>>> I have mentioned components under the MIT license in the LICENSE file.
>>>>>> 
>>>>>> Saikat, I have fixed the PR according to your suggestions. Thanks for
>>>>>> taking a look.
>>>>> 
>>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best wishes,
>>> Amelchev Nikita
>> 
>> 
