Hi Igor,

Yes, I think this tool will be well suited for this task as well.
--
Kind Regards
Roman Kondakov

On 22.05.2020 22:46, Igor Seliverstov wrote:
> Great idea, I think.
>
> Can we also use the tool to compare, let's say, H2 indexing against the
> Calcite-based one to detect possible issues when the new engine acts worse
> than H2?
>
> Regards,
> Igor
>
> Fri, 22 May 2020, 22:36 Denis Magda <dma...@apache.org>:
>
>> Hi Roman,
>>
>> +1 for sure. On a side note, should we create a separate ASF/Git repository
>> for the project? Not sure we need to put the suite in the main Ignite repo.
>>
>> -
>> Denis
>>
>>
>> On Fri, May 22, 2020 at 8:54 AM Roman Kondakov <kondako...@mail.ru.invalid>
>> wrote:
>>
>>> Hi everybody!
>>>
>>> Currently Ignite doesn't have the ability to detect SQL performance
>>> regressions between different versions. We have a Yardstick benchmark
>>> module, but it has several drawbacks:
>>> - it doesn't compare different Ignite versions
>>> - it doesn't check the query result
>>> - it doesn't have the ability to execute randomized SQL queries (aka
>>> fuzzy testing)
>>>
>>> So, Yardstick is not very helpful for detecting SQL performance
>>> regressions.
>>>
>>> I think we need a brand-new framework for this task, and I propose to
>>> implement it by adopting the ideas from the Apollo tool paper [1].
>>> The Apollo pipeline works like this:
>>>
>>> 1. Apollo starts two different versions of the database simultaneously.
>>> 2. Then Apollo populates them with the same dataset.
>>> 3. Apollo generates random SQL queries using an external library (e.g.
>>> SQLSmith [2]).
>>> 4. Each query is executed in both database versions. Execution time is
>>> measured by the framework.
>>> 5. If the execution time difference for the same query exceeds some
>>> threshold (say, 2x slower), the query is logged.
>>> 6. Apollo then tries to simplify the problematic queries in order to
>>> obtain a minimal reproducer.
>>> 7. Apollo also has the ability to automatically perform a binary search
>>> over the git history to find the bad commit.
>>> 8. It can also localize the root cause of the regression by carrying out
>>> statistical debugging.
>>>
>>> I think we don't have to implement all of these Apollo steps. The first
>>> 4 steps will be enough for our needs.
>>>
>>> My proposal is to create a new module called 'sql-testing'. We need a
>>> separate module because it should be suitable for both query engines:
>>> the H2-based one and the upcoming Calcite-based one. This module will
>>> contain a test suite which works in the following way:
>>> 1. It starts two Ignite clusters with different versions (the current
>>> version and the previous release version).
>>> 2. The framework then runs randomly generated queries in both clusters
>>> and checks the execution time for each cluster. We need to port the
>>> SQLSmith [2] library from C++ to Java for this step. But initially we
>>> can start with a set of hardcoded queries and postpone the SQLSmith
>>> port. Randomized queries can be added later.
>>> 3. All problematic queries are then reported as performance issues.
>>> This way we can manually examine the problems.
>>>
>>> This tool will bring a certain amount of robustness to our SQL layer,
>>> as well as some confidence in the absence of SQL performance
>>> regressions.
>>>
>>> What do you think?
>>>
>>>
>>> [1] http://www.vldb.org/pvldb/vol13/p57-jung.pdf
>>> [2] https://github.com/anse1/sqlsmith
>>>
>>>
>>> --
>>> Kind Regards
>>> Roman Kondakov
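To make the comparison step (step 4 of the pipeline, step 2 of the proposed suite) a bit more concrete, here is a minimal Java sketch of the timing-comparison core. All names here (`DiffRunner`, the threshold value) are made up for illustration; in the real suite the two timing functions would be backed by connections to the two clusters (e.g. via the JDBC thin driver), while here they are plain callbacks:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

/** Sketch: flags queries whose execution time on the candidate engine
 *  exceeds the baseline engine's time by more than a given factor. */
public class DiffRunner {
    /**
     * @param queries   SQL texts to compare
     * @param baseline  runs a query on the old cluster, returns elapsed millis
     * @param candidate runs a query on the new cluster, returns elapsed millis
     * @param threshold e.g. 2.0 means "flag if the candidate is 2x slower"
     * @return the queries that should be reported as performance issues
     */
    public static List<String> compare(List<String> queries,
                                       ToLongFunction<String> baseline,
                                       ToLongFunction<String> candidate,
                                       double threshold) {
        List<String> flagged = new ArrayList<>();
        for (String q : queries) {
            // Guard against a zero baseline time to avoid division by zero.
            long oldMs = Math.max(1L, baseline.applyAsLong(q));
            long newMs = candidate.applyAsLong(q);
            if ((double) newMs / oldMs >= threshold)
                flagged.add(q);
        }
        return flagged;
    }
}
```

In practice each query should probably be run several times and the median taken, so that JIT warm-up and GC pauses don't produce false positives.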
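As a stop-gap before a full SQLSmith port, even a tiny seeded grammar could generate the randomized queries. The sketch below is a toy, not a SQLSmith equivalent: the class name, table, and columns are invented for illustration, and a fixed seed keeps runs reproducible so a flagged query can be regenerated:

```java
import java.util.Random;

/** Toy seeded generator of random single-table SELECT queries
 *  (a placeholder for a SQLSmith-style grammar; names are made up). */
public class QueryGen {
    private static final String[] COLS = {"id", "name", "price"};
    private static final String[] OPS = {"=", "<", ">"};
    private final Random rnd;

    /** A fixed seed makes the query stream reproducible across runs. */
    public QueryGen(long seed) { this.rnd = new Random(seed); }

    /** Produces the next random query from the (very small) grammar. */
    public String next() {
        String proj = COLS[rnd.nextInt(COLS.length)];
        String col = COLS[rnd.nextInt(COLS.length)];
        String op = OPS[rnd.nextInt(OPS.length)];
        int val = rnd.nextInt(100);
        return "SELECT " + proj + " FROM test_table WHERE " + col + " " + op + " " + val;
    }
}
```

A real port would of course need joins, aggregates, and subqueries to exercise the planner, but the seeded-reproducibility property above is worth keeping whatever generator we end up with.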