Hi Denis,

I'm not sure we need a separate repository for it. What would be the benefit of using a separate repo?

BTW, I noticed that Ignite has an `ignite-benchmarks` module. For now it contains JMH/JOL benchmarks. We could also put the SQL benchmark into this module. What do you think?

--
Kind Regards
Roman Kondakov

On 22.05.2020 22:36, Denis Magda wrote:
> Hi Roman,
>
> +1 for sure. On a side note, should we create a separate ASF/Git repository
> for the project? Not sure we need to put the suite in the main Ignite repo.
>
> -
> Denis
>
>
> On Fri, May 22, 2020 at 8:54 AM Roman Kondakov <kondako...@mail.ru.invalid>
> wrote:
>
>> Hi everybody!
>>
>> Currently Ignite doesn't have the ability to detect SQL performance
>> regressions between different versions. We have a Yardstick benchmark
>> module, but it has several drawbacks:
>> - it doesn't compare different Ignite versions
>> - it doesn't check the query result
>> - it can't execute randomized SQL queries (a.k.a. fuzz testing)
>>
>> So, Yardstick is not very helpful for detecting SQL performance
>> regressions.
>>
>> I think we need a brand-new framework for this task, and I propose to
>> implement it by adopting the ideas from the Apollo tool paper [1].
>> The Apollo tool pipeline works like this:
>>
>> 1. Apollo starts two different versions of a database simultaneously.
>> 2. Then Apollo populates them with the same dataset.
>> 3. Apollo generates random SQL queries using an external library (e.g.
>> SQLSmith [2]).
>> 4. Each query is executed in both database versions. Execution time is
>> measured by the framework.
>> 5. If the execution time difference for the same query exceeds some
>> threshold (say, 2x slower), the query is logged.
>> 6. Apollo then tries to simplify the problematic queries in order to
>> obtain a minimal reproducer.
>> 7. Apollo can also automatically binary-search the git history to find
>> the bad commit.
>> 8. It can also localize the root cause of the regression by carrying out
>> statistical debugging.
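Steps 4, 5 and 7 of the Apollo pipeline are the most mechanical parts. The Java sketch below is not Apollo's code. `ApolloSketch`, `isRegression` and `firstBadCommit` are hypothetical names used only for illustration. The sketch flags a query whose new run time exceeds the old one by a ratio threshold, then binary-searches an ordered list of commits for the first bad one. The `isBad` predicate stands in for "build this commit and re-run the slow query". The search also assumes the slowdown persists once introduced (every later commit stays bad), which is the same assumption `git bisect` makes.

```java
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch of two Apollo ideas: flagging a regressed query
 * (steps 4-5) and bisecting an ordered commit history (step 7).
 */
public class ApolloSketch {

    /**
     * Step 5: the query counts as a regression if the new version is more than
     * `threshold` times slower than the old one (e.g. 2.0 for "2x slower").
     */
    public static boolean isRegression(long oldTime, long newTime, double threshold) {
        return newTime > oldTime * threshold;
    }

    /**
     * Step 7: returns the index of the first bad commit. Assumes commits are
     * ordered oldest to newest, commit 0 is known good, the last commit is known
     * bad, and once a commit is bad every later commit is bad too.
     * `isBad` is a stand-in for "build this commit and re-run the slow query".
     */
    public static int firstBadCommit(int commitCount, IntPredicate isBad) {
        if (commitCount < 2) {
            throw new IllegalArgumentException("need at least one good and one bad commit");
        }
        int good = 0;               // invariant: commit `good` is good
        int bad = commitCount - 1;  // invariant: commit `bad` is bad
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            if (isBad.test(mid)) {
                bad = mid;
            } else {
                good = mid;
            }
        }
        return bad;                 // bad == good + 1, so this is the first bad commit
    }

    public static void main(String[] args) {
        // Hypothetical timings (ms) of the same query on the old and new versions.
        System.out.println(isRegression(120, 310, 2.0)); // prints true
        // Hypothetical history of 10 commits in which commit 7 introduced the slowdown.
        System.out.println(firstBadCommit(10, c -> c >= 7)); // prints 7
    }
}
```

Each probe of `isBad` means building and benchmarking one commit. The binary search keeps that to roughly log2(n) builds instead of n.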
>>
>> I think we don't have to implement all of these Apollo steps. The first 4
>> steps will be enough for our needs.
>>
>> My proposal is to create a new module called 'sql-testing'. We need a
>> separate module because it should be suitable for both query engines:
>> the H2-based one and the upcoming Calcite-based one. This module will
>> contain a test suite which works in the following way:
>> 1. It starts two Ignite clusters with different versions (the current
>> version and the previous release version).
>> 2. The framework then runs randomly generated queries in both clusters and
>> checks the execution time for each cluster. We need to port the SQLSmith [2]
>> library from C++ to Java for this step. But initially we can start with
>> a set of hardcoded queries and postpone the SQLSmith port. Randomized
>> queries can be added later.
>> 3. All problematic queries are then reported as performance issues. This
>> way we can examine the problems manually.
>>
>> This tool will bring a certain amount of robustness to our SQL layer as
>> well as some confidence in the absence of SQL query regressions.
>>
>> What do you think?
>>
>>
>> [1] http://www.vldb.org/pvldb/vol13/p57-jung.pdf
>> [2] https://github.com/anse1/sqlsmith
>>
>>
>> --
>> Kind Regards
>> Roman Kondakov
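Below is a minimal Java sketch of the proposed 'sql-testing' loop, using a hardcoded query list as the proposal suggests for the first iteration. None of these names exist in Ignite. `SqlRegressionSuite`, `QueryExecutor`, `Issue`, `classify` and `run` are all hypothetical. A real `QueryExecutor` might wrap a `java.sql.Connection` opened through Ignite's thin JDBC driver. The sketch also checks that both versions return the same rows, which addresses the Yardstick gap ("it doesn't check the query result"). Rows are compared as a multiset because a query without `ORDER BY` has no guaranteed row order. Timings come from a single run each, so a real suite would probably need warm-up and repeated runs before trusting a ratio.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the proposed 'sql-testing' suite loop. */
public class SqlRegressionSuite {

    /** Hypothetical abstraction over one cluster; a real one might wrap a JDBC connection. */
    public interface QueryExecutor {
        List<List<Object>> execute(String sql) throws Exception;
    }

    /** A problematic query and the reason it was flagged. */
    public static final class Issue {
        public final String sql;
        public final String reason;

        Issue(String sql, String reason) {
            this.sql = sql;
            this.reason = reason;
        }

        @Override
        public String toString() {
            return reason + ": " + sql;
        }
    }

    /** Counts identical rows, so that results compare equal regardless of row order. */
    static Map<List<Object>, Integer> rowCounts(List<List<Object>> rows) {
        Map<List<Object>, Integer> counts = new HashMap<>();
        for (List<Object> row : rows) {
            counts.merge(row, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns why a query is problematic, or null if it looks fine. */
    public static String classify(List<List<Object>> oldRows, List<List<Object>> newRows,
                                  long oldNanos, long newNanos, double threshold) {
        if (!rowCounts(oldRows).equals(rowCounts(newRows))) {
            return "result mismatch";
        }
        if (newNanos > oldNanos * threshold) {
            return "more than " + threshold + "x slower";
        }
        return null;
    }

    /** Runs every query on both clusters (old first, then new) and collects problematic ones. */
    public static List<Issue> run(List<String> queries, QueryExecutor oldCluster,
                                  QueryExecutor newCluster, double threshold) {
        List<Issue> issues = new ArrayList<>();
        for (String sql : queries) {
            try {
                long start = System.nanoTime();
                List<List<Object>> oldRows = oldCluster.execute(sql);
                long oldNanos = System.nanoTime() - start;

                start = System.nanoTime();
                List<List<Object>> newRows = newCluster.execute(sql);
                long newNanos = System.nanoTime() - start;

                String reason = classify(oldRows, newRows, oldNanos, newNanos, threshold);
                if (reason != null) {
                    issues.add(new Issue(sql, reason));
                }
            } catch (Exception e) {
                // A query that fails outright is reported too.
                issues.add(new Issue(sql, "execution failed: " + e));
            }
        }
        return issues;
    }

    public static void main(String[] args) {
        // Fake clusters: the "new" one returns rows in a different order (fine)
        // and drops a row for the filtered query (a result mismatch).
        QueryExecutor oldCluster = sql -> Arrays.asList(
                Arrays.<Object>asList(1, "a"), Arrays.<Object>asList(2, "b"));
        QueryExecutor newCluster = sql -> sql.contains("WHERE")
                ? Collections.singletonList(Arrays.<Object>asList(1, "a"))
                : Arrays.asList(Arrays.<Object>asList(2, "b"), Arrays.<Object>asList(1, "a"));
        List<String> queries = Arrays.asList(
                "SELECT id, name FROM person",
                "SELECT id, name FROM person WHERE id > 0");
        for (Issue issue : run(queries, oldCluster, newCluster, 2.0)) {
            System.out.println(issue);
        }
    }
}
```

The result check runs before the timing check. A query that returns wrong rows is therefore reported as a correctness problem rather than as a slowdown.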