It looks like we can't just split an SQL query across several threads due to H2 limitations. We can bind a query thread to a certain set of partitions, but H2 will still read the whole index and only then filter entries by partition. So we can't get a significant speed-up that way.
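To illustrate why filtering by partition after the scan does not help, here is a minimal Java sketch. It is a toy model and not H2 or Ignite code: the class `PartitionFilterScan`, the modulo partition function and the `TreeMap` standing in for the index are all assumptions made for the example. Each thread is bound to one partition, but still walks the whole shared index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy model (not H2 or Ignite code): one shared sorted index scanned by several threads. */
final class PartitionFilterScan {
    /** Hypothetical partition function: key modulo the partition count. */
    static int partition(int key, int parts) {
        return Math.floorMod(key, parts);
    }

    /** Builds a shared "index" holding keys 0..size-1. */
    static TreeMap<Integer, String> buildIndex(int size) {
        TreeMap<Integer, String> idx = new TreeMap<>();
        for (int k = 0; k < size; k++)
            idx.put(k, "row-" + k);
        return idx;
    }

    /** Scans the whole index and keeps only entries of one partition; returns {visited, matched}. */
    static long[] scanWithFilter(TreeMap<Integer, String> idx, int part, int parts) {
        long visited = 0, matched = 0;
        for (Integer key : idx.keySet()) {
            visited++; // every entry is read, whatever its partition
            if (partition(key, parts) == part)
                matched++;
        }
        return new long[] {visited, matched};
    }

    /** Runs one filtered scan per partition in parallel; returns the totals {visited, matched}. */
    static long[] scanAllPartitions(TreeMap<Integer, String> idx, int parts) {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<long[]>> futs = new ArrayList<>();
            for (int p = 0; p < parts; p++) {
                final int part = p;
                futs.add(pool.submit(() -> scanWithFilter(idx, part, parts)));
            }
            long visited = 0, matched = 0;
            for (Future<long[]> f : futs) {
                long[] r = f.get();
                visited += r[0];
                matched += r[1];
            }
            return new long[] {visited, matched};
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] r = scanAllPartitions(buildIndex(1_000), 4);
        System.out.println("visited=" + r[0] + ", matched=" + r[1]); // visited=4000, matched=1000
    }
}
```

With 4 threads over 1,000 entries, the threads together visit 4,000 entries to return 1,000 rows, so each thread does as much scanning as a single-threaded query would.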
Unfortunately, H2 does not support sharding, so we need a workaround. We can try to split indices so that each query thread is bound to its own index part. I've implemented such a prototype and got a significant speed-up on a single-node grid, as if it were a multi-node grid.

Since H2 knows nothing about split indices, we must make sure that every query is run as a TwoStepQuery and uses all index parts of the table. As creating an index on demand is a very heavy operation, the index should be split when it is created. So we can set the parallelism level on a per-cache basis, but not per-query.

Another issue I've faced is that our implementation of the prepared statement cache is useless with split indices. A prepared statement is cached in a thread-local variable, and it seems that the statement is bound to a certain index part. So if we reuse the same statement for different index parts, we will get unexpected results.

On Sun, Oct 30, 2016 at 8:46 PM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:

> Completely agree, great point!
>
> On Sun, Oct 30, 2016 at 9:17 AM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:
>
> > I think it must be a maximum local parallelism level but not just `on` and
> > `off` setting (the default is obviously 1). This along with separately
> > configurable query thread pool will give a finer grained control over
> > resources.
> >
> > Sergi
> >
> > 2016-10-30 18:22 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
> >
> > > I already mentioned this in another email, but we should be able to turn
> > > this property on and off on per-query and per-cache levels.
> > >
> > > On Sat, Oct 29, 2016 at 11:45 AM, Sergi Vladykin <sergi.vlady...@gmail.com> wrote:
> > >
> > > > Agree, lets implement such a parallelization.
> > > >
> > > > I think we will need an explicit setting for SqlQuery and SqlFieldsQuery,
> > > > the default behavior should not change.
> > > >
> > > > Sergi
> > > >
> > > > 2016-10-28 22:39 GMT+03:00 Andrey Mashenkov <amashen...@gridgain.com>:
> > > >
> > > > > So, now we have every SQL query run on each node in single thread. This can
> > > > > be an issue for heavy queries or queries running on big data sets, e.g.
> > > > > analytical queries.
> > > > >
> > > > > For now, the only way to speed up such queries is to add more nodes to grid
> > > > > running on same server. In this case, data will be partitioned over all
> > > > > these nodes and query will be split and run on all nodes.
> > > > >
> > > > > It seems, we can have a benefit if split SQL queries locally as we do it
> > > > > across nodes with TwoStepQuery.
> > > > >
> > > > > Thoughts?

--
Best regards,
Andrey V. Mashenkov
Cell: +7-921-932-61-82