Re: SQL query CPU utilization too low.

Sergi Vladykin Wed, 30 Nov 2016 06:35:20 -0800

Per cache SQL parallelism level looks reasonable to me here.

I'm not sure what you mean by "prepared statement cache is useless
with split indices"; if that is true, you are most probably parallelizing
queries in the wrong way.
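
One possible reading of the concern, as a minimal sketch only: the class and
method names below are hypothetical and no H2 or Ignite API is used. If a
per-thread cache is keyed by SQL text alone, a statement prepared for one
index part is handed back for another part; adding the part to the key
avoids that.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a per-thread statement cache; not Ignite or H2 code. */
class StatementCacheSketch {
    /** Stand-in for a prepared statement compiled against one index part. */
    static final class Stmt {
        final String sql;
        final int segment;

        Stmt(String sql, int segment) {
            this.sql = sql;
            this.segment = segment;
        }
    }

    /** One cache per thread, as described in the quoted message. */
    private static final ThreadLocal<Map<String, Stmt>> CACHE =
        ThreadLocal.withInitial(HashMap::new);

    /** Keyed by SQL text only: a later call for another segment gets a stale statement. */
    static Stmt prepareBySqlOnly(String sql, int segment) {
        return CACHE.get().computeIfAbsent(sql, s -> new Stmt(s, segment));
    }

    /** Keyed by segment plus SQL text: each index part gets its own statement. */
    static Stmt prepareBySqlAndSegment(String sql, int segment) {
        return CACHE.get().computeIfAbsent(segment + ":" + sql, k -> new Stmt(sql, segment));
    }
}
```

With the first method, a thread that prepares the same SQL for part 0 and
then for part 1 receives the part 0 statement both times; with the second it
receives one statement per part.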


Also do not forget about distributed joins: with parallel queries on the
same node we will need to make index range requests not only to remote
nodes, but to query contexts in parallel threads on the same local node as
well.

Sergi

2016-11-30 17:23 GMT+03:00 Andrey Mashenkov <[email protected]>:

> It looks like we can't just split an SQL query across several threads due
> to H2 limitations.
> We can bind a query thread to a certain set of partitions, but H2 will
> actually read the whole index and then filter entries by partition. So
> we can't get a significant speed-up that way.
>
> Unfortunately, H2 does not support sharding, so we need a
> workaround. We can try to split indices so that each query thread is
> bound to its own index part.
> I've implemented such a prototype and got a significant speed-up on a
> single-node grid, as if it were a grid of several nodes.
> Since H2 knows nothing about split indices, we must make sure that every
> query is run as a TwoStepQuery and uses all index parts of the table.
>
> As index creation on demand is a very heavy operation, an index should be
> split when it is created. So we can set the parallelism level on a
> per-cache basis, but not per query.
>
> Another issue I've faced is that our implementation of the prepared
> statement cache is useless with split indices. A prepared statement is
> cached in a thread-local variable, and it seems that the statement is bound
> to a certain index part. So if we reuse the same statement for different
> index parts, we will get unexpected results.
>
> On Sun, Oct 30, 2016 at 8:46 PM, Dmitriy Setrakyan <[email protected]>
> wrote:
>
> > Completely agree, great point!
> >
> > On Sun, Oct 30, 2016 at 9:17 AM, Sergi Vladykin <[email protected]>
> > wrote:
> >
> > > I think it must be a maximum local parallelism level, not just an
> > > `on`/`off` setting (the default is obviously 1). This, along with a
> > > separately configurable query thread pool, will give finer-grained
> > > control over resources.
> > >
> > > Sergi
> > >
> > > 2016-10-30 18:22 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
> > >
> > > > I already mentioned this in another email, but we should be able to
> > > > turn this property on and off at the per-query and per-cache levels.
> > > >
> > > > On Sat, Oct 29, 2016 at 11:45 AM, Sergi Vladykin <[email protected]>
> > > > wrote:
> > > >
> > > > > Agree, let's implement such a parallelization.
> > > > >
> > > > > I think we will need an explicit setting for SqlQuery and
> > > > > SqlFieldsQuery; the default behavior should not change.
> > > > >
> > > > > Sergi
> > > > >
> > > > > 2016-10-28 22:39 GMT+03:00 Andrey Mashenkov <[email protected]>:
> > > > >
> > > > > > > Currently, every SQL query runs on each node in a single thread.
> > > > > > > This can be an issue for heavy queries or queries running on big
> > > > > > > data sets, e.g. analytical queries.
> > > > > > >
> > > > > > > For now, the only way to speed up such queries is to add more
> > > > > > > nodes to the grid running on the same server. In this case, data
> > > > > > > will be partitioned over all these nodes, and the query will be
> > > > > > > split and run on all of them.
> > > > > > >
> > > > > > > It seems we can benefit if we split SQL queries locally, as we do
> > > > > > > across nodes with TwoStepQuery.
> > > > > >
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
> Cell: +7-921-932-61-82
>
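
The split-index idea quoted above can be sketched roughly as follows. This is
an illustration under stated assumptions, not the prototype from the thread:
the class, its methods, and the key-hash routing are hypothetical, and a
TreeMap stands in for an H2 index. Each segment owns its own sorted index,
the "map" step scans one segment per thread, and the "reduce" step merges
the partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: one sorted index per segment, queried in parallel. */
class SegmentedIndexSketch {
    /** Index segments: indexed value -> number of rows holding that value. */
    private final List<TreeMap<Integer, Integer>> segments = new ArrayList<>();

    /** The parallelism level is fixed at creation, since re-splitting later is costly. */
    SegmentedIndexSketch(int parallelism) {
        for (int i = 0; i < parallelism; i++)
            segments.add(new TreeMap<>());
    }

    /** Routes a row to a segment by key hash (assumed stand-in for partition mapping). */
    void put(int key, int indexedVal) {
        int seg = Math.floorMod(Integer.hashCode(key), segments.size());
        segments.get(seg).merge(indexedVal, 1, Integer::sum);
    }

    /** Counts rows with indexed value >= bound: per-segment map step, then reduce. */
    long countGreaterOrEqual(int bound) {
        ExecutorService pool = Executors.newFixedThreadPool(segments.size());
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (TreeMap<Integer, Integer> seg : segments) {
                // Map step: each task range-scans only its own index segment.
                parts.add(pool.submit(() -> {
                    long cnt = 0;
                    for (int c : seg.tailMap(bound, true).values())
                        cnt += c;
                    return cnt;
                }));
            }
            // Reduce step: merge partial counts from all segments.
            long total = 0;
            for (Future<Long> f : parts)
                total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        SegmentedIndexSketch idx = new SegmentedIndexSketch(4);
        for (int k = 0; k < 1000; k++)
            idx.put(k, k % 10);
        System.out.println(idx.countGreaterOrEqual(5));
    }
}
```

Because every query has to visit all segments, the number of segments acts
as a per-index setting chosen at creation time rather than a per-query one.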
