Re: SQL query CPU utilization too low.

Andrey Mashenkov Wed, 30 Nov 2016 07:24:28 -0800

Serj, you can see the PR attached to JIRA issue [1]; it can be opened in
Upsource [2].


Thanks, I remember about distributed queries and will rework them right
after we agree that the solution for simple queries is OK.

[1] https://issues.apache.org/jira/browse/IGNITE-4106
[2] http://reviews.ignite.apache.org/ignite/review/IGNT-CR-15



On Wed, Nov 30, 2016 at 5:34 PM, Sergi Vladykin <[email protected]>
wrote:

> Per cache SQL parallelism level looks reasonable to me here.
>
> I'm not sure what you mean by "prepared statement cache is useless
> with split indices"; most probably you are parallelizing queries in some
> wrong way if this is true.
>
> Also do not forget about distributed joins: with parallel queries on the
> same node we will need to make index range requests not only to remote
> nodes, but to query contexts in parallel threads on the same local node as
> well.
>
> Sergi
>
> 2016-11-30 17:23 GMT+03:00 Andrey Mashenkov <[email protected]>:
>
> > It looks like we can't just split an SQL query across several threads due
> > to H2 limitations.
> > We can bind a query thread to a certain set of partitions, but H2 will
> > actually read the whole index and then filter entries by partition. So we
> > can't get a significant speed-up that way.
> >
> > Unfortunately, H2 does not support sharding, so we need a workaround. We
> > can try to split indices, so that each query thread is bound to its own
> > index part.
> > I've implemented such a prototype and got a significant speed-up on a
> > single-node grid, as if it were a grid of several nodes.
> > Since H2 knows nothing about split indices, we must make sure that every
> > query is run as a TwoStepQuery and uses all parts of the table's index.
> >
> > As creating an index on demand is a very heavy operation, the index should
> > be split when it is created. So we can set the parallelism level on a
> > per-cache basis, but not per query.
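
To make the split-index idea above concrete, here is a minimal sketch in plain
Java. It is not Ignite or H2 code and is not taken from the PR: the class
SegmentedIndex, the partition-to-segment rule (partition % parallelism) and
the use of TreeMap as a stand-in for an index are all assumptions made for
illustration. Each thread scans only its own segment (the "map" step) and the
partial results are merged afterwards (the "reduce" step), loosely mirroring
how a TwoStepQuery works across nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: a sorted index split into a fixed number
// of segments, chosen once at creation time (the per-cache parallelism level).
final class SegmentedIndex {
    private final List<TreeMap<Integer, String>> segments = new ArrayList<>();

    SegmentedIndex(int parallelism) {
        if (parallelism < 1)
            throw new IllegalArgumentException("parallelism must be >= 1");
        for (int i = 0; i < parallelism; i++)
            segments.add(new TreeMap<>());
    }

    // Assumed routing rule: a non-negative partition always maps to the same segment.
    int segmentFor(int partition) {
        return partition % segments.size();
    }

    void put(int partition, int key, String row) {
        segments.get(segmentFor(partition)).put(key, row);
    }

    // Range scan over [from, to]: one task per segment, then a merge by key.
    // A pool is created per call only to keep the sketch short.
    List<String> range(int from, int to) {
        ExecutorService pool = Executors.newFixedThreadPool(segments.size());
        try {
            List<Future<SortedMap<Integer, String>>> parts = new ArrayList<>();
            for (TreeMap<Integer, String> seg : segments) {
                // Map step: each task touches only its own index part.
                Callable<SortedMap<Integer, String>> scan =
                    () -> new TreeMap<>(seg.subMap(from, true, to, true));
                parts.add(pool.submit(scan));
            }
            // Reduce step: merge the partial results, ordered by key.
            TreeMap<Integer, String> merged = new TreeMap<>();
            for (Future<SortedMap<Integer, String>> part : parts)
                merged.putAll(part.get());
            return new ArrayList<>(merged.values());
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("segment scan failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the number of segments is fixed in the constructor, changing the
parallelism level would mean rebuilding every segment, which is the reason
given above for configuring it per cache rather than per query.
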
> >
> > Another issue I've faced is that our implementation of the prepared
> > statement cache is useless with split indices. A prepared statement is
> > cached in a thread-local variable, and it seems that the statement is bound
> > to a certain index part. So if we reuse the same statement for different
> > index parts, we will get unexpected results.
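
One possible way around the statement-cache problem described above, sketched
in plain Java under stated assumptions: the cache key includes the index
segment as well as the SQL text, so a statement prepared for one index part
is never reused for another. SegmentStatementCache, Key and Plan are
hypothetical names; Plan merely stands in for a prepared statement, and
nothing here reflects how Ignite or H2 actually cache statements.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only: a thread-local statement cache keyed by
// (SQL text, segment) instead of by SQL text alone.
final class SegmentStatementCache {
    // Cache key: the same SQL for a different segment is a different entry.
    static final class Key {
        final String sql;
        final int segment;

        Key(String sql, int segment) {
            this.sql = sql;
            this.segment = segment;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key))
                return false;
            Key other = (Key) o;
            return segment == other.segment && sql.equals(other.sql);
        }

        @Override public int hashCode() {
            return Objects.hash(sql, segment);
        }
    }

    // Stand-in for a prepared statement that is bound to one index part.
    static final class Plan {
        final String sql;
        final int segment;

        Plan(String sql, int segment) {
            this.sql = sql;
            this.segment = segment;
        }
    }

    // One cache per thread, as in the thread-local approach described above.
    private static final ThreadLocal<Map<Key, Plan>> CACHE =
        ThreadLocal.withInitial(HashMap::new);

    // Returns the cached plan for (sql, segment), preparing it on first use.
    static Plan prepare(String sql, int segment) {
        return CACHE.get().computeIfAbsent(
            new Key(sql, segment), k -> new Plan(k.sql, k.segment));
    }
}
```

If each query thread is permanently bound to a single segment, the extra key
component is redundant; it only matters when one thread may serve several
index parts.
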
> >
> > On Sun, Oct 30, 2016 at 8:46 PM, Dmitriy Setrakyan <[email protected]>
> > wrote:
> >
> > > Completely agree, great point!
> > >
> > > On Sun, Oct 30, 2016 at 9:17 AM, Sergi Vladykin <[email protected]>
> > > wrote:
> > >
> > > > I think it must be a maximum local parallelism level, not just an `on`
> > > > and `off` setting (the default is obviously 1). This, along with a
> > > > separately configurable query thread pool, will give finer-grained
> > > > control over resources.
> > > >
> > > > Sergi
> > > >
> > > > 2016-10-30 18:22 GMT+03:00 Dmitriy Setrakyan <[email protected]>:
> > > >
> > > > > I already mentioned this in another email, but we should be able to
> > > > > turn this property on and off at the per-query and per-cache levels.
> > > > >
> > > > > On Sat, Oct 29, 2016 at 11:45 AM, Sergi Vladykin <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Agree, let's implement such a parallelization.
> > > > > >
> > > > > > I think we will need an explicit setting for SqlQuery and
> > > > > > SqlFieldsQuery; the default behavior should not change.
> > > > > >
> > > > > > Sergi
> > > > > >
> > > > > > 2016-10-28 22:39 GMT+03:00 Andrey Mashenkov <[email protected]>:
> > > > > >
> > > > > > > So, right now every SQL query runs on each node in a single thread.
> > > > > > > This can be an issue for heavy queries or queries running on big
> > > > > > > data sets, e.g. analytical queries.
> > > > > > >
> > > > > > > For now, the only way to speed up such queries is to add more nodes
> > > > > > > to the grid running on the same server. In this case, data will be
> > > > > > > partitioned over all these nodes and the query will be split and
> > > > > > > run on all nodes.
> > > > > > >
> > > > > > > It seems we could benefit from splitting SQL queries locally, as we
> > > > > > > do across nodes with TwoStepQuery.
> > > > > > >
> > > > > > >
> > > > > > > Thoughts?
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> > Andrey V. Mashenkov
> > Cell: +7-921-932-61-82
> >
>



-- 
Best regards,
Andrey V. Mashenkov
Cell: +7-921-932-61-82
