Re: Query history statistics API

Denis Magda Mon, 24 Dec 2018 11:03:43 -0800

Yuri, Vladimir,

How expensive will it be to have the history enabled by default? For
instance, taking the running queries mechanics as an example, what
performance impact did they have?


As for the running queries, Yuri, if that work is complete, please send a
summary to the relevant discussion explaining how the end user will use it.
Plus, it might be the right time to restart our KILL command conversation;
we haven't come to an agreement regarding the syntax.

--
Denis


On Fri, Dec 21, 2018 at 8:34 AM Юрий <[email protected]> wrote:

> Vladimir, thanks for your expert opinion.
>
> I have some thoughts about point 5.
> I looked into how it works in Oracle and PG:
>
> *PG*: keeps 1000 statements by default (configurable) and discards the
> least-executed statements. Statistics are updated by an asynchronous
> process, so they may lag.
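For illustration only, the PG-style policy described above (keep a fixed number of statements and discard the least-executed one when the limit is hit) could look roughly like this in Java. LeastExecutedHistory and its methods are hypothetical names, not PG or Ignite APIs, and the update is done synchronously here to keep the sketch small.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: keeps at most maxSize statements and evicts the least-executed one when full. */
class LeastExecutedHistory {
    private final int maxSize;
    private final Map<String, Long> execCounts = new HashMap<>();

    LeastExecutedHistory(int maxSize) {
        if (maxSize < 1)
            throw new IllegalArgumentException("maxSize must be positive");
        this.maxSize = maxSize;
    }

    /** Records one execution of the given statement text. */
    synchronized void onExecuted(String sql) {
        // Make room only when a new statement arrives and the history is already full.
        if (!execCounts.containsKey(sql) && execCounts.size() >= maxSize)
            evictLeastExecuted();

        execCounts.merge(sql, 1L, Long::sum);
    }

    /** Linear scan for the statement with the lowest execution count. */
    private void evictLeastExecuted() {
        String victim = null;
        long min = Long.MAX_VALUE;

        for (Map.Entry<String, Long> e : execCounts.entrySet()) {
            if (e.getValue() < min) {
                min = e.getValue();
                victim = e.getKey();
            }
        }

        execCounts.remove(victim);
    }

    synchronized Long executions(String sql) {
        return execCounts.get(sql);
    }

    synchronized int size() {
        return execCounts.size();
    }
}
```

The linear scan on eviction keeps the sketch short; with larger limits, a cheaper structure or batch eviction would likely be needed.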
>
> *Oracle*: uses the shared pool for historical data and can evict the records
> with the oldest last-execution time when the shared pool does not have
> enough free space (the pool holds other data besides historical
> statistics). So this also seems to be a separate asynchronous process
> (there is very little information about it).
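A rough Java analogue of the Oracle-style policy, assuming that evicting the record with the oldest last-execution time is equivalent to LRU eviction: an access-ordered java.util.LinkedHashMap whose removeEldestEntry hook drops the least recently executed statement. LruHistory is a hypothetical name, and the class is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: evicts the statement with the oldest last execution once maxSize is exceeded. */
class LruHistory extends LinkedHashMap<String, Long> {
    private final int maxSize;

    LruHistory(int maxSize) {
        // accessOrder = true: iteration runs from the least to the most recently accessed entry.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    /** Records one execution; get() and put() both move the entry to the most-recent end. */
    void onExecuted(String sql) {
        Long cnt = get(sql);
        put(sql, cnt == null ? 1L : cnt + 1);
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
        // Invoked after a new key is inserted; drops the least recently executed statement.
        return size() > maxSize;
    }
}
```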
>
>
> Unfortunately, I could not find information about how these databases
> handle a big workload. However, we can see that both vendors process
> statistics asynchronously.
>
>
> I see a few ways we can handle a very high workload.
>
> The first group of options uses an asynchronous model with a separate
> thread that takes stats-update elements from a queue:
> 1) Block on a full queue and wait until there is enough capacity to put a
> new element.
>
> + We have all up-to-date statistics.
> - The end of query execution can be blocked.
>
> 2) Discard the statistics of a finished query if the queue is full.
>
> + Very fast for the current query.
> - We lose part of the statistics.
>
> 3) Fully clear the statistics queue.
>
> + Fast, and frees space for further elements.
> - We lose a large number of statistics elements.
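A minimal sketch of the first group, assuming a bounded java.util.concurrent.ArrayBlockingQueue of finished-query events and a single consumer thread that owns the history map. AsyncHistoryUpdater, OverflowPolicy and drain() are hypothetical names; each enum constant corresponds to one of the three variants listed above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: query threads enqueue finished queries, a single thread updates the history. */
class AsyncHistoryUpdater {
    /** The three overflow variants. */
    enum OverflowPolicy { BLOCK, DISCARD_NEW, CLEAR_QUEUE }

    private final BlockingQueue<String> queue;
    private final OverflowPolicy policy;

    // Written only by the thread that calls drain(), so updates need no synchronization.
    private final Map<String, Long> history = new HashMap<>();

    AsyncHistoryUpdater(int capacity, OverflowPolicy policy) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.policy = policy;
    }

    /** Called by a query thread when its query finishes. */
    void onQueryFinished(String sql) {
        switch (policy) {
            case BLOCK:
                try {
                    queue.put(sql); // Variant 1: wait until the queue has free capacity.
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // Keep the interrupt flag; this event is lost.
                }
                break;

            case DISCARD_NEW:
                queue.offer(sql); // Variant 2: offer() returns false and the event is dropped if full.
                break;

            case CLEAR_QUEUE:
                if (!queue.offer(sql)) {
                    queue.clear();    // Variant 3: drop every pending event to free space.
                    queue.offer(sql); // May still fail under contention, dropping this event.
                }
                break;
        }
    }

    /** Body of the consumer loop: applies all pending events to the history map. */
    void drain() {
        String sql;

        while ((sql = queue.poll()) != null)
            history.merge(sql, 1L, Long::sum);
    }

    Long executions(String sql) {
        return history.get(sql);
    }
}
```

In a node, drain() would run in a loop on a dedicated thread (for example, one that blocks on take()); other threads reading the history would then need a concurrent map or a snapshot, which the sketch leaves out.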
>
>
> The second group of options uses the current approach for queryMetrics:
> the CHM with the history gets some additional capacity, plus periodic
> cleanup of the map. If even the additional space is not enough, we can:
> 1) Discard the statistics of the finished query.
> 2) Fully clear the CHM and discard all gathered information.
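The second group could be sketched as follows, assuming the map stores each query's last execution timestamp and that the periodic cleanup trims the map back to a soft limit by evicting the oldest entries. ChmHistory and its parameters (softLimit, extraCapacity, clearAllOnOverflow) are hypothetical; only ConcurrentHashMap is a real JDK class.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

/** Hypothetical sketch of the second group: a CHM with additional capacity and periodic cleanup. */
class ChmHistory {
    private final int softLimit;              // Size the periodic cleanup trims the map back to.
    private final int hardLimit;              // Soft limit plus the additional capacity.
    private final boolean clearAllOnOverflow; // true: variant 2, false: variant 1.

    // Key: query text; value: timestamp of its last execution, used to find the oldest entries.
    private final ConcurrentHashMap<String, Long> lastExec = new ConcurrentHashMap<>();

    ChmHistory(int softLimit, int extraCapacity, boolean clearAllOnOverflow) {
        this.softLimit = softLimit;
        this.hardLimit = softLimit + extraCapacity;
        this.clearAllOnOverflow = clearAllOnOverflow;
    }

    /** Called on query completion. The size check is not atomic with put(), so the hard limit is approximate. */
    void onQueryFinished(String sql, long nowMillis) {
        if (lastExec.size() >= hardLimit && !lastExec.containsKey(sql)) {
            if (clearAllOnOverflow)
                lastExec.clear(); // Variant 2: discard all gathered information.
            else
                return;           // Variant 1: discard the statistics of this query.
        }

        lastExec.put(sql, nowMillis);
    }

    /** Periodic task: evicts the oldest entries until the map is back at the soft limit. */
    void cleanup() {
        int excess = lastExec.size() - softLimit;

        if (excess <= 0)
            return;

        List<String> oldest = lastExec.entrySet().stream()
            .sorted(Map.Entry.comparingByValue())
            .limit(excess)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());

        oldest.forEach(lastExec::remove);
    }

    int size() {
        return lastExec.size();
    }

    boolean contains(String sql) {
        return lastExec.containsKey(sql);
    }
}
```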
>
> The first group should potentially work faster, because the history map
> can be updated in a single thread without contention, and putting an
> element into a queue should be faster.
>
>
> What do you think? Which of the options would you prefer, or can you
> suggest another way to handle a potentially huge workload?
>
> Also, there is one initial question that is still unclear to me: where is
> the right place for the new API?
>
>
> On Fri, 21 Dec 2018 at 13:05, Vladimir Ozerov <[email protected]> wrote:
>
> > Hi,
> >
> > I'd propose the following approach:
> > 1) Enable history by default, because otherwise users will have to restart
> > the node to enable it, or we will have to implement dynamic enabling of
> > history, which is a complex thing. The default value should be relatively
> > small yet able to accommodate typical workloads, e.g. 1000 entries. This
> > should not put any serious pressure on GC.
> > 2) Split queries by: schema, query, local flag.
> > 3) Track only growing values: execution count, error count, minimum
> > duration, maximum duration.
> > 4) Implement the ability to clear history: JMX, SQL command, whatever
> > (maybe this is a different ticket).
> > 5) History cleanup might be implemented similarly to the current approach:
> > store everything in a CHM, periodically check its size, and if it is too
> > big, evict the oldest entries. But this should be done with care: under
> > some workloads new queries will be generated very quickly. In this case we
> > should either fall back to synchronous evicts, or not log history at all.
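Points 2 and 3 above can be illustrated with a composite key and an immutable entry that is merged atomically through ConcurrentHashMap.merge(). The class and field names below are hypothetical and do not reflect Ignite's actual API; durations are assumed to be in milliseconds.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical history key: queries are split by schema, query text and local flag. */
final class QueryHistoryKey {
    private final String schema;
    private final String sql;
    private final boolean local;

    QueryHistoryKey(String schema, String sql, boolean local) {
        this.schema = schema;
        this.sql = sql;
        this.local = local;
    }

    @Override public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof QueryHistoryKey))
            return false;
        QueryHistoryKey k = (QueryHistoryKey)o;
        return local == k.local && Objects.equals(schema, k.schema) && Objects.equals(sql, k.sql);
    }

    @Override public int hashCode() {
        return Objects.hash(schema, sql, local);
    }
}

/** Hypothetical immutable entry holding only execution count, error count, min and max duration. */
final class QueryHistoryEntry {
    final long executions;
    final long errors;
    final long minDurationMs;
    final long maxDurationMs;

    QueryHistoryEntry(long executions, long errors, long minDurationMs, long maxDurationMs) {
        this.executions = executions;
        this.errors = errors;
        this.minDurationMs = minDurationMs;
        this.maxDurationMs = maxDurationMs;
    }

    /** Entry describing a single finished query. */
    static QueryHistoryEntry single(long durationMs, boolean failed) {
        return new QueryHistoryEntry(1, failed ? 1 : 0, durationMs, durationMs);
    }

    /** Combines two entries recorded for the same key. */
    QueryHistoryEntry merge(QueryHistoryEntry other) {
        return new QueryHistoryEntry(
            executions + other.executions,
            errors + other.errors,
            Math.min(minDurationMs, other.minDurationMs),
            Math.max(maxDurationMs, other.maxDurationMs));
    }
}

class QueryHistory {
    private final ConcurrentHashMap<QueryHistoryKey, QueryHistoryEntry> entries = new ConcurrentHashMap<>();

    void onQueryFinished(String schema, String sql, boolean local, long durationMs, boolean failed) {
        // ConcurrentHashMap.merge() is atomic per key, so concurrent updates are not lost.
        entries.merge(new QueryHistoryKey(schema, sql, local),
            QueryHistoryEntry.single(durationMs, failed), QueryHistoryEntry::merge);
    }

    QueryHistoryEntry get(String schema, String sql, boolean local) {
        return entries.get(new QueryHistoryKey(schema, sql, local));
    }

    int size() {
        return entries.size();
    }
}
```

Since each tracked value is a sum, a minimum or a maximum, the merged result does not depend on the order in which executions are recorded.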
> >
> > Thoughts?
> >
> > Vladimir.
> >
> > On Fri, Dec 21, 2018 at 11:22 AM Юрий <[email protected]> wrote:
> >
> > > Alexey,
> > >
> > > Yes, a property to configure the history size will be added. I think
> > > the default value should be 0, so that by default history is not
> > > gathered at all; it can be switched on via the property when required.
> > >
> > > Currently I plan to evict old data the same way as for queryMetrics: a
> > > scheduled task will evict old data, starting from the queries with the
> > > oldest start time.
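Yuri's proposal (a default history size of 0, so nothing is gathered, plus a scheduled task that evicts by oldest query start time) might be sketched with a ScheduledExecutorService. ScheduledHistory, onQueryStarted() and evictOldest() are hypothetical names; the eviction period and the choice of scheduleWithFixedDelay are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

/** Hypothetical sketch: a history size of 0 disables collection; a scheduled task evicts the oldest entries. */
class ScheduledHistory implements AutoCloseable {
    private final int historySize;

    // Key: query text; value: start time of its most recent execution.
    private final ConcurrentHashMap<String, Long> startTimes = new ConcurrentHashMap<>();

    private final ScheduledExecutorService scheduler;

    ScheduledHistory(int historySize, long evictPeriodMs) {
        this.historySize = historySize;

        if (historySize > 0) {
            scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleWithFixedDelay(this::evictOldest, evictPeriodMs, evictPeriodMs, TimeUnit.MILLISECONDS);
        } else
            scheduler = null; // History disabled: no eviction task is started either.
    }

    void onQueryStarted(String sql, long startTime) {
        if (historySize == 0)
            return; // History disabled (size 0): nothing is gathered.

        startTimes.put(sql, startTime);
    }

    /** Evicts the entries with the oldest start time until at most historySize remain. */
    void evictOldest() {
        int excess = startTimes.size() - historySize;

        if (excess <= 0)
            return;

        List<String> oldest = startTimes.entrySet().stream()
            .sorted(Map.Entry.comparingByValue())
            .limit(excess)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());

        oldest.forEach(startTimes::remove);
    }

    int size() {
        return startTimes.size();
    }

    boolean contains(String sql) {
        return startTimes.containsKey(sql);
    }

    @Override public void close() {
        if (scheduler != null)
            scheduler.shutdownNow();
    }
}
```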
> > >
> > > Statistics will be gathered only for the initial client queries, so
> > > internal queries will not be included. Identical queries will have one
> > > record in the history with merged statistics.
> > >
> > > All of the above is just my proposal. Please reply if you think anything
> > > should be implemented differently.
> > >
> > >
> > >
> > >
> > >
> > > On Thu, 20 Dec 2018 at 18:23, Alexey Kuznetsov <[email protected]> wrote:
> > >
> > > > Yuriy,
> > > >
> > > > I have several questions:
> > > >
> > > > Are we going to add some properties to the cluster configuration for
> > > > the history size?
> > > >
> > > > And what will the default history size be?
> > > >
> > > > Will identical queries count as the same item of historical data?
> > > >
> > > > How will we evict old data that does not fit into the history?
> > > >
> > > > Will we somehow count "reduce" queries? Or only final "map" ones?
> > > >
> > > > --
> > > > Alexey Kuznetsov
> > > >
> > >
> > >
> > > --
> > > Live with a smile! :D
> > >
> >
>
>
> --
> Live with a smile! :D
>
