By the way, I am assuming that we are talking about per-query metrics, in
which case we should specify metrics history size, so we don't keep all the
queries in memory forever. I don't think it makes sense to have metrics
aggregated across the queries. Just wanted to clarify this.

On Thu, Mar 2, 2017 at 12:31 PM, Denis Magda <[email protected]> wrote:

> Vovan,
>
> Your metrics make perfect sense to me. However, I see a high demand for
> JOIN-based metrics, especially from those who try non-collocated joins in
> production and want to measure them somehow. This is why, personally, I
> would like to see the metrics below in the top-priority list as well:
>
> * if a query was executed in the collocated or non-collocated mode. Three
>   results are valid: collocated, non-collocated, simple query (no joins).
> * non-collocated query: size of the data exchanged between the nodes to
>   complete a specific join. If there are multiple joins in the query, we
>   need to provide this metric for each of them.
> * non-collocated and collocated query: the part of the time spent joining
>   the data. If there are multiple joins in the query, we need to provide
>   this metric for each of them.
>
> As for “unicast” and “broadcast”, agreed, let’s ignore them for now.
>
> In any case, can we include timing information (map phase, reduce phase,
> join phase) into an execution plan produced by H2? Are there any
> implementation hooks?
>
> —
> Denis
>
>
> > On Mar 2, 2017, at 12:02 PM, Dmitriy Setrakyan <[email protected]> wrote:
> >
> > I think some of the metrics specified by Denis also make sense, so I
> > would add them as well. See below...
> >
> > On Thu, Mar 2, 2017 at 12:36 AM, Vladimir Ozerov <[email protected]> wrote:
> >
> >> Denis,
> >>
> >> Query execution is a complex process involving different stages which
> >> are not very easy to match with each other, especially given that any
> >> node can leave the topology at any time. Another problem is that the
> >> engine evolves, and metrics like "did a query do broadcast or unicast"
> >> may easily become useless at some point because, for example, there
> >> will be neither unicast nor broadcast, but something different. On the
> >> other hand, I completely agree that performance monitoring is an
> >> essential part of any mature DBMS.
> >>
> >> I would start with metrics which are both very basic and easy to
> >> implement. For example, we can add a fingerprint (hash) to every query,
> >> which will be used to join the "map" and "reduce" parts with each other,
> >> and add the following basic metrics:
> >> 1) Execution count for a particular query
> >> 2) Number of map nodes - min, max, avg
> >>
> >
> > (1) and (2) make sense
> >
> >
> >> 3) Map step duration (if applicable) - min, max, avg
> >> 4) Reduce step duration (if applicable) - min, max, avg
> >>
> >
> > Not sure if (3) and (4) are needed. I would only add them if they are
> > easy to implement.
> >
> > I would also add these:
> >
> > 5) Collocated: yes/no
> > 6) Last execution time
> > 7) Min/max/average execution duration
> >
> >
> >>
> >> Once done, users will be able to get statistics for particular queries.
> >>
> >> Vladimir.
> >>
> >>
> >> On Tue, Feb 28, 2017 at 3:12 AM, Denis Magda <[email protected]> wrote:
> >>
> >>> BTW,
> >>>
> >>> What if we expose per-query metrics below as a part of EXPLAIN ANALYZE?
> >>> Sergi, is this feasible?
> >>>
> >>> —
> >>> Denis
> >>>
> >>>> On Feb 27, 2017, at 2:35 PM, Denis Magda <[email protected]> wrote:
> >>>>
> >>>> Igniters,
> >>>>
> >>>> Let’s shed more light on SQL query execution internals by introducing
> >>>> a set of useful metrics
> >>>> (https://issues.apache.org/jira/browse/IGNITE-4757).
> >>>>
> >>>> Per-query metrics. Total history size is defined by
> >>>> *CacheConfiguration.getQueryDetailMetricsSize*:
> >>>> * if a query was executed in the collocated or non-collocated mode.
> >>>>   Three results are valid: collocated, non-collocated, simple query
> >>>>   (no joins).
> >>>> * non-collocated query: size of the data exchanged between the nodes
> >>>>   to complete a join.
> >>>> * non-collocated query: did a query do broadcast or unicast to get
> >>>>   data needed to complete a join.
> >>>> * non-collocated and collocated query: a part of the time spent
> >>>>   joining the data.
> >>>>
> >>>> CacheMetrics:
> >>>> * an average number of executed SQL queries (collocated,
> >>>>   non-collocated, simple query (no joins)).
> >>>>
> >>>> Please don’t hesitate to suggest other metrics or improve the
> >>>> proposed ones.
> >>>>
> >>>> —
> >>>> Denis
>
>
