Re: Additional SQL metrics

Denis Magda Fri, 03 Mar 2017 11:08:07 -0800

What you’re saying should be default behavior. Plus, we can add a special 
parameter that will gather metrics for queries executed longer that N.


—
Denis

> On Mar 3, 2017, at 11:01 AM, Dmitriy Setrakyan <[email protected]> wrote:
> 
> Hm... as a user I would be interested to know that, say, 95% of my "select
> * from sometable where..." query executes under 10ms or so.
> 
> I think holding some history is important and is not that hard to implement.
> 
> D.
> 
> On Fri, Mar 3, 2017 at 10:55 AM, Denis Magda <[email protected]> wrote:
> 
>> Sergey, agree, good point!
>> 
>> Igniters, any other thoughts before we wrap up the discussion updating the
>> ticket content?
>> 
>> —
>> Denis
>> 
>>> On Mar 3, 2017, at 10:06 AM, Valentin Kulichenko <
>> [email protected]> wrote:
>>> 
>>> Sergey, that's great idea! Generally, user is not interested much in some
>>> average numbers, especially in case of SQL queries. What they need is a
>>> list of slow queries and detailed information about the execution flow of
>>> these particular queries.
>>> 
>>> -Val
>>> 
>>> On Fri, Mar 3, 2017 at 2:50 AM, Sergey Kozlov <[email protected]>
>> wrote:
>>> 
>>>> One more comment:
>>>> 
>>>> In general the customer is interested in slow queries details thus we
>> can
>>>> introduce an option which will allow to store only queries executed more
>>>> than NNN seconds. It may significantly reduce the the memory consumption
>>>> for history (but logging of all queries is still available if set that
>>>> option to 0).
>>>> 
>>>> On Fri, Mar 3, 2017 at 1:00 AM, Denis Magda <[email protected]> wrote:
>>>> 
>>>>> Vovan,
>>>>> 
>>>>> When I’m speaking of JOIN metrics I’m simply assume that we need to add
>>>>> metrics relevant for queries with joins, metrics that will help us get
>>>> more
>>>>> insights on non-collocated and collocated joins execution flow.
>>>>> 
>>>>>> 1) Query exec count
>>>>>> 2) Query exec time (first define what "time" means) - min, max, avg
>>>>> 
>>>>> Total query execution time might not be helpful in the trickiest cases.
>>>>> What if you have multiple joins in your query? How do I know which one
>>>>> contributes to the execution most?
>>>>> 
>>>>> So, I would do a breakdown having total time, map time, per-join time,
>>>>> reduce time. Hope it’s possible. If it’s unclear how to support
>>>> everything
>>>>> at the first place then it’s a different question. Let’s create several
>>>>> tickets and start implementing everything gracefully.
>>>>> 
>>>>>> 3) Number of bytes exchanged between nodes during query execution
>>>>> 
>>>>> It will be really helpful to make a breakdown showing a number of bytes
>>>>> exchanged per-join (physical join). Again, if you believe it makes
>> sense
>>>> to
>>>>> do the breakdown later then let’s create an additional ticket for the 2
>>>>> tier metrics then.
>>>>> 
>>>>>> 4) Number of returned rows - min, max, avg
>>>>> 
>>>>> 
>>>>> Plus, let’s add the following to the list:
>>>>> 
>>>>> 5) Collocated: yes or no
>>>>> 
>>>>> —
>>>>> Denis
>>>>> 
>>>>>> On Mar 2, 2017, at 1:32 PM, Vladimir Ozerov <[email protected]>
>>>>> wrote:
>>>>>> 
>>>>>> Denis,
>>>>>> 
>>>>>> The main problem with suggested metrics is that they implies that
>>>> ceratin
>>>>>> internal mechanics work in predefined way. For example, what is JOIN
>>>>>> metrics? There are no guarantees that JOIN in user's query will be
>>>>>> translated to a real physical join. What if several different query
>>>>>> execution pieces happen in parallel? What if we rework our distributed
>>>>>> query engine from pull to push approach for performance reasons and
>>>> there
>>>>>> will be no JOINs in classical sense?
>>>>>> 
>>>>>> This is why I think that we should start with very basic things.
>>>>> Something
>>>>>> like:
>>>>>> 1) Query exec count
>>>>>> 2) Query exec time (first define what "time" means) - min, max, avg
>>>>>> 3) Number of bytes exchanged between nodes during query execution
>>>>>> 4) Number of returned rows - min, max, avg
>>>>>> 
>>>>>> Once we have base numbers in place, we can think of carefully
>>>> integrating
>>>>>> and enhancing all pieces of query execution into more verbose formats,
>>>>>> similar to query plans with relative weights in classical RDBMS
>>>> systems.
>>>>>> 
>>>>>> Thoughts?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Thu, Mar 2, 2017 at 11:31 PM, Denis Magda <[email protected]>
>>>> wrote:
>>>>>> 
>>>>>>> Vovan,
>>>>>>> 
>>>>>>> Your metrics make perfect sense to me. However, I see a high demand
>>>> for
>>>>>>> JOINs based metrics especially from those who give a try to
>>>>> non-collocated
>>>>>>> joins in production  and want to measure them somehow. This is why,
>>>>>>> personally, I prefer to see the metrics below in the top priority
>> list
>>>>> as
>>>>>>> well:
>>>>>>> 
>>>>>>> if a query was executed in the collocated or non-collocated mode.
>>>> Three
>>>>>>> results are valid: collocated, non-collocated, simple query (no
>>>> joins).
>>>>>>> non-collocated query: size of the data exchanged between the nodes to
>>>>>>> complete a specific join. If there are multiple joins in the query we
>>>>> need
>>>>>>> to provide this metric for every of them.
>>>>>>> non-collocated and collocated query: a part of the time spent joining
>>>>> the
>>>>>>> data. If there are multiple joins in the query we need to provide
>> this
>>>>>>> metric for every of them.
>>>>>>> 
>>>>>>> As for “unicast” and “broadcast”, agree, let’s ignore it for now.
>>>>>>> 
>>>>>>> In any case, can we include timing information (map phase, reduce
>>>> phase,
>>>>>>> join phase) into an execution plan produced by H2? Are there any
>>>>>>> implementation hooks?
>>>>>>> 
>>>>>>> —
>>>>>>> Denis
>>>>>>> 
>>>>>>> 
>>>>>>>> On Mar 2, 2017, at 12:02 PM, Dmitriy Setrakyan <
>>>> [email protected]>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> I think some of the metrics specified by Denis also make sense, so I
>>>>>>> would
>>>>>>>> add them as well. See below...
>>>>>>>> 
>>>>>>>> On Thu, Mar 2, 2017 at 12:36 AM, Vladimir Ozerov <
>>>> [email protected]
>>>>>>> <mailto:[email protected]>>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Denis,
>>>>>>>>> 
>>>>>>>>> Query execution is complex process involving different stages which
>>>>> are
>>>>>>> not
>>>>>>>>> very easy to match with each other. Especially provided that any
>>>> node
>>>>>>> can
>>>>>>>>> leave topology at any time. Another problem is that engine evolves
>>>> and
>>>>>>>>> metrics like "did a query do broadcast or unicast" may easily
>> become
>>>>>>>>> useless at some point, because for example there will be neither
>>>>>>> unicast,
>>>>>>>>> nor broadast, but something different. On the other hand I
>>>> completely
>>>>>>> agree
>>>>>>>>> that performance monitoring is essential part of any mature DBMS.
>>>>>>>>> 
>>>>>>>>> I would start with metrics which are both very basic and easy to
>>>>>>> implement
>>>>>>>>> at the same time. For example we can add fingerprint (hash) to
>> every
>>>>>>> query
>>>>>>>>> which will be used to join "map" and "reduce" parts with each other
>>>>> and
>>>>>>> add
>>>>>>>>> the following basic metrics:
>>>>>>>>> 1) Execution count for particular query
>>>>>>>>> 2) Number of map nodes - min, max, avg
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> (1) and (2) makes sense
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 3) Map step duration (if applicable) - min, max,
>>>>>>>> 
>>>>>>>> 4) Reduce step duration (if applicable) - min, max, avg
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Not sure if (3) and (4) are needed. I would only add them if they
>> are
>>>>>>> easy
>>>>>>>> to implement.
>>>>>>>> 
>>>>>>>> I would also add these:
>>>>>>>> 
>>>>>>>> 5) Collocated: yes/no
>>>>>>>> 6) last execution time
>>>>>>>> 7) min/max/average execution duration
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Once done users will be able to get statistics for particular
>>>> queries.
>>>>>>>>> 
>>>>>>>>> Vladimir.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, Feb 28, 2017 at 3:12 AM, Denis Magda <[email protected]>
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> BTW,
>>>>>>>>>> 
>>>>>>>>>> What if we expose per-query metrics below as a part of EXPLAIN
>>>>> ANALYZE?
>>>>>>>>>> Sergi, is this feasible?
>>>>>>>>>> 
>>>>>>>>>> —
>>>>>>>>>> Denis
>>>>>>>>>> 
>>>>>>>>>>> On Feb 27, 2017, at 2:35 PM, Denis Magda <[email protected]>
>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Igniters,
>>>>>>>>>>> 
>>>>>>>>>>> Let’s shed more light on SQL query execution internals
>>>> introducing a
>>>>>>>>> set
>>>>>>>>>> of useful metrics (https://issues.apache.org/
>>>> jira/browse/IGNITE-4757
>>>>> ).
>>>>>>>>>>> 
>>>>>>>>>>> Per-query metrics. Total history size is defined by
>>>>>>>>> *CacheConfiguration.
>>>>>>>>>> getQueryDetailMetricsSize*:
>>>>>>>>>>> * if a query was executed in the collocated or non-collocated
>>>> mode.
>>>>>>>>>> Three results are valid: collocated, non-collocated, simple query
>>>> (no
>>>>>>>>>> joins).
>>>>>>>>>>> * non-collocated query: size of the data exchanged between the
>>>> nodes
>>>>>>> to
>>>>>>>>>> complete a join.
>>>>>>>>>>> * non-collocated query: did a query do broadcast or unicast to
>> get
>>>>>>> data
>>>>>>>>>> needed to complete a join.
>>>>>>>>>>> * non-collocated and collocated query: a part of the time spent
>>>>>>> joining
>>>>>>>>>> the data.
>>>>>>>>>>> 
>>>>>>>>>>> CacheMetrics:
>>>>>>>>>>> * an average number of executed SQL queries (collocated,
>>>>>>>>> non-collocated,
>>>>>>>>>> simple query (no joins)).
>>>>>>>>>>> 
>>>>>>>>>>> Please don’t hesitate do share suggest another metrics or improve
>>>>>>>>>> proposed ones.
>>>>>>>>>>> 
>>>>>>>>>>> —
>>>>>>>>>>> Denis
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Sergey Kozlov
>>>> GridGain Systems
>>>> www.gridgain.com
>>>> 
>> 
>>

Re: Additional SQL metrics

Reply via email to