On Fri, Mar 3, 2017 at 11:07 AM, Denis Magda <dma...@apache.org> wrote:
> What you’re saying should be default behavior. Plus, we can add a special > parameter that will gather metrics for queries executed longer that N. > Agree. I thought we already have that in some form. I remember seeing a warning for long queries in the log. > > — > Denis > > > On Mar 3, 2017, at 11:01 AM, Dmitriy Setrakyan <dsetrak...@apache.org> > wrote: > > > > Hm... as a user I would be interested to know that, say, 95% of my > "select > > * from sometable where..." query executes under 10ms or so. > > > > I think holding some history is important and is not that hard to > implement. > > > > D. > > > > On Fri, Mar 3, 2017 at 10:55 AM, Denis Magda <dma...@apache.org> wrote: > > > >> Sergey, agree, good point! > >> > >> Igniters, any other thoughts before we wrap up the discussion updating > the > >> ticket content? > >> > >> — > >> Denis > >> > >>> On Mar 3, 2017, at 10:06 AM, Valentin Kulichenko < > >> valentin.kuliche...@gmail.com> wrote: > >>> > >>> Sergey, that's great idea! Generally, user is not interested much in > some > >>> average numbers, especially in case of SQL queries. What they need is a > >>> list of slow queries and detailed information about the execution flow > of > >>> these particular queries. > >>> > >>> -Val > >>> > >>> On Fri, Mar 3, 2017 at 2:50 AM, Sergey Kozlov <skoz...@gridgain.com> > >> wrote: > >>> > >>>> One more comment: > >>>> > >>>> In general the customer is interested in slow queries details thus we > >> can > >>>> introduce an option which will allow to store only queries executed > more > >>>> than NNN seconds. It may significantly reduce the the memory > consumption > >>>> for history (but logging of all queries is still available if set that > >>>> option to 0). > >>>> > >>>> On Fri, Mar 3, 2017 at 1:00 AM, Denis Magda <dma...@apache.org> > wrote: > >>>> > >>>>> Vovan, > >>>>> > >>>>> When I’m speaking of JOIN metrics I’m simply assume that we need to > add > >>>>> metrics relevant for queries with joins, metrics that will help us > get > >>>> more > >>>>> insights on non-collocated and collocated joins execution flow. > >>>>> > >>>>>> 1) Query exec count > >>>>>> 2) Query exec time (first define what "time" means) - min, max, avg > >>>>> > >>>>> Total query execution time might not be helpful in the trickiest > cases. > >>>>> What if you have multiple joins in your query? How do I know which > one > >>>>> contributes to the execution most? > >>>>> > >>>>> So, I would do a breakdown having total time, map time, per-join > time, > >>>>> reduce time. Hope it’s possible. If it’s unclear how to support > >>>> everything > >>>>> at the first place then it’s a different question. Let’s create > several > >>>>> tickets and start implementing everything gracefully. > >>>>> > >>>>>> 3) Number of bytes exchanged between nodes during query execution > >>>>> > >>>>> It will be really helpful to make a breakdown showing a number of > bytes > >>>>> exchanged per-join (physical join). Again, if you believe it makes > >> sense > >>>> to > >>>>> do the breakdown later then let’s create an additional ticket for > the 2 > >>>>> tier metrics then. > >>>>> > >>>>>> 4) Number of returned rows - min, max, avg > >>>>> > >>>>> > >>>>> Plus, let’s add the following to the list: > >>>>> > >>>>> 5) Collocated: yes or no > >>>>> > >>>>> — > >>>>> Denis > >>>>> > >>>>>> On Mar 2, 2017, at 1:32 PM, Vladimir Ozerov <voze...@gridgain.com> > >>>>> wrote: > >>>>>> > >>>>>> Denis, > >>>>>> > >>>>>> The main problem with suggested metrics is that they implies that > >>>> ceratin > >>>>>> internal mechanics work in predefined way. For example, what is JOIN > >>>>>> metrics? There are no guarantees that JOIN in user's query will be > >>>>>> translated to a real physical join. What if several different query > >>>>>> execution pieces happen in parallel? What if we rework our > distributed > >>>>>> query engine from pull to push approach for performance reasons and > >>>> there > >>>>>> will be no JOINs in classical sense? > >>>>>> > >>>>>> This is why I think that we should start with very basic things. > >>>>> Something > >>>>>> like: > >>>>>> 1) Query exec count > >>>>>> 2) Query exec time (first define what "time" means) - min, max, avg > >>>>>> 3) Number of bytes exchanged between nodes during query execution > >>>>>> 4) Number of returned rows - min, max, avg > >>>>>> > >>>>>> Once we have base numbers in place, we can think of carefully > >>>> integrating > >>>>>> and enhancing all pieces of query execution into more verbose > formats, > >>>>>> similar to query plans with relative weights in classical RDBMS > >>>> systems. > >>>>>> > >>>>>> Thoughts? > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Thu, Mar 2, 2017 at 11:31 PM, Denis Magda <dma...@apache.org> > >>>> wrote: > >>>>>> > >>>>>>> Vovan, > >>>>>>> > >>>>>>> Your metrics make perfect sense to me. However, I see a high demand > >>>> for > >>>>>>> JOINs based metrics especially from those who give a try to > >>>>> non-collocated > >>>>>>> joins in production and want to measure them somehow. This is why, > >>>>>>> personally, I prefer to see the metrics below in the top priority > >> list > >>>>> as > >>>>>>> well: > >>>>>>> > >>>>>>> if a query was executed in the collocated or non-collocated mode. > >>>> Three > >>>>>>> results are valid: collocated, non-collocated, simple query (no > >>>> joins). > >>>>>>> non-collocated query: size of the data exchanged between the nodes > to > >>>>>>> complete a specific join. If there are multiple joins in the query > we > >>>>> need > >>>>>>> to provide this metric for every of them. > >>>>>>> non-collocated and collocated query: a part of the time spent > joining > >>>>> the > >>>>>>> data. If there are multiple joins in the query we need to provide > >> this > >>>>>>> metric for every of them. > >>>>>>> > >>>>>>> As for “unicast” and “broadcast”, agree, let’s ignore it for now. > >>>>>>> > >>>>>>> In any case, can we include timing information (map phase, reduce > >>>> phase, > >>>>>>> join phase) into an execution plan produced by H2? Are there any > >>>>>>> implementation hooks? > >>>>>>> > >>>>>>> — > >>>>>>> Denis > >>>>>>> > >>>>>>> > >>>>>>>> On Mar 2, 2017, at 12:02 PM, Dmitriy Setrakyan < > >>>> dsetrak...@apache.org> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> I think some of the metrics specified by Denis also make sense, > so I > >>>>>>> would > >>>>>>>> add them as well. See below... > >>>>>>>> > >>>>>>>> On Thu, Mar 2, 2017 at 12:36 AM, Vladimir Ozerov < > >>>> voze...@gridgain.com > >>>>>>> <mailto:voze...@gridgain.com>> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Denis, > >>>>>>>>> > >>>>>>>>> Query execution is complex process involving different stages > which > >>>>> are > >>>>>>> not > >>>>>>>>> very easy to match with each other. Especially provided that any > >>>> node > >>>>>>> can > >>>>>>>>> leave topology at any time. Another problem is that engine > evolves > >>>> and > >>>>>>>>> metrics like "did a query do broadcast or unicast" may easily > >> become > >>>>>>>>> useless at some point, because for example there will be neither > >>>>>>> unicast, > >>>>>>>>> nor broadast, but something different. On the other hand I > >>>> completely > >>>>>>> agree > >>>>>>>>> that performance monitoring is essential part of any mature DBMS. > >>>>>>>>> > >>>>>>>>> I would start with metrics which are both very basic and easy to > >>>>>>> implement > >>>>>>>>> at the same time. For example we can add fingerprint (hash) to > >> every > >>>>>>> query > >>>>>>>>> which will be used to join "map" and "reduce" parts with each > other > >>>>> and > >>>>>>> add > >>>>>>>>> the following basic metrics: > >>>>>>>>> 1) Execution count for particular query > >>>>>>>>> 2) Number of map nodes - min, max, avg > >>>>>>>>> > >>>>>>>> > >>>>>>>> (1) and (2) makes sense > >>>>>>>> > >>>>>>>> > >>>>>>>>> 3) Map step duration (if applicable) - min, max, > >>>>>>>> > >>>>>>>> 4) Reduce step duration (if applicable) - min, max, avg > >>>>>>>>> > >>>>>>>> > >>>>>>>> Not sure if (3) and (4) are needed. I would only add them if they > >> are > >>>>>>> easy > >>>>>>>> to implement. > >>>>>>>> > >>>>>>>> I would also add these: > >>>>>>>> > >>>>>>>> 5) Collocated: yes/no > >>>>>>>> 6) last execution time > >>>>>>>> 7) min/max/average execution duration > >>>>>>>> > >>>>>>>> > >>>>>>>>> > >>>>>>>>> Once done users will be able to get statistics for particular > >>>> queries. > >>>>>>>>> > >>>>>>>>> Vladimir. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Tue, Feb 28, 2017 at 3:12 AM, Denis Magda <dma...@apache.org> > >>>>> wrote: > >>>>>>>>> > >>>>>>>>>> BTW, > >>>>>>>>>> > >>>>>>>>>> What if we expose per-query metrics below as a part of EXPLAIN > >>>>> ANALYZE? > >>>>>>>>>> Sergi, is this feasible? > >>>>>>>>>> > >>>>>>>>>> — > >>>>>>>>>> Denis > >>>>>>>>>> > >>>>>>>>>>> On Feb 27, 2017, at 2:35 PM, Denis Magda <dma...@apache.org> > >>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>> Igniters, > >>>>>>>>>>> > >>>>>>>>>>> Let’s shed more light on SQL query execution internals > >>>> introducing a > >>>>>>>>> set > >>>>>>>>>> of useful metrics (https://issues.apache.org/ > >>>> jira/browse/IGNITE-4757 > >>>>> ). > >>>>>>>>>>> > >>>>>>>>>>> Per-query metrics. Total history size is defined by > >>>>>>>>> *CacheConfiguration. > >>>>>>>>>> getQueryDetailMetricsSize*: > >>>>>>>>>>> * if a query was executed in the collocated or non-collocated > >>>> mode. > >>>>>>>>>> Three results are valid: collocated, non-collocated, simple > query > >>>> (no > >>>>>>>>>> joins). > >>>>>>>>>>> * non-collocated query: size of the data exchanged between the > >>>> nodes > >>>>>>> to > >>>>>>>>>> complete a join. > >>>>>>>>>>> * non-collocated query: did a query do broadcast or unicast to > >> get > >>>>>>> data > >>>>>>>>>> needed to complete a join. > >>>>>>>>>>> * non-collocated and collocated query: a part of the time spent > >>>>>>> joining > >>>>>>>>>> the data. > >>>>>>>>>>> > >>>>>>>>>>> CacheMetrics: > >>>>>>>>>>> * an average number of executed SQL queries (collocated, > >>>>>>>>> non-collocated, > >>>>>>>>>> simple query (no joins)). > >>>>>>>>>>> > >>>>>>>>>>> Please don’t hesitate do share suggest another metrics or > improve > >>>>>>>>>> proposed ones. > >>>>>>>>>>> > >>>>>>>>>>> — > >>>>>>>>>>> Denis > >>>>>>> > >>>>>>> > >>>>> > >>>>> > >>>> > >>>> > >>>> -- > >>>> Sergey Kozlov > >>>> GridGain Systems > >>>> www.gridgain.com > >>>> > >> > >> > >