Re: [HACKERS] auto_explain : log queries with wrong estimation

2017-08-28 Thread Adrien Nayrat
On 08/24/2017 03:08 PM, Maksim Milyutin wrote:
[...]
> 
> AFAICS you want to introduce two additional per-node variables:
>  - auto_explain_log_estimate_ratio that denotes minimum ratio (>= 1) between
> real value and planned one. I would add 'min' prefix before 'ratio'.
>  - auto_explain_log_estimate_min_rows - minimum absolute difference between
> those two values. IMHO this name is somewhat poor, the suffix 'min_diff_rows'
> looks better.
> If real expressions (ratio and diff) exceed these threshold values both, you 
> log
> this situation. I'm right?

Yes, you're totaly right! I wonder if "ratio" is fine or if I should use 
"factor"?

[...]
> 
> Instrumentation is initialized only with analyze (log_analyze is true)[1]

Good, I didn't notice instrumentation can be enabled in auto_explain's hook. I
added these lines and it works :

if (auto_explain_log_estimate_ratio || auto_explain_log_estimate_min_rows)
{
 queryDesc->instrument_options |= INSTRUMENT_ROWS;
}

But I need to undestand how instrumentation works.

Thanks for your answer. I will continue my work, actually my patch is not
functionnal.


-- 
Adrien NAYRAT




signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] auto_explain : log queries with wrong estimation

2017-08-24 Thread Maksim Milyutin

On 24.08.2017 14:56, Adrien Nayrat wrote:


Hi hackers,


Hi,

I try to made a patch to auto_explain in order to log queries with 
wrong estimation.


I compare planned row id : queryDesc->planstate->plan->plan_rows

Vs ntuples : queryDesc->planstate->instrument->ntuples;


AFAICS you want to introduce two additional per-node variables:
 - auto_explain_log_estimate_ratio that denotes minimum ratio (>= 1) 
between real value and planned one. I would add 'min' prefix before 
'ratio'.
 - auto_explain_log_estimate_min_rows - minimum absolute difference 
between those two values. IMHO this name is somewhat poor, the suffix 
'min_diff_rows' looks better.
If real expressions (ratio and diff) exceed these threshold values both, 
you log this situation. I'm right?


If I understand, instrumentation is used only with explain. So my 
patch works

only with explain (and segfault without).


Instrumentation is initialized only with analyze (log_analyze is true)[1]


Is there a simple way to get ntuples?


It's interesting question. In one's time I didn't find any way to get 
the amount of tuples emitted from a node.



1. contrib/auto_explain/auto_explain.c:221

--
Regards,
Maksim Milyutin



[HACKERS] auto_explain : log queries with wrong estimation

2017-08-24 Thread Adrien Nayrat
Hi hackers,


I try to made a patch to auto_explain in order to log queries with wrong 
estimation.

I compare planned row id : queryDesc->planstate->plan->plan_rows

Vs ntuples : queryDesc->planstate->instrument->ntuples;

If I understand, instrumentation is used only with explain. So my patch works
only with explain (and segfault without).


Is there a simple way to get ntuples?

Attached a naive patch.

Thanks :)

-- 
Adrien NAYRAT

http://dalibo.com - http://dalibo.org
diff --git a/contrib/auto_explain/auto_explain.c 
b/contrib/auto_explain/auto_explain.c
index edcb91542a..165fe8e4ae 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -30,6 +30,8 @@ static bool auto_explain_log_timing = true;
 static int auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static bool auto_explain_log_nested_statements = false;
 static double auto_explain_sample_rate = 1;
+static int auto_explain_log_estimate_ratio = -1;
+static int auto_explain_log_estimate_min_rows = -1;
 
 static const struct config_enum_entry format_options[] = {
{"text", EXPLAIN_FORMAT_TEXT, false},
@@ -52,7 +54,9 @@ static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
 static bool current_query_sampled = true;
 
 #define auto_explain_enabled() \
-   (auto_explain_log_min_duration >= 0 && \
+   (auto_explain_log_min_duration >= 0 || \
+auto_explain_log_estimate_ratio >= 0 || \
+auto_explain_log_estimate_min_rows >= 0 && \
 (nesting_level == 0 || auto_explain_log_nested_statements))
 
 void   _PG_init(void);
@@ -176,6 +180,31 @@ _PG_init(void)
 NULL,
 NULL);
 
+   DefineCustomIntVariable("auto_explain.estimate_ratio",
+"Planned / returned 
row ratio.",
+NULL,
+
_explain_log_estimate_ratio,
+-1,
+-1, INT_MAX / 1000,
+PGC_SUSET,
+0,
+NULL,
+NULL,
+NULL);
+
+   DefineCustomIntVariable("auto_explain.estimate_min_rows",
+"Planned / returned 
min rows.",
+NULL,
+
_explain_log_estimate_min_rows,
+-1,
+-1, INT_MAX / 1000,
+PGC_SUSET,
+0,
+NULL,
+NULL,
+NULL);
+
+
EmitWarningsOnPlaceholders("auto_explain");
 
/* Install hooks. */
@@ -309,6 +338,30 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
if (queryDesc->totaltime && auto_explain_enabled() && 
current_query_sampled)
{
double  msec;
+   int estimate_ratio = -1;
+   int estimate_rows = -1;
+   int plan_rows = 
queryDesc->planstate->plan->plan_rows;
+   int ntuples = 
queryDesc->planstate->instrument->ntuples;
+
+   if (plan_rows == 0 && ntuples == 0)
+   {
+   estimate_ratio = 1;
+   }
+   else if ( ntuples > plan_rows && plan_rows > 0)
+   {
+   estimate_ratio = ntuples / plan_rows;
+   }
+   else if ( ntuples > 0)
+   {
+   estimate_ratio = plan_rows / ntuples;
+   }
+   else
+   {
+   estimate_ratio = INT_MAX;
+   }
+
+   estimate_rows = abs(ntuples - plan_rows);
+
 
/*
 * Make sure stats accumulation is done.  (Note: it's okay if 
several
@@ -318,7 +371,10 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 
/* Log plan if duration is exceeded. */
msec = queryDesc->totaltime->total * 1000.0;
-   if (msec >= auto_explain_log_min_duration)
+   if ((msec >= auto_explain_log_min_duration &&
+ auto_explain_log_min_duration >= 0 ) ||
+   (estimate_ratio >=