Re: [HACKERS] [GENERAL] Idea for the statistics collector

2005-04-20 Thread Andreas Pflug
Bruce Momjian wrote:
Christopher Kings-Lynne wrote:
Good god - how old was that email? 2002???

Yep, and been in my mailbox since then, waiting for me to process it
into a TODO entry.
Exciting what one can find wiping the floor of the mailbox :-)
Regards,
Andreas
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] [GENERAL] Idea for the statistics collector

2005-04-19 Thread Bruce Momjian

Added to TODO:

* Add tool to query pg_stat_* tables and report indexes that aren't 
needed
  or tables that might need indexes

---

Christopher Kings-Lynne wrote:
 I was thinking of writing a command line tool like 'pgtune' that looks at
 the stats views and will generate SQL code for, or do automatically the
 following:
 
 * Dropping indices that are never used
 * Creating appropriate indices to avoid large, expensive sequential scans.
 
 This would put us in the 'mysql makes my indices for me by magic' league -
 but would be far more powerful and flexible.  How to do multikey indices is
 beyond me tho.
 
 *sigh* I'm recovering from a septoplasty on my nose atm, so I might have
 some time to do some coding!
 
 Chris
 
 - Original Message -
 From: Bruce Momjian pgman@candle.pha.pa.us
 To: Martijn van Oosterhout kleptog@svana.org
 Cc: PostgreSQL-development pgsql-hackers@postgresql.org
 Sent: Friday, June 21, 2002 10:50 AM
 Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector
 
 
  Martijn van Oosterhout wrote:
   Since it's currently all for collecting statistics on tables, why can't
 it
   collect another type of statistic, like:
  
   - How often the estimator gets it wrong?
  
   At the end of an index scan, the executor could compare the number of
 rows
   returned against what was estimated, and if it falls outside a certain
   range, flag it.
  
   Also, the average ratio of rows coming out of a distinct node vs the
 number
   going in.
  
   For a join clause, the amount of correlation between two columns (hard).
  
   etc
  
   Ideally, the planner could then use this info to make better plans.
   Eventually, the whole system could become somewhat self-tuning.
  
   Does anyone see any problems with this?
 
  [ Discussion moved to hackers.]
 
  I have thought that some type of feedback from the executor back into
  the optimizer would be a good feature.  Not sure how to do it, but your
  idea makes sense.  It certainly could update the table statistics after
  a sequential scan.
 
  --
Bruce Momjian|  http://candle.pha.pa.us
pgman@candle.pha.pa.us   |  (610) 853-3000
+  If your life is a hard drive, |  830 Blythe Avenue
+  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026
 
  ---(end of broadcast)---
  TIP 6: Have you searched our list archives?
 
  http://archives.postgresql.org
 
 
 
 
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster
 
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] [GENERAL] Idea for the statistics collector

2005-04-19 Thread Christopher Kings-Lynne
Good god - how old was that email? 2002???
Chris
Bruce Momjian wrote:
Added to TODO:
* Add tool to query pg_stat_* tables and report indexes that aren't 
needed
  or tables that might need indexes
---
Christopher Kings-Lynne wrote:
I was thinking of writing a command line tool like 'pgtune' that looks at
the stats views and will generate SQL code for, or do automatically the
following:
* Dropping indices that are never used
* Creating appropriate indices to avoid large, expensive sequential scans.
This would put us in the 'mysql makes my indices for me by magic' league -
but would be far more powerful and flexible.  How to do multikey indices is
beyond me tho.
*sigh* I'm recovering from a septoplasty on my nose atm, so I might have
some time to do some coding!
Chris
- Original Message -
From: Bruce Momjian pgman@candle.pha.pa.us
To: Martijn van Oosterhout kleptog@svana.org
Cc: PostgreSQL-development pgsql-hackers@postgresql.org
Sent: Friday, June 21, 2002 10:50 AM
Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector

Martijn van Oosterhout wrote:
Since it's currently all for collecting statistics on tables, why can't
it
collect another type of statistic, like:
- How often the estimator gets it wrong?
At the end of an index scan, the executor could compare the number of
rows
returned against what was estimated, and if it falls outside a certain
range, flag it.
Also, the average ratio of rows coming out of a distinct node vs the
number
going in.
For a join clause, the amount of correlation between two columns (hard).
etc
Ideally, the planner could then use this info to make better plans.
Eventually, the whole system could become somewhat self-tuning.
Does anyone see any problems with this?
[ Discussion moved to hackers.]
I have thought that some type of feedback from the executor back into
the optimizer would be a good feature.  Not sure how to do it, but your
idea makes sense.  It certainly could update the table statistics after
a sequential scan.
--
 Bruce Momjian|  http://candle.pha.pa.us
 pgman@candle.pha.pa.us   |  (610) 853-3000
 +  If your life is a hard drive, |  830 Blythe Avenue
 +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026
---(end of broadcast)---
TIP 6: Have you searched our list archives?
http://archives.postgresql.org


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] [GENERAL] Idea for the statistics collector

2005-04-19 Thread Bruce Momjian
Christopher Kings-Lynne wrote:
 Good god - how old was that email? 2002???

Yep, and been in my mailbox since then, waiting for me to process it
into a TODO entry.

---


 
 Chris
 
 Bruce Momjian wrote:
  Added to TODO:
  
  * Add tool to query pg_stat_* tables and report indexes that aren't 
  needed
or tables that might need indexes
  
  ---
  
  Christopher Kings-Lynne wrote:
  
 I was thinking of writing a command line tool like 'pgtune' that looks at
 the stats views and will generate SQL code for, or do automatically the
 following:
 
 * Dropping indices that are never used
 * Creating appropriate indices to avoid large, expensive sequential scans.
 
 This would put us in the 'mysql makes my indices for me by magic' league -
 but would be far more powerful and flexible.  How to do multikey indices is
 beyond me tho.
 
 *sigh* I'm recovering from a septoplasty on my nose atm, so I might have
 some time to do some coding!
 
 Chris
 
 - Original Message -
 From: Bruce Momjian pgman@candle.pha.pa.us
 To: Martijn van Oosterhout kleptog@svana.org
 Cc: PostgreSQL-development pgsql-hackers@postgresql.org
 Sent: Friday, June 21, 2002 10:50 AM
 Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector
 
 
 
 Martijn van Oosterhout wrote:
 
 Since it's currently all for collecting statistics on tables, why can't
 
 it
 
 collect another type of statistic, like:
 
 - How often the estimator gets it wrong?
 
 At the end of an index scan, the executor could compare the number of
 
 rows
 
 returned against what was estimated, and if it falls outside a certain
 range, flag it.
 
 Also, the average ratio of rows coming out of a distinct node vs the
 
 number
 
 going in.
 
 For a join clause, the amount of correlation between two columns (hard).
 
 etc
 
 Ideally, the planner could then use this info to make better plans.
 Eventually, the whole system could become somewhat self-tuning.
 
 Does anyone see any problems with this?
 
 [ Discussion moved to hackers.]
 
 I have thought that some type of feedback from the executor back into
 the optimizer would be a good feature.  Not sure how to do it, but your
 idea makes sense.  It certainly could update the table statistics after
 a sequential scan.
 
 --
   Bruce Momjian|  http://candle.pha.pa.us
   pgman@candle.pha.pa.us   |  (610) 853-3000
   +  If your life is a hard drive, |  830 Blythe Avenue
   +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026
 
 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?
 
 http://archives.postgresql.org
 
 
 
 
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster
 
 
 
  
  
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] [GENERAL] Idea for the statistics collector

2002-06-24 Thread Christopher Kings-Lynne

I was thinking of writing a command line tool like 'pgtune' that looks at
the stats views and will generate SQL code for, or do automatically the
following:

* Dropping indices that are never used
* Creating appropriate indices to avoid large, expensive sequential scans.

This would put us in the 'mysql makes my indices for me by magic' league -
but would be far more powerful and flexible.  How to do multikey indices is
beyond me tho.

*sigh* I'm recovering from a septoplasty on my nose atm, so I might have
some time to do some coding!

Chris

- Original Message -
From: Bruce Momjian [EMAIL PROTECTED]
To: Martijn van Oosterhout [EMAIL PROTECTED]
Cc: PostgreSQL-development [EMAIL PROTECTED]
Sent: Friday, June 21, 2002 10:50 AM
Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector


 Martijn van Oosterhout wrote:
  Since it's currently all for collecting statistics on tables, why can't
it
  collect another type of statistic, like:
 
  - How often the estimator gets it wrong?
 
  At the end of an index scan, the executor could compare the number of
rows
  returned against what was estimated, and if it falls outside a certain
  range, flag it.
 
  Also, the average ratio of rows coming out of a distinct node vs the
number
  going in.
 
  For a join clause, the amount of correlation between two columns (hard).
 
  etc
 
  Ideally, the planner could then use this info to make better plans.
  Eventually, the whole system could become somewhat self-tuning.
 
  Does anyone see any problems with this?

 [ Discussion moved to hackers.]

 I have thought that some type of feedback from the executor back into
 the optimizer would be a good feature.  Not sure how to do it, but your
 idea makes sense.  It certainly could update the table statistics after
 a sequential scan.

 --
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 853-3000
   +  If your life is a hard drive, |  830 Blythe Avenue
   +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

 ---(end of broadcast)---
 TIP 6: Have you searched our list archives?

 http://archives.postgresql.org





---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster





Re: [HACKERS] [GENERAL] Idea for the statistics collector

2002-06-24 Thread Tom Lane

Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 I was thinking of writing a command line tool like 'pgtune' that looks at
 the stats views and will generate SQL code for, or do automatically the
 following:

 * Dropping indices that are never used
 * Creating appropriate indices to avoid large, expensive sequential scans.

Dropping unused indices sounds good --- but beware of dropping unique
indexes; they may be there to enforce a constraint, and not because of
any desire to use them in queries.

I'm not sure how you're going to automatically intuit appropriate
indexes to add, though.  You'd need to look at a suitable workload
(ie, a representative set of queries) which is not data that's readily
available from the stats views.  Perhaps we could expect the DBA to
provide a segment of log output that includes debug_print_query
and show_query_stats results.

regards, tom lane



---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org





Re: [HACKERS] [GENERAL] Idea for the statistics collector

2002-06-24 Thread Bruce Momjian

Christopher Kings-Lynne wrote:
 I was thinking of writing a command line tool like 'pgtune' that looks at
 the stats views and will generate SQL code for, or do automatically the
 following:
 
 * Dropping indices that are never used
 * Creating appropriate indices to avoid large, expensive sequential scans.
 
 This would put us in the 'mysql makes my indices for me by magic' league -
 but would be far more powerful and flexible.  How to do multikey indices is
 beyond me tho.

This is a great idea.  I have been wanting to do something like this
myself but probably won't get the time.

Does MySQL really make indexes by magic?

Also, I had to look up the contraction for will not because I always
get that confused (won't).  I just found a web page on it:

http://www.straightdope.com/mailbag/mwont.html

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html





Re: [HACKERS] [GENERAL] Idea for the statistics collector

2002-06-20 Thread Bruce Momjian

Martijn van Oosterhout wrote:
 Since it's currently all for collecting statistics on tables, why can't it
 collect another type of statistic, like:
 
 - How often the estimator gets it wrong?
 
 At the end of an index scan, the executor could compare the number of rows
 returned against what was estimated, and if it falls outside a certain
 range, flag it.
 
 Also, the average ratio of rows coming out of a distinct node vs the number
 going in.
 
 For a join clause, the amount of correlation between two columns (hard).
 
 etc
 
 Ideally, the planner could then use this info to make better plans.
 Eventually, the whole system could become somewhat self-tuning.
 
 Does anyone see any problems with this?

[ Discussion moved to hackers.]

I have thought that some type of feedback from the executor back into
the optimizer would be a good feature.  Not sure how to do it, but your
idea makes sense.  It certainly could update the table statistics after
a sequential scan.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] [GENERAL] Idea for the statistics collector

2002-06-20 Thread Neil Conway

On Thu, 20 Jun 2002 22:50:04 -0400 (EDT)
Bruce Momjian [EMAIL PROTECTED] wrote:
 I have thought that some type of feedback from the executor back into
 the optimizer would be a good feature.  Not sure how to do it, but your
 idea makes sense.  It certainly could update the table statistics after
 a sequential scan.

Search the archives for a thread I started on -hackers called self-tuning
histograms, which talks about a pretty similar idea. The technique there
applies only to histograms, and builds the histogram based *only* upon
the data provided by the executor.

Tom commented that it's probably a better idea to concentrate on more
elementary techniques, like multi-dimensional histograms, before starting
on ST histograms. I agree, and plan to look at multi-dimensional histograms
when I get some spare time.

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED]
PGP Key ID: DB3C29FC

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] [GENERAL] Idea for the statistics collector

2002-06-20 Thread Dann Corbit

Here are some class notes that contain some very good ideas with
terrific explanations:
http://www.cs.duke.edu/education/courses/fall01/cps216/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly