Re: [HACKERS] [GENERAL] Idea for the statistics collector
Bruce Momjian wrote: Christopher Kings-Lynne wrote: Good god - how old was that email? 2002??? Yep, and been in my mailbox since then, waiting for me to process it into a TODO entry. Exciting what one can find wiping the floor of the mailbox :-) Regards, Andreas ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] [GENERAL] Idea for the statistics collector
Added to TODO: * Add tool to query pg_stat_* tables and report indexes that aren't needed or tables that might need indexes --- Christopher Kings-Lynne wrote: I was thinking of writing a command line tool like 'pgtune' that looks at the stats views and will generate SQL code for, or do automatically the following: * Dropping indices that are never used * Creating appropriate indices to avoid large, expensive sequential scans. This would put us in the 'mysql makes my indices for me by magic' league - but would be far more powerful and flexible. How to do multikey indices is beyond me tho. *sigh* I'm recovering from a septoplasty on my nose atm, so I might have some time to do some coding! Chris - Original Message - From: Bruce Momjian pgman@candle.pha.pa.us To: Martijn van Oosterhout kleptog@svana.org Cc: PostgreSQL-development pgsql-hackers@postgresql.org Sent: Friday, June 21, 2002 10:50 AM Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector Martijn van Oosterhout wrote: Since it's currently all for collecting statistics on tables, why can't it collect another type of statistic, like: - How often the estimator gets it wrong? At the end of an index scan, the executor could compare the number of rows returned against what was estimated, and if it falls outside a certain range, flag it. Also, the average ratio of rows coming out of a distinct node vs the number going in. For a join clause, the amount of correlation between two columns (hard). etc Ideally, the planner could then use this info to make better plans. Eventually, the whole system could become somewhat self-tuning. Does anyone see any problems with this? [ Discussion moved to hackers.] I have thought that some type of feedback from the executor back into the optimizer would be a good feature. Not sure how to do it, but your idea makes sense. It certainly could update the table statistics after a sequential scan. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] [GENERAL] Idea for the statistics collector
Good god - how old was that email? 2002??? Chris Bruce Momjian wrote: Added to TODO: * Add tool to query pg_stat_* tables and report indexes that aren't needed or tables that might need indexes --- Christopher Kings-Lynne wrote: I was thinking of writing a command line tool like 'pgtune' that looks at the stats views and will generate SQL code for, or do automatically the following: * Dropping indices that are never used * Creating appropriate indices to avoid large, expensive sequential scans. This would put us in the 'mysql makes my indices for me by magic' league - but would be far more powerful and flexible. How to do multikey indices is beyond me tho. *sigh* I'm recovering from a septoplasty on my nose atm, so I might have some time to do some coding! Chris - Original Message - From: Bruce Momjian pgman@candle.pha.pa.us To: Martijn van Oosterhout kleptog@svana.org Cc: PostgreSQL-development pgsql-hackers@postgresql.org Sent: Friday, June 21, 2002 10:50 AM Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector Martijn van Oosterhout wrote: Since it's currently all for collecting statistics on tables, why can't it collect another type of statistic, like: - How often the estimator gets it wrong? At the end of an index scan, the executor could compare the number of rows returned against what was estimated, and if it falls outside a certain range, flag it. Also, the average ratio of rows coming out of a distinct node vs the number going in. For a join clause, the amount of correlation between two columns (hard). etc Ideally, the planner could then use this info to make better plans. Eventually, the whole system could become somewhat self-tuning. Does anyone see any problems with this? [ Discussion moved to hackers.] I have thought that some type of feedback from the executor back into the optimizer would be a good feature. Not sure how to do it, but your idea makes sense. It certainly could update the table statistics after a sequential scan. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] [GENERAL] Idea for the statistics collector
Christopher Kings-Lynne wrote: Good god - how old was that email? 2002??? Yep, and been in my mailbox since then, waiting for me to process it into a TODO entry. --- Chris Bruce Momjian wrote: Added to TODO: * Add tool to query pg_stat_* tables and report indexes that aren't needed or tables that might need indexes --- Christopher Kings-Lynne wrote: I was thinking of writing a command line tool like 'pgtune' that looks at the stats views and will generate SQL code for, or do automatically the following: * Dropping indices that are never used * Creating appropriate indices to avoid large, expensive sequential scans. This would put us in the 'mysql makes my indices for me by magic' league - but would be far more powerful and flexible. How to do multikey indices is beyond me tho. *sigh* I'm recovering from a septoplasty on my nose atm, so I might have some time to do some coding! Chris - Original Message - From: Bruce Momjian pgman@candle.pha.pa.us To: Martijn van Oosterhout kleptog@svana.org Cc: PostgreSQL-development pgsql-hackers@postgresql.org Sent: Friday, June 21, 2002 10:50 AM Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector Martijn van Oosterhout wrote: Since it's currently all for collecting statistics on tables, why can't it collect another type of statistic, like: - How often the estimator gets it wrong? At the end of an index scan, the executor could compare the number of rows returned against what was estimated, and if it falls outside a certain range, flag it. Also, the average ratio of rows coming out of a distinct node vs the number going in. For a join clause, the amount of correlation between two columns (hard). etc Ideally, the planner could then use this info to make better plans. Eventually, the whole system could become somewhat self-tuning. Does anyone see any problems with this? [ Discussion moved to hackers.] I have thought that some type of feedback from the executor back into the optimizer would be a good feature. Not sure how to do it, but your idea makes sense. It certainly could update the table statistics after a sequential scan. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [GENERAL] Idea for the statistics collector
I was thinking of writing a command line tool like 'pgtune' that looks at the stats views and will generate SQL code for, or do automatically the following: * Dropping indices that are never used * Creating appropriate indices to avoid large, expensive sequential scans. This would put us in the 'mysql makes my indices for me by magic' league - but would be far more powerful and flexible. How to do multikey indices is beyond me tho. *sigh* I'm recovering from a septoplasty on my nose atm, so I might have some time to do some coding! Chris - Original Message - From: Bruce Momjian [EMAIL PROTECTED] To: Martijn van Oosterhout [EMAIL PROTECTED] Cc: PostgreSQL-development [EMAIL PROTECTED] Sent: Friday, June 21, 2002 10:50 AM Subject: Re: [HACKERS] [GENERAL] Idea for the statistics collector Martijn van Oosterhout wrote: Since it's currently all for collecting statistics on tables, why can't it collect another type of statistic, like: - How often the estimator gets it wrong? At the end of an index scan, the executor could compare the number of rows returned against what was estimated, and if it falls outside a certain range, flag it. Also, the average ratio of rows coming out of a distinct node vs the number going in. For a join clause, the amount of correlation between two columns (hard). etc Ideally, the planner could then use this info to make better plans. Eventually, the whole system could become somewhat self-tuning. Does anyone see any problems with this? [ Discussion moved to hackers.] I have thought that some type of feedback from the executor back into the optimizer would be a good feature. Not sure how to do it, but your idea makes sense. It certainly could update the table statistics after a sequential scan. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] [GENERAL] Idea for the statistics collector
Christopher Kings-Lynne [EMAIL PROTECTED] writes: I was thinking of writing a command line tool like 'pgtune' that looks at the stats views and will generate SQL code for, or do automatically the following: * Dropping indices that are never used * Creating appropriate indices to avoid large, expensive sequential scans. Dropping unused indices sounds good --- but beware of dropping unique indexes; they may be there to enforce a constraint, and not because of any desire to use them in queries. I'm not sure how you're going to automatically intuit appropriate indexes to add, though. You'd need to look at a suitable workload (ie, a representative set of queries) which is not data that's readily available from the stats views. Perhaps we could expect the DBA to provide a segment of log output that includes debug_print_query and show_query_stats results. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [GENERAL] Idea for the statistics collector
Christopher Kings-Lynne wrote: I was thinking of writing a command line tool like 'pgtune' that looks at the stats views and will generate SQL code for, or do automatically the following: * Dropping indices that are never used * Creating appropriate indices to avoid large, expensive sequential scans. This would put us in the 'mysql makes my indices for me by magic' league - but would be far more powerful and flexible. How to do multikey indices is beyond me tho. This is a great idea. I have been wanting to do something like this myself but probably won't get the time. Does MySQL really make indexes by magic? Also, I had to look up the contraction for will not because I always get that confused (won't). I just found a web page on it: http://www.straightdope.com/mailbag/mwont.html -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] [GENERAL] Idea for the statistics collector
Martijn van Oosterhout wrote: Since it's currently all for collecting statistics on tables, why can't it collect another type of statistic, like: - How often the estimator gets it wrong? At the end of an index scan, the executor could compare the number of rows returned against what was estimated, and if it falls outside a certain range, flag it. Also, the average ratio of rows coming out of a distinct node vs the number going in. For a join clause, the amount of correlation between two columns (hard). etc Ideally, the planner could then use this info to make better plans. Eventually, the whole system could become somewhat self-tuning. Does anyone see any problems with this? [ Discussion moved to hackers.] I have thought that some type of feedback from the executor back into the optimizer would be a good feature. Not sure how to do it, but your idea makes sense. It certainly could update the table statistics after a sequential scan. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [GENERAL] Idea for the statistics collector
On Thu, 20 Jun 2002 22:50:04 -0400 (EDT) Bruce Momjian [EMAIL PROTECTED] wrote: I have thought that some type of feedback from the executor back into the optimizer would be a good feature. Not sure how to do it, but your idea makes sense. It certainly could update the table statistics after a sequential scan. Search the archives for a thread I started on -hackers called self-tuning histograms, which talks about a pretty similar idea. The technique there applies only to histograms, and builds the histogram based *only* upon the data provided by the executor. Tom commented that it's probably a better idea to concentrate on more elementary techniques, like multi-dimensional histograms, before starting on ST histograms. I agree, and plan to look at multi-dimensional histograms when I get some spare time. Cheers, Neil -- Neil Conway [EMAIL PROTECTED] PGP Key ID: DB3C29FC ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [GENERAL] Idea for the statistics collector
Here are some class notes that contain some very good ideas with terrific explanations: http://www.cs.duke.edu/education/courses/fall01/cps216/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly