Re: [HACKERS] choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
Tom Lane wrote:
> One idea I thought about was to sort by index scan cost, using
> selectivity only as a tiebreaker for cost, rather than the other way
> around as is currently done.  This seems fairly plausible because
> indexscans that are cheaper than other indexscans likely return fewer
> rows too, and so selectivity is already accounted for to some extent ---
> at least you can't have an enormously worse selectivity at lower cost,
> whereas Steve's example proves it doesn't work the other way.  But I'm
> worried about breaking the reasoning about redundant indexes that's
> mentioned in the comments.
>
> Another alternative that would respond to the immediate problem is to
> maintain the current sort order, but as we come to each index, consider
> using that one alone, and throw away whatever AND we might have built up
> if that one alone beats the AND-so-far.  This seems more conservative,
> as it's unlikely to break any cases that work well now, but on the other
> hand it feels like plastering another wart atop a structure that's
> already rather rickety.
>
> Has anyone got any thoughts about the best way to do this?

How about doing both: sort the indexes by index scan cost, then pick the first index on the list and start adding indexes when they lower the cost. When adding each index, consider it by itself against the already stacked indexes. If its cost alone is lower, put this index at the top of the list and restart the algorithm (after the sorting step, of course).

I think the concern about condition redundancy should be attacked separately. How about just comparing whether they have common prefixes of conditions? I admit I don't understand what would happen with indexes defined like (lower(A), B, C) versus (A, B), for example.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
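[Editor's note: the combined approach suggested above can be sketched as follows. This is purely illustrative Python, not PostgreSQL internals; IndexPath and the toy cost model are invented stand-ins for the planner's path structures and cost math.]

```python
# Sketch of the proposal: sort index paths by scan cost, greedily AND on
# paths that lower the total cost, and promote-and-restart whenever a
# later index beats the AND-so-far on its own.

from dataclasses import dataclass

@dataclass
class IndexPath:
    name: str
    cost: float         # estimated cost of scanning this index
    selectivity: float  # estimated fraction of heap rows matched

def and_cost(paths, heap_cost=1000.0):
    # Toy model: pay every index scan, plus a heap-visit term that shrinks
    # with the combined selectivity. The real planner math is much richer.
    sel = 1.0
    for p in paths:
        sel *= p.selectivity
    return sum(p.cost for p in paths) + heap_cost * sel

def choose_bitmap_and(paths):
    paths = sorted(paths, key=lambda p: (p.cost, p.selectivity))
    promoted = set()  # guard against promoting the same index twice
    while True:
        chosen = [paths[0]]
        restarted = False
        for p in paths[1:]:
            if and_cost(chosen + [p]) < and_cost(chosen):
                chosen.append(p)
            elif and_cost([p]) < and_cost(chosen) and p.name not in promoted:
                # This index alone beats the AND built so far: move it to
                # the front of the list and restart, as proposed above.
                promoted.add(p.name)
                paths.remove(p)
                paths.insert(0, p)
                restarted = True
                break
        if not restarted:
            return chosen
```

With made-up numbers resembling Steve's case (a cheap encounter_id index versus an expensive combined index), the cost-first sort picks the cheap index immediately and never bothers with the expensive one.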
choose_bitmap_and again (was Re: [PERFORM] Strangely Variable Query Performance)
Steve <[EMAIL PROTECTED]> writes:
> [ strange planner misbehavior in 8.2.3 ]

After some off-list investigation (thanks, Steve, for letting me poke at your machine), the short answer is that the heuristics used by choose_bitmap_and() suck. The problem query is like

	select ... from ds
	where ds.receipt >= '1998-12-30 0:0:0'
	  and ds.encounter_id in ( ... 100 distinct values ... );

and the table has a truly remarkable variety of indexes on encounter_id, receipt, and combinations of them with other columns. The receipt condition is actually in effect a no-op, because all receipt dates are later than that, but because ineq_histogram_selectivity doesn't trust histogram data unreservedly we compute a selectivity of about 0.7 for it. That means that the indexes that cover both receipt and encounter_id are given a selectivity score just fractionally better than those involving encounter_id alone, and therefore they sort first in choose_bitmap_and's sort step; and the way that that routine is coded, only combinations of the very first index with other ones will be considered for a bitmap heap scan. So the possibility of using just the index on encounter_id alone is never considered, even though that alternative is vastly cheaper than the alternatives that are considered.

(It happens that encounter_id is a low-order column in all the indexes that include receipt, and so these scans end up covering the whole index ... multiple times even. The cost estimation is fine --- the thing knows these are expensive --- what's falling down is the heuristic for which combinations of indexes to consider using in a bitmap scan.)

The original coding of choose_bitmap_and involved a "fuzzy" comparison of selectivities, which would have avoided this problem, but we got rid of that later because it had its own problems. In fact, choose_bitmap_and has caused us enough problems that I'm thinking we need a fundamental rethink of how it works, rather than just marginal tweaks.
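[Editor's note: to make the failure mode concrete, here is a small illustration with invented numbers, not taken from Steve's actual plans: sorting by selectivity puts an expensive combined index first, while sorting by cost would surface the cheap single-column index.]

```python
# Invented (name, selectivity, scan cost) triples illustrating the problem:
# the indexes covering both receipt and encounter_id score slightly better
# on selectivity but cost far more to scan than encounter_id alone.
indexes = [
    ("receipt_encounter_idx", 0.07, 5000.0),
    ("receipt_other_idx",     0.07, 4800.0),
    ("encounter_id_idx",      0.10, 40.0),
]

# Current behavior: sort by selectivity (cost as tiebreaker). Only AND
# combinations involving the first entry get considered, so the cheap
# single-index plan never comes up.
by_selectivity = sorted(indexes, key=lambda ix: (ix[1], ix[2]))
print(by_selectivity[0][0])  # receipt_other_idx

# Sorting by cost instead puts the cheap index first.
by_cost = sorted(indexes, key=lambda ix: (ix[2], ix[1]))
print(by_cost[0][0])         # encounter_id_idx
```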
If you haven't looked at this code before, the comments explain the idea well enough:

/*
 * choose_bitmap_and
 *		Given a nonempty list of bitmap paths, AND them into one path.
 *
 * This is a nontrivial decision since we can legally use any subset of the
 * given path set.  We want to choose a good tradeoff between selectivity
 * and cost of computing the bitmap.
 *
 * The result is either a single one of the inputs, or a BitmapAndPath
 * combining multiple inputs.
 */
	...
/*
 * In theory we should consider every nonempty subset of the given paths.
 * In practice that seems like overkill, given the crude nature of the
 * estimates, not to mention the possible effects of higher-level AND and
 * OR clauses.  As a compromise, we sort the paths by selectivity.  We
 * always take the first, and sequentially add on paths that result in a
 * lower estimated cost.
 *
 * We also make some effort to detect directly redundant input paths, as
 * can happen if there are multiple possibly usable indexes.  (Another way
 * it can happen is that best_inner_indexscan will find the same OR join
 * clauses that create_or_index_quals has pulled OR restriction clauses
 * out of, and then both versions show up as duplicate paths.)  We
 * consider an index redundant if any of its index conditions were already
 * used by earlier indexes.  (We could use predicate_implied_by to have a
 * more intelligent, but much more expensive, check --- but in most cases
 * simple pointer equality should suffice, since after all the index
 * conditions are all coming from the same RestrictInfo lists.)
 *
 * You might think the condition for redundancy should be "all index
 * conditions already used", not "any", but this turns out to be wrong.
 * For example, if we use an index on A, and then come to an index with
 * conditions on A and B, the only way that the second index can be later
 * in the selectivity-order sort is if the condition on B is completely
 * non-selective.  In any case, we'd surely be drastically misestimating
 * the selectivity if we count the same condition twice.
 *
 * We include index predicate conditions in the redundancy test.  Because
 * the test is just for pointer equality and not equal(), the effect is
 * that use of the same partial index in two different AND elements is
 * considered redundant.  (XXX is this too strong?)
 *
 * Note: outputting the selected sub-paths in selectivity order is a good
 * thing even if we weren't using that as part of the selection method,
 * because it makes the short-circuit case in MultiExecBitmapAnd() more
 * likely to apply.
 */

One idea I thought about was to sort by index scan cost, using selectivity only as a tiebreaker for cost, rather than the other way around as is currently done. This seems fairly plausible because indexscans that are cheaper than other indexscans likely return fewer rows too, and so selectivity is already accounted for to some extent.
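[Editor's note: the "any condition already used" redundancy rule described in those comments can be sketched like this. Python stand-in only; the real code compares RestrictInfo pointers, not strings.]

```python
# Walk paths in sorted order, dropping any path that reuses a condition an
# earlier kept path already contributed. Strings stand in for the pointer
# (identity) comparison on RestrictInfo that the real code performs.
def drop_redundant(sorted_paths):
    used = set()
    kept = []
    for name, conds in sorted_paths:
        if any(c in used for c in conds):
            # "any", not "all" -- see the A-then-(A,B) example above
            continue
        kept.append(name)
        used.update(conds)
    return kept

paths = [
    ("idx_a",  {"A = 1"}),
    ("idx_ab", {"A = 1", "B = 2"}),  # reuses A's condition: redundant
    ("idx_c",  {"C = 3"}),
]
print(drop_redundant(paths))  # ['idx_a', 'idx_c']
```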
Re: [PERFORM] Finding bloated indexes?
On Apr 13, 2007, at 4:01 PM, Dan Harris wrote:
> Is there a pg_stat_* table or the like that will show how bloated an index is? I am trying to squeeze some disk space and want to track down where the worst offenders are before performing a global REINDEX on all tables, as the database is roughly 400GB on disk and this takes a very long time to run.

I find this a helpful guide:

	select relname, relkind, relpages from pg_class where relname like 'user %';

for example (obviously change the LIKE clause to something useful to you). Then, with your knowledge of how big your rows are and how many relpages the table itself takes, you can see if your index is too big. It helps to watch these numbers over time.

Also, running "analyze verbose" on the table gives you a hint at how sparse the pages are, which might imply something for table bloat. I'm not sure. More expensive is "vacuum verbose", which gives lots of info on how many "unused pointers" there are in your indexes. This may be of use. If this is a high number compared to the number of row versions, then you probably have bloat there.
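[Editor's note: as a back-of-the-envelope version of the "know your row sizes, compare with relpages" advice above, one can estimate how many pages an index ought to occupy and compare with what pg_class reports. The row count, entry width, and fill factor below are purely illustrative assumptions.]

```python
PAGE_BYTES = 8192  # PostgreSQL's default block size

def expected_pages(n_rows, avg_entry_bytes, fill=0.9):
    # Pages needed for n_rows entries of avg_entry_bytes each, assuming
    # pages are kept about 90% full (an assumed fill factor).
    per_page = max(1, int(PAGE_BYTES * fill) // avg_entry_bytes)
    return -(-n_rows // per_page)  # ceiling division

# Suppose pg_class reports relpages = 60000 for an index over a table of
# 5 million rows with ~40-byte index entries (made-up numbers):
expected = expected_pages(5_000_000, 40)
bloat_factor = 60000 / expected
print(expected, round(bloat_factor, 1))  # 27174 2.2
```

A factor well above 1 suggests the index is a REINDEX candidate; the "unused pointers" counts from vacuum verbose, as mentioned above, give a more direct signal.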
Re: [PERFORM] Please humor me ...
On Mon, 2007-04-09 at 16:05 -0400, Carlos Moreno wrote:
> 2) What would be the real implications of doing that?

Many people ask, which is why a whole chapter of the manual is devoted to this important topic. http://developer.postgresql.org/pgdocs/postgres/wal.html

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] Question about memory allocations
On Friday 13 April 2007 14:53:53 Carlos Moreno wrote:
> How does PG take advantage of the available memory? I mean, if I have a machine with, say, 4 or 8GB of memory, how will those GBs end up being used? They just do?? (I mean, I would find that a valid answer;

On linux the filesystem cache will gobble them up, which means indirectly pgsql profits as well (assuming no other apps poison the fs cache).

jan

-- 
Jan de Visser [EMAIL PROTECTED]
Baruk Khazad! Khazad ai-menu!
[PERFORM] Finding bloated indexes?
Is there a pg_stat_* table or the like that will show how bloated an index is? I am trying to squeeze some disk space and want to track down where the worst offenders are before performing a global REINDEX on all tables, as the database is roughly 400GB on disk and this takes a very long time to run. I have been able to do this with tables, using a helpful view posted to this list a few months back, but I'm not sure if I can get the same results on indexes.

Thanks

-Dan

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Question about memory allocations
Steve wrote:
> Common wisdom in the past has been that values above a couple of hundred MB will degrade performance.

The annotated config file talks about setting shared_buffers to a third of the available memory --- well, it says "it should be no more than 1/3 of the total amount of memory" (quoting off the top of my head). Don't recall seeing any warning about not exceeding a few hundred megabytes.

My eternal curiosity when it comes to this memory and shared_buffers thing: How does PG take advantage of the available memory? I mean, if I have a machine with, say, 4 or 8GB of memory, how will those GBs end up being used? They just do?? (I mean, I would find that a valid answer; but I ask because this configuration-parameters stuff makes me think that perhaps PG does not simply use whatever memory is in there, but has to go through the parameters in the config file to allocate whatever it has to use.)

So, is it just like that? We put more memory in and PG will automatically make use of it?

Carlos
Re: [PERFORM] Question about memory allocations
At 12:38 PM 4/13/2007, Steve wrote:
>> Really? Wow! Common wisdom in the past has been that values above a couple of hundred MB will degrade performance. Have you done any benchmarks on 8.2.x that show that you get an improvement from this, or did you just take the "too much of a good thing is wonderful" approach?
>
> Not to be rude, but there's more common wisdom on this particular subject than anything else in postgres I'd say ;) I think I recently read someone else on this list who's laundry-listed the recommended memory values that are out there these days, and pretty much it ranges from what you've just said to "half of system memory". I've tried many memory layouts, and in my own experience with this huge DB, more -does- appear to be better, but marginally so; more memory alone won't fix a speed problem. It may be a function of how much reading/writing is done to the DB and if fsync is used or not, if that makes any sense :) Seems there's no "silver bullet" to the shared_memory question. Or if there is, nobody can agree on it ;)

One of the reasons for the wide variance in suggested values for pg memory use is that pg 7.x and pg 8.x are =very= different beasts. If you break the advice into pg 7.x and pg 8.x categories, you find that there is far less variation in the suggestions. Bottom line: pg 7.x could not take advantage of larger sums of memory anywhere near as well as pg 8.x can.

Cheers,
Ron
Re: [PERFORM] Question about memory allocations
> Really? Wow! Common wisdom in the past has been that values above a couple of hundred MB will degrade performance. Have you done any benchmarks on 8.2.x that show that you get an improvement from this, or did you just take the "too much of a good thing is wonderful" approach?

Not to be rude, but there's more common wisdom on this particular subject than anything else in postgres I'd say ;) I think I recently read someone else on this list who's laundry-listed the recommended memory values that are out there these days, and pretty much it ranges from what you've just said to "half of system memory". I've tried many memory layouts, and in my own experience with this huge DB, more -does- appear to be better, but marginally so; more memory alone won't fix a speed problem. It may be a function of how much reading/writing is done to the DB and if fsync is used or not, if that makes any sense :) Seems there's no "silver bullet" to the shared_memory question. Or if there is, nobody can agree on it ;)

Anyway, talk to you later!

Steve

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] local selectivity estimation - computing frequency of predicates
"Avdhoot Kishore Saple" <[EMAIL PROTECTED]> writes:
> How to compute the frequency of a predicate (e.g. Salary > $7) in an SQL query from a DB's pre-defined indexes? I'm specifically looking at how to retrieve information about indices (like number of pages at each level of index, range of attribute values etc.)

I don't think what you're looking for is exposed anywhere. Postgres doesn't rely on indexes for statistical information anyway; the pg_statistic system catalog (see also the pg_stats view) is used for that.

			regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] local selectivity estimation - computing frequency of predicates
Dear All,

How to compute the frequency of a predicate (e.g. Salary > $7) in an SQL query from a DB's pre-defined indexes? I'm specifically looking at how to retrieve information about indices (like number of pages at each level of index, range of attribute values etc.)

Any suggestions regarding the same would be great.

Thanks,
Avdhoot K. Saple
Junior Research Associate
High Performance & Grid Computing
Infosys Technologies Ltd., Bangalore
Re: [PERFORM] Question about memory allocations
On Tue, 2007-04-10 at 15:28 -0400, Steve wrote:
> I'm trying to tune the memory usage of a new machine that has a -lot- of memory in it (32 gigs).
...
> shared_buffers = 16GB

Really? Wow!

Common wisdom in the past has been that values above a couple of hundred MB will degrade performance. Have you done any benchmarks on 8.2.x that show that you get an improvement from this, or did you just take the "too much of a good thing is wonderful" approach?

Cheers,
Andrew.

-
Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/          PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201    MOB: +64(272)DEBIAN    OFFICE: +64(4)499-2267
You have an unusual equipment for success. Be sure to use it properly.