Re: [GENERAL] Unnecessary scan on a partial index slows down query dramatically
Thanks. I'm not really in a position to upgrade at the moment. I guess in the short-term I'll tweak the query to work around this (e.g. removing "status = 3" or adding more refining conditions both seem to work). Please let me know if there are configuration settings I should try. Thanks, Jimmy On Fri, Apr 25, 2008 at 12:13 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > "Jimmy Choi" <[EMAIL PROTECTED]> writes: > > A simple query is executing much slower than expected. When looking at > > the query plan, I see a bitmap index scan on a partial index that does > > not have any associated index condition. How could that happen? > > Easily --- it thinks that the partial index predicate is useful in > itself. > > > > "vacuum analyze" doesn't solve the problem. I'm running 8.1.5. > > choose_bitmap_and was rewritten (again) in 8.1.9. Try a newer > version. > > http://archives.postgresql.org/pgsql-committers/2007-04/msg00233.php > > regards, tom lane > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Unnecessary scan on a partial index slows down query dramatically
Hello, A simple query is executing much slower than expected. When looking at the query plan, I see a bitmap index scan on a partial index that does not have any associated index condition. How could that happen? The query is: select id from test_run_results where test_run_id = 12902 and status = 3 The query plan is: "Bitmap Heap Scan on test_run_results (cost=3240.97..3963.72 rows=2556 width=250)" " Recheck Cond: ((test_run_id = 12902) AND (status = 3))" " -> BitmapAnd (cost=3240.97..3240.97 rows=243 width=0)" "-> Bitmap Index Scan on trr_same_status_in_run_index (cost=0.00..24.33 rows=2556 width=0)" " Index Cond: ((test_run_id = 12902) AND (status = 3))" "-> Bitmap Index Scan on trr_same_failure_reason_for_owner_index (cost=0.00..3216.39 rows=884694 width=0)" My question refers to the last bitmap index scan which does not have an associated index cond line. trr_same_status_in_run_index is defined on (test_run_id, status) and trr_same_failure_reason_for_owner_index is a partial index defined on (owner_id, failure_reason) where status = 3. "vacuum analyze" doesn't solve the problem. I'm running 8.1.5. As you can see, there really isn't much reason to use the partial index at all. Remove "status = 3" from the query gets rid of the useless index scan and makes the query much faster. Thank, Jimmy -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] "vacuum" and "cluster"
Presumably, even if CLUSTER does reindexing internally, it only does that for the index used for clustering. Since REINDEX includes all indices, CLUSTER cannot truly replace REINDEX. Correct? Jimmy On Wed, Apr 16, 2008 at 12:06 PM, Alvaro Herrera <[EMAIL PROTECTED]> wrote: > Craig Ringer escribió: > > > > It's not stated explicitly, but I'm pretty sure discussion here has > > mentioned that too. Given that, VACUUM FULL on a just-CLUSTERed table > > should be redundant. > > It is, and a REINDEX is redundant too because CLUSTER does it > internally. > > -- > Alvaro Herrerahttp://www.CommandPrompt.com/ > PostgreSQL Replication, Consulting, Custom Development, 24x7 support > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] "vacuum" and "cluster"
Hello, Does running "cluster" remove the need to run "vacuum"? I get a feeling that since cluster is already physically reordering the rows, it may as well remove the dead rows... no? My second question is, if vacuum is still needed, does it matter whether I run vacuum first or cluster first? Here's our current weekly db maintenance routine: 1. vacuum full 2. cluster 3. reindex 4. analyze Thanks, Jimmy Choi Confidentiality Notice. This message may contain information that is confidential or otherwise protected from disclosure. If you are not the intended recipient, you are hereby notified that any use, disclosure, dissemination, distribution, or copying of this message, or any attachments, is strictly prohibited. If you have received this message in error, please advise the sender by reply e-mail, and delete the message and any attachments. Thank you. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] "vacuum" and "cluster"
Hello, Does running "cluster" remove the need to run "vacuum"? I get a feeling that since cluster is already physically reordering the rows, it may as well remove the dead rows... no? My second question is, if vacuum is still needed, does it matter whether I run vacuum first or cluster first? Here's our current weekly db maintenance routine: 1. vacuum full 2. cluster 3. reindex 4. analyze Thanks, Jimmy -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Guideline on use of temporary tables
I'm looking for general guideline on the use of temporary tables. I would like to use temporary table as a caching mechanism to speed up queries within the same session. Specifically, a temporary table is created to store a subset of data from a possibly large table, and subsequent queries select from the temporary table instead of re-applying the same complex filters on the actual table again and again. Is this what temporary table is designed for? Are there caveats that I should be aware of? Can you think of other better alternatives? Thank you very much. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] Unexpected behavior with CASE statement
This will work for this particular example. But what if my case statement is more complicated than that? Example: select metric_type, case metric_type when 0 then sum (1 / val) when 1 then sum (val) when 2 then max (val) when 3 then min (val) end as result from metrics group by metric_type Thanks! On 10/3/07, Rodrigo De León <[EMAIL PROTECTED]> wrote: > On 10/3/07, Jimmy Choi <[EMAIL PROTECTED]> wrote: > > I expect to get the following result set: > > > > metric_type | result > > +--- > > 0 | 2 > > 1 | 3 > > Try: > > SELECT metric_type >, SUM(CASE metric_type >WHEN 0 > THEN 1 / val >WHEN 1 > THEN val > END) AS RESULT > FROM metrics > GROUP BY metric_type > ORDER BY metric_type > ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
[GENERAL] Unexpected behavior with CASE statement
Suppose I have the following table named "metrics": metric_type | val +- 0 | 1 0 | 1 1 | 0 1 | 3 Now suppose I run the following simple query: select metric_type, case metric_type when 0 then sum (1 / val) when 1 then sum (val) end as result from metrics group by metric_type I expect to get the following result set: metric_type | result +--- 0 | 2 1 | 3 But in reality I get the following error: ERROR: division by zero SQL state: 22012 So it appears that Postgres executes all cases and select the result in the end. Is this expected behavior? Thanks - Jimmy ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[GENERAL] Unexpected behavior with CASE statement
Suppose I have the following table named "metrics": metric_type | val +- 0 | 1 0 | 1 1 | 0 1 | 3 Now suppose I run the following simple query: select metric_type, case metric_type when 0 then sum (1 / val) when 1 then sum (val) end as result from metrics group by metric_type I expect to get the following result set: metric_type | result +--- 0 | 2 1 | 3 But in reality I get the following error: ERROR: division by zero SQL state: 22012 So it appears that Postgres executes all cases and select the result in the end. Is this expected behavior? Thanks - Jimmy Confidentiality Notice. This message may contain information that is confidential or otherwise protected from disclosure. If you are not the intended recipient, you are hereby notified that any use, disclosure, dissemination, distribution, or copying of this message, or any attachments, is strictly prohibited. If you have received this message in error, please advise the sender by reply e-mail, and delete the message and any attachments. Thank you. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Help tuning a large table off disk and into RAM
Have you tried clustering tables based on the most-frequently used indexes to improve locality? http://www.postgresql.org/docs/8.2/static/sql-cluster.html -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bill Moran Sent: Wednesday, September 26, 2007 11:24 AM To: James Williams Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] Help tuning a large table off disk and into RAM In response to "James Williams" <[EMAIL PROTECTED]>: > I'm stuck trying to tune a big-ish postgres db and wondering if anyone > has any pointers. > > I cannot get Postgres to make good use of plenty of available RAM and > stop thrashing the disks. > > One main table. ~30 million rows, 20 columns all integer, smallint or > char(2). Most have an index. It's a table for holding webserver > logs. The main table is all foreign key ids. Row size is ~100bytes. > > The typical query is an aggregate over a large number of rows (~25% say). > > SELECT COUNT(*), COUNT(DISTINCT user_id) > FROM table > WHERE epoch > ... > AND epoch < ... > AND country = ... > > The box has 4 x Opterons, 4Gb RAM & five 15k rpm disks, RAID 5. We > wanted fast query/lookup. We know we can get fast disk IO. > > Running a typical query like above seems to: > > * hardly tax a single CPU > * plenty of RAM free > * disks thrash about > > The last is based mostly on the observation that another tiddly > unrelated mysql db which normally runs fast, grinds to a halt when > we're querying the postgres db (and cpu, memory appear to have spare > capacity). > > We've currently got these settings, and have tried doubling/halving > them, restarted and benchmarked a test query. They don't appear to > materially alter our query time. > > shared_buffers = 128MB shared_buffers = 1.5GB Unless you've got a lot of stuff other than PostgreSQL on this machine. > temp_buffers= 160MB > work_mem= 200MB > max_stack_depth = 7MB These look reasonable, although I can't be sure without more details. > > We're less concerned about insert speed. Typically 1 or 2 users, but > want fast queries. > > Perhaps a little extreme, but I'm trying to find a way to express this > in a way that Postgres understands: > > * Load this table, and one or two indexes (epoch, user_id) into RAM. Give it enough shared_buffers and it will do that. You're estimating the size of your table @ 3G (try a pg_relation_size() on it to get an actual size) If you really want to get _all_ of it in all the time, you're probably going to need to add RAM to the machine. With 8G, you could allocate about 3G to shared_buffers, but that would be ignoring the size of indexes. However, I think you'll be surprised how much performance improves with 1.5G of shared_buffers. You may not need any more. 128M is really forcing PG to work within limited space. > * All of the table, all of those indexes. > * Keep them there, but keep a disk based backup for integrity. > * Run all selects against the in RAM copy. Always. This is what PG does if you allocate enough shared_buffers. -- Bill Moran http://www.potentialtech.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/ Confidentiality Notice. This message may contain information that is confidential or otherwise protected from disclosure. If you are not the intended recipient, you are hereby notified that any use, disclosure, dissemination, distribution, or copying of this message, or any attachments, is strictly prohibited. If you have received this message in error, please advise the sender by reply e-mail, and delete the message and any attachments. Thank you. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[GENERAL] Short circuit evaluation of expressions in query
Say I have the following parameterized query in a function: select * from foo where ($1 = -1 or foo.status = $1) and (…) where the (…) part consists of more parameterized conditions similar to the first one. Suppose that at runtime, $1 is supplied a value of -1, does the foo.status = $1 condition still have to be evaluated? Assuming the $1 = -1 case, should the above query be as fast as the following? select * from foo where (…) Thanks Jimmy
[GENERAL] Multiple-index optimization not working for = ANY operator
Hello, From Section 11.4 of the Postgres 8.1 documentation, a new optimization is shipped in the latest release. "... a query like WHERE x = 42 OR x = 47 OR x = 53 OR x = 99 could be broken down into four separate scans of an index on x, each scan using one of the query clauses. The results of these scans are then ORed together to produce the result." While the feature works fine for cases such as: SELECT * FROM foo WHERE id IN (1, 2); And SELECT * FROM foo WHERE id = 1 OR id = 2; I find that it doesn't work (i.e. index is not used and a sequential scan is needed), if I have the following instead: SELECT * FROM foo WHERE id = ANY (ARRAY[1,2]); Is this expected? The reason I would like the last case to work is that my plpgsql function takes as input an array of IDs, and so I cannot write my query using the first two forms above. Any idea on how I can get around this is greatly appreciated. Thanks Jimmy ---(end of broadcast)--- TIP 6: explain analyze is your friend