Re: [GENERAL] Query plan for Merge Semi Join
Peter J. Holzer wrote:
> [PostgreSQL 9.5.9 on x86_64-pc-linux-gnu, compiled by gcc (Debian
> 4.9.2-10) 4.9.2, 64-bit]
>
> While investigating a performance issue, I found this query plan:
>
> wds=> explain analyze
>     select facttablename, columnname, term, concept_id, t.hidden, language,
>            register, c.id, canonicalname, description, parent, type, r.sortorder
>     from term t, concept c, relation r
>     where facttablename='facttable_kon_eh' and columnname='arbeitsvolumen'
>       and exists (select 1 from facttable_kon_eh f
>                   where f.arbeitsvolumen=t.term and thema in (values (E'E')))
>       and c.id=concept_id and r.child=concept_id;
>
>                                QUERY PLAN
> ------------------------------------------------------------------------
>  Nested Loop  (cost=22984.13..241822.98 rows=1 width=169) (actual time=667.608..3275.037 rows=6 loops=1)
>    ->  Nested Loop  (cost=22983.70..241822.50 rows=1 width=154) (actual time=667.587..3274.972 rows=6 loops=1)
>          ->  Merge Semi Join  (cost=22983.27..241819.04 rows=1 width=87) (actual time=667.559..3274.877 rows=6 loops=1)
>                Merge Cond: ((t.term)::text = (f.arbeitsvolumen)::text)
>                ->  Index Scan using term_term_idx on term t  (cost=0.56..206841.09 rows=18 width=87) (actual time=667.467..2929.430 rows=7 loops=1)
>                      Filter: (((facttablename)::text = 'facttable_kon_eh'::text) AND ((columnname)::text = 'arbeitsvolumen'::text))
>                      Rows Removed by Filter: 3874190
>                ->  Materialize  (cost=0.43..399167.10 rows=3548798 width=7) (actual time=0.086..295.708 rows=184791 loops=1)
>                      ->  Nested Loop Semi Join  (cost=0.43..390295.10 rows=3548798 width=7) (actual time=0.076..278.006 rows=184791 loops=1)
>                            Join Filter: ((f.thema)::text = ('E'::text))
>                            ->  Index Scan using facttable_kon_eh_arbeitsvolumen_idx on facttable_kon_eh f  (cost=0.43..337063.11 rows=3548798 width=9) (actual time=0.052..199.733 rows=184791 loops=1)
>                            ->  Materialize  (cost=0.00..0.03 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=184791)
>                                  ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
>          ->  Index Scan using concept_pkey on concept c  (cost=0.43..3.45 rows=1 width=67) (actual time=0.011..0.012 rows=1 loops=6)
>                Index Cond: (id = t.concept_id)
>    ->  Index Scan using relation_child_idx on relation r  (cost=0.43..0.47 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=6)
>          Index Cond: (child = c.id)
>  Planning time: 15.624 ms
[GENERAL] Query plan for Merge Semi Join
[PostgreSQL 9.5.9 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit]

While investigating a performance issue, I found this query plan:

wds=> explain analyze
    select facttablename, columnname, term, concept_id, t.hidden, language,
           register, c.id, canonicalname, description, parent, type, r.sortorder
    from term t, concept c, relation r
    where facttablename='facttable_kon_eh' and columnname='arbeitsvolumen'
      and exists (select 1 from facttable_kon_eh f
                  where f.arbeitsvolumen=t.term and thema in (values (E'E')))
      and c.id=concept_id and r.child=concept_id;

                               QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=22984.13..241822.98 rows=1 width=169) (actual time=667.608..3275.037 rows=6 loops=1)
   ->  Nested Loop  (cost=22983.70..241822.50 rows=1 width=154) (actual time=667.587..3274.972 rows=6 loops=1)
         ->  Merge Semi Join  (cost=22983.27..241819.04 rows=1 width=87) (actual time=667.559..3274.877 rows=6 loops=1)
               Merge Cond: ((t.term)::text = (f.arbeitsvolumen)::text)
               ->  Index Scan using term_term_idx on term t  (cost=0.56..206841.09 rows=18 width=87) (actual time=667.467..2929.430 rows=7 loops=1)
                     Filter: (((facttablename)::text = 'facttable_kon_eh'::text) AND ((columnname)::text = 'arbeitsvolumen'::text))
                     Rows Removed by Filter: 3874190
               ->  Materialize  (cost=0.43..399167.10 rows=3548798 width=7) (actual time=0.086..295.708 rows=184791 loops=1)
                     ->  Nested Loop Semi Join  (cost=0.43..390295.10 rows=3548798 width=7) (actual time=0.076..278.006 rows=184791 loops=1)
                           Join Filter: ((f.thema)::text = ('E'::text))
                           ->  Index Scan using facttable_kon_eh_arbeitsvolumen_idx on facttable_kon_eh f  (cost=0.43..337063.11 rows=3548798 width=9) (actual time=0.052..199.733 rows=184791 loops=1)
                           ->  Materialize  (cost=0.00..0.03 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=184791)
                                 ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
         ->  Index Scan using concept_pkey on concept c  (cost=0.43..3.45 rows=1 width=67) (actual time=0.011..0.012 rows=1 loops=6)
               Index Cond: (id = t.concept_id)
   ->  Index Scan using relation_child_idx on relation r  (cost=0.43..0.47 rows=1 width=19) (actual time=0.008..0.009 rows=1 loops=6)
         Index Cond: (child = c.id)
 Planning time: 15.624 ms
 Execution time: 3275.341 ms
Re: [GENERAL] Query plan not updated after dropped index
On Thu, Feb 18, 2016 at 11:05 PM, Tom Lane wrote:
> Victor Blomqvist writes:
> > We just had a major issue on our databases, after an index was replaced a
> > user defined function didn't change its query plan to use the new index.
>
> I'm suspicious that this is some variant of the problem discussed a couple
> days ago:
>
> http://www.postgresql.org/message-id/flat/CAKt_ZfsvdE3WiHUjg81MT1KzOkebqz_bg=dbhf4l6rkfjte...@mail.gmail.com
>
> However, that theory requires that the index not have been immediately
> usable, which implies that it initially had some broken HOT chains,
> which really should not have happened if you were simply replacing one
> index with an identical one. (The pre-existing index should've been
> enough to ensure HOT chain consistency for its columns.)
>
> Perhaps you were doing something "cute" like replacing a single-column
> index with a multi-column one?

No, the new index looked exactly like the old one. The index was created with CONCURRENTLY, and we waited until it returned. I could use the index just fine when running a query in a separate connection, so I am not sure it would help to drop the old index in a transaction and run a query to verify that the new index is usable? How can I know that the new index is usable from already open connections?

Another thing to note about this index is that it sometimes has an unexpected zero page inside, resulting in this error: ERROR: index "user_pictures_picture_dhash_idx" contains unexpected zero page at block 123780. But it always happens sporadically on our read slaves, and rerunning the query that causes it doesn't reproduce the error. I asked about this problem before, here: http://www.postgresql.org/message-id/flat/cal870dvxr9fhkyej5smydk4pjupl5kwabuhspbtqek03gdd...@mail.gmail.com#cal870dvxr9fhkyej5smydk4pjupl5kwabuhspbtqek03gdd...@mail.gmail.com

> regards, tom lane
Re: [GENERAL] Query plan not updated after dropped index
Victor Blomqvist writes:
> We just had a major issue on our databases, after an index was replaced a
> user defined function didn't change its query plan to use the new index.

I'm suspicious that this is some variant of the problem discussed a couple days ago:

http://www.postgresql.org/message-id/flat/CAKt_ZfsvdE3WiHUjg81MT1KzOkebqz_bg=dbhf4l6rkfjte...@mail.gmail.com

However, that theory requires that the index not have been immediately usable, which implies that it initially had some broken HOT chains, which really should not have happened if you were simply replacing one index with an identical one. (The pre-existing index should've been enough to ensure HOT chain consistency for its columns.)

Perhaps you were doing something "cute" like replacing a single-column index with a multi-column one?

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Query plan not updated after dropped index
On 2/18/16, Victor Blomqvist wrote:
> Hello!
>
> We just had a major issue on our databases: after an index was replaced, a
> user defined function didn't change its query plan to use the new index. At
> least this is our theory, since the function in question became much slower
> and as a result brought our system to a halt.
>
> Basically it went:
> 1. create new index (a simple btree on a bigint column index)
> 2. drop old index
> 3. rename new index to old index name
> 4. analyze table
>
> After these steps normally our functions will update their plans and use
> the new index just fine. However this time the function (only one function
> uses this particular index) seemed to take forever to complete. This is a
> 40GB table, so querying for something not indexed would take a long time.
> Therefore my suspicion is that the function didn't start to use the new
> index.

My guess is that backends somehow cached the generic plan[1] and didn't recalculate it.

> Adding to the strangeness is that if I ran the function manually it was
> fast,

That is because _backends_ (processes) cache plans, not the DBMS (i.e. plans are not shared). So you connected to the DB (making a new backend process) and ran the function; the backend could not find a cached plan and created one for itself. The plan for _your_ connection includes the new index.

> only when called from our application through pg_bouncer it was slow.
> I should also say that the function is only used on our 3 read slaves set
> up for our database.
>
> Things we tried to fix this:
> 1. Analyze table
> 2. Restart our application
> 3. Recreate the function
> 4. Kill the slow running queries with pg_cancel_backend()
>
> These things did not help.

Since pgbouncer reuses connections, the backend processes still have the cached plan. If you had tried pg_terminate_backend() it could have helped.
> Instead what helped in the end was to replace the function with an extra
> useless where clause (in the hope that it would force it to create a new
> plan)
>
> So, the function only has a single SELECT inside:
> RETURN QUERY
> SELECT * FROM table
> WHERE bigint_column = X
> LIMIT 100 OFFSET 0;
>
> And this is my modification that made it work again:
> RETURN QUERY
> SELECT * FROM table
> WHERE bigint_column = X AND 1=1
> LIMIT 100 OFFSET 0;

Yes, it is a new query for PG, and therefore it requires a new plan, because it is not in the cache.

> Obviously we are now worried why this happened

Also, my guess is that you did CREATE INDEX CONCURRENTLY, and there are several cases[2] in which the result cannot be used (an "invalid" state, or waiting for unfinished transactions). When the old index is dropped but the new index is not accessible while a query/function is running, the generated plan does not include that index. And in the case of caching, such a plan is cached and used later without the index too.

> and how we can avoid it in
> the future? We run Postgres 9.3 on CentOS 6.

Firstly, you can drop the old index not immediately but a little later, depending on how long your queries run. Also, after creating the new index (and possibly waiting a little), you can drop the old index in a _transaction_ and see whether the new index is used in an EXPLAIN of any query that uses it. In the first case do COMMIT; in the second case just do ROLLBACK and leave the old index in use.

> Thanks!
> Victor

[1] http://www.postgresql.org/docs/devel/static/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
[2] http://www.postgresql.org/docs/devel/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

--
Best regards,
Vitaly Burovoy
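The transactional check suggested above can be sketched like this (index, table, and column names here are placeholders, not taken from the thread):

```sql
BEGIN;
-- DROP INDEX takes an exclusive lock on the table, so keep this transaction short.
DROP INDEX old_idx;
-- If the plan shows "Index Scan using new_idx", the new index is usable.
EXPLAIN SELECT * FROM big_table WHERE bigint_column = 42 LIMIT 100 OFFSET 0;
-- COMMIT to keep the drop, or ROLLBACK to restore the old index.
ROLLBACK;
```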
Re: [GENERAL] Query plan not updated after dropped index
The end goal is to get rid of index bloat. If there is a better way to handle this, I'm all ears!

/Victor

On Thu, Feb 18, 2016 at 5:21 PM, Oleg Bartunov wrote:
> On Thu, Feb 18, 2016 at 11:17 AM, Victor Blomqvist wrote:
>> Hello!
>>
>> We just had a major issue on our databases: after an index was replaced, a
>> user defined function didn't change its query plan to use the new index. At
>> least this is our theory, since the function in question became much slower
>> and as a result brought our system to a halt.
>>
>> Basically it went:
>> 1. create new index (a simple btree on a bigint column index)
>> 2. drop old index
>> 3. rename new index to old index name
>
> why did you do this!?
>
>> 4. analyze table
>>
>> After these steps normally our functions will update their plans and use
>> the new index just fine. However this time the function (only one function
>> uses this particular index) seemed to take forever to complete. This is a
>> 40GB table, so querying for something not indexed would take a long time.
>> Therefore my suspicion is that the function didn't start to use the new index.
>>
>> Adding to the strangeness is that if I ran the function manually it was
>> fast; only when called from our application through pg_bouncer it was slow.
>> I should also say that the function is only used on our 3 read slaves set
>> up for our database.
>>
>> Things we tried to fix this:
>> 1. Analyze table
>> 2. Restart our application
>> 3. Recreate the function
>> 4. Kill the slow running queries with pg_cancel_backend()
>>
>> These things did not help.
>>
>> Instead what helped in the end was to replace the function with an extra
>> useless where clause (in the hope that it would force it to create a new
>> plan)
>>
>> So, the function only has a single SELECT inside:
>> RETURN QUERY
>> SELECT * FROM table
>> WHERE bigint_column = X
>> LIMIT 100 OFFSET 0;
>>
>> And this is my modification that made it work again:
>> RETURN QUERY
>> SELECT * FROM table
>> WHERE bigint_column = X AND 1=1
>> LIMIT 100 OFFSET 0;
>>
>> Obviously we are now worried why this happened and how we can avoid it in
>> the future? We run Postgres 9.3 on CentOS 6.
>>
>> Thanks!
>> Victor
Re: [GENERAL] Query plan not updated after dropped index
On Thu, Feb 18, 2016 at 11:17 AM, Victor Blomqvist wrote:
> Hello!
>
> We just had a major issue on our databases: after an index was replaced, a
> user defined function didn't change its query plan to use the new index. At
> least this is our theory, since the function in question became much slower
> and as a result brought our system to a halt.
>
> Basically it went:
> 1. create new index (a simple btree on a bigint column index)
> 2. drop old index
> 3. rename new index to old index name

why did you do this!?

> 4. analyze table
>
> After these steps normally our functions will update their plans and use
> the new index just fine. However this time the function (only one function
> uses this particular index) seemed to take forever to complete. This is a
> 40GB table, so querying for something not indexed would take a long time.
> Therefore my suspicion is that the function didn't start to use the new index.
>
> Adding to the strangeness is that if I ran the function manually it was
> fast; only when called from our application through pg_bouncer it was slow.
> I should also say that the function is only used on our 3 read slaves set
> up for our database.
>
> Things we tried to fix this:
> 1. Analyze table
> 2. Restart our application
> 3. Recreate the function
> 4. Kill the slow running queries with pg_cancel_backend()
>
> These things did not help.
>
> Instead what helped in the end was to replace the function with an extra
> useless where clause (in the hope that it would force it to create a new
> plan)
>
> So, the function only has a single SELECT inside:
> RETURN QUERY
> SELECT * FROM table
> WHERE bigint_column = X
> LIMIT 100 OFFSET 0;
>
> And this is my modification that made it work again:
> RETURN QUERY
> SELECT * FROM table
> WHERE bigint_column = X AND 1=1
> LIMIT 100 OFFSET 0;
>
> Obviously we are now worried why this happened and how we can avoid it in
> the future? We run Postgres 9.3 on CentOS 6.
>
> Thanks!
> Victor
[GENERAL] Query plan not updated after dropped index
Hello!

We just had a major issue on our databases: after an index was replaced, a user defined function didn't change its query plan to use the new index. At least this is our theory, since the function in question became much slower and as a result brought our system to a halt.

Basically it went:
1. create new index (a simple btree on a bigint column index)
2. drop old index
3. rename new index to old index name
4. analyze table

After these steps normally our functions will update their plans and use the new index just fine. However this time the function (only one function uses this particular index) seemed to take forever to complete. This is a 40GB table, so querying for something not indexed would take a long time. Therefore my suspicion is that the function didn't start to use the new index.

Adding to the strangeness is that if I ran the function manually it was fast; only when called from our application through pg_bouncer it was slow. I should also say that the function is only used on our 3 read slaves set up for our database.

Things we tried to fix this:
1. Analyze table
2. Restart our application
3. Recreate the function
4. Kill the slow running queries with pg_cancel_backend()

These things did not help.

Instead what helped in the end was to replace the function with an extra useless where clause (in the hope that it would force it to create a new plan).

So, the function only has a single SELECT inside:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X
LIMIT 100 OFFSET 0;

And this is my modification that made it work again:
RETURN QUERY
SELECT * FROM table
WHERE bigint_column = X AND 1=1
LIMIT 100 OFFSET 0;

Obviously we are now worried why this happened and how we can avoid it in the future? We run Postgres 9.3 on CentOS 6.

Thanks!
Victor
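The replacement procedure above can be sketched with one added safety check: CREATE INDEX CONCURRENTLY can leave an index in an invalid state if the build fails, so it is worth verifying pg_index.indisvalid before dropping the old index (table and index names here are placeholders):

```sql
CREATE INDEX CONCURRENTLY new_idx ON big_table (bigint_column);

-- A failed concurrent build leaves an "invalid" index behind;
-- check the catalog before touching the old index.
SELECT indisvalid, indisready
FROM pg_index
WHERE indexrelid = 'new_idx'::regclass;

-- Only if indisvalid is true:
DROP INDEX old_idx;
ALTER INDEX new_idx RENAME TO old_idx;
ANALYZE big_table;
```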
Re: [GENERAL] Query plan for currently executing query?
On Wed, 18 Sep 2013 16:24:16 +0100 Tim Kane wrote:
> On 18/09/2013 14:44, "Rowan Collins" wrote:
>> Running an EXPLAIN on the exact query should give you the plan. An
>> EXPLAIN ANALYZE would have to wait for the query to complete either
>> way, so you wouldn't be able to get it mid-way through a running
>> process.
>>
>> If you don't know the exact query running, then if the
>> stats_command_string config setting is on, you should be able to get
>> it by querying the pg_stat_activity view.
>>
>> The only other problem I can think of is if there are temporary
>> tables or other session-specific objects that would exist only
>> within the running process.
>>
>> Another possibility to consider (which would also show up in
>> pg_stat_activity or similar views) is that the query is waiting on
>> some kind of lock, rather than just executing slowly.
>
> I think the OP was hoping for a solution that would allow him to
> retrieve the query plan that was generated at execution time. I've
> certainly wished for this type of facility in the past, in the
> scenario where a given table may have been ANALYZEd in the interim -
> making it difficult to perform any useful diagnostics on the problem
> query.

Maybe auto-explain is the solution?

http://www.postgresql.org/docs/9.1/static/auto-explain.html
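For reference, auto_explain can be tried per session roughly like this (the threshold here is illustrative, and log_analyze adds measurable overhead):

```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '5s'; -- log plans of statements slower than 5 seconds
SET auto_explain.log_analyze = true;      -- include actual row counts and timings (costs overhead)
```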
Re: [GENERAL] Query plan for currently executing query?
On 18/09/2013 14:44, "Rowan Collins" wrote:
> Running an EXPLAIN on the exact query should give you the plan. An
> EXPLAIN ANALYZE would have to wait for the query to complete either way,
> so you wouldn't be able to get it mid-way through a running process.
>
> If you don't know the exact query running, then if the
> stats_command_string config setting is on, you should be able to get it
> by querying the pg_stat_activity view.
>
> The only other problem I can think of is if there are temporary tables
> or other session-specific objects that would exist only within the
> running process.
>
> Another possibility to consider (which would also show up in
> pg_stat_activity or similar views) is that the query is waiting on some
> kind of lock, rather than just executing slowly.

I think the OP was hoping for a solution that would allow him to retrieve the query plan that was generated at execution time. I've certainly wished for this type of facility in the past, in the scenario where a given table may have been ANALYZEd in the interim - making it difficult to perform any useful diagnostics on the problem query.

In cases such as that, it is sometimes better to cancel the currently running query and restart it in order to take advantage of the newer statistics and generate a better QEP.

What might be a useful feature addition is if, at the time of the ANALYZE operation, postgres could identify those currently running queries and flag them as having begun execution with a dirty/stale QEP. Possibly this could be extended to heuristically determine if a query might return faster if it were cancelled and re-executed under a new QEP?

Tim
Re: [GENERAL] Query plan for currently executing query?
François Beausoleil wrote (on 18/09/2013):
> Hi!
>
> Is it possible to get the plan of a query that's currently running?
>
> I have queries which normally take about 15 minutes, but are now at 2+
> hours. I forgot to add the code to dump the plan to STDOUT, so I really
> don't know. It might be data volume, but I want to confirm that the plan
> isn't bogus.
>
> I'm doing batch imports of data on "PostgreSQL 9.1.9 on
> x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5)
> 4.6.3, 64-bit".
>
> Thanks,
> François Beausoleil

Running an EXPLAIN on the exact query should give you the plan. An EXPLAIN ANALYZE would have to wait for the query to complete either way, so you wouldn't be able to get it mid-way through a running process.

If you don't know the exact query running, then if the stats_command_string config setting is on, you should be able to get it by querying the pg_stat_activity view.

The only other problem I can think of is if there are temporary tables or other session-specific objects that would exist only within the running process.

Another possibility to consider (which would also show up in pg_stat_activity or similar views) is that the query is waiting on some kind of lock, rather than just executing slowly.

Regards,
--
Rowan Collins [IMSoP]
[GENERAL] Query plan for currently executing query?
Hi!

Is it possible to get the plan of a query that's currently running?

I have queries which normally take about 15 minutes, but are now at 2+ hours. I forgot to add the code to dump the plan to STDOUT, so I really don't know. It might be data volume, but I want to confirm that the plan isn't bogus.

I'm doing batch imports of data on "PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit".

Thanks,
François Beausoleil
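Short of seeing the running plan itself, the statement text can be fetched from pg_stat_activity and re-EXPLAINed in another session; a sketch for 9.1, where the columns are procpid and current_query (renamed to pid and query in 9.2):

```sql
-- Find long-running statements and how long they have been going.
SELECT procpid, now() - query_start AS runtime, waiting, current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY query_start;

-- Then EXPLAIN the same statement text in another session; note this shows
-- the plan the planner would pick now, not necessarily the one executing.
```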
Re: [GENERAL] Query plan different depending on the value of where filter
"Looby, Denis" writes:
> What I don't understand is why the plan is different just because the
> group_id = has changed value?
> Does the planner have some statistical info on the contents of non-indexed
> rows?

Of course. In this case it knows that a nestloop would be a loser because many rows would be fetched from the outer table.

> I don't quite understand why this plan executes the sequential scan once,
> whereas the slow one does it 5001 times, which I believe is the main source
> of the difference.

In the hash join case, it builds the hash table, discovers that it's empty (because there are no rows with type = 2), and concludes that it need not scan the outer relation. This is a corner case that the planner doesn't try to account for because it couldn't be predicted reliably. I wouldn't get too excited about it if I were you, because presumably the case isn't going to happen all that much in production either (else why are you bothering with the query?).

Personally I'd suggest getting rid of the

    WHERE outer_tab.outer_key = inner_tab_1.key_to

clause, which is entirely redundant with the IN condition and is preventing the planner from seeing that it could optimize the IN into a semijoin.

regards, tom lane
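Following that suggestion, the query from the original post would look roughly like this with the redundant clause removed:

```sql
-- Without the extra outer_tab.outer_key = inner_tab_1.key_to clause,
-- the planner is free to optimize the IN into a semijoin.
SELECT outer_tab.outer_key
FROM outer_tab
WHERE outer_tab.outer_key IN (
    SELECT inner_tab_1.key_to
    FROM inner_tab_2
    JOIN inner_tab_1
      ON inner_tab_2.outer_key = inner_tab_1.key_from
     AND inner_tab_1.type = 2
    WHERE inner_tab_2.group_id = 9
);
```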
[GENERAL] Query plan different depending on the value of where filter
Hi All, I have an interesting query scenario I'm trying to understand. I came across this while investigating a slow query in our application. I've been able to reproduce the scenario in a psql script that sets up the tables and runs the queries. Script here http://pastebin.com/CBkdDmWp if anyone is interested. This is the scenario. Version is "PostgreSQL 9.2.4, compiled by Visual C++ build 1600, 32-bit" Three tables Outer_tab : 5000 rows Inner_tab_1 : 1000 rows Inner_tab_2 : 16000 rows This is the query SELECT outer_tab.outer_key FROM outer_tab WHERE outer_tab.outer_key IN ( SELECT inner_tab_1.key_to FROM inner_tab_2 INNER JOIN inner_tab_1 ON (inner_tab_2.outer_key = inner_tab_1.key_from AND inner_tab_1.type = 2) WHERE outer_tab.outer_key = inner_tab_1.key_to AND inner_tab_2.group_id = 9 ); Two important things here, type = 2 does not occur in inner_tab_1 and group_id = 9 does not occur in inner_tab_2 and group_id is not indexed. The result is 0 rows. Now this is quite slow about 15 seconds on my machine. Here is the explain plan http://explain.depesz.com/s/BVg I understand that the seq scan on inner_tab_2 and its 16000 rows is the culprit and the easy fix is to swap inner_tab_2 and inner_tab_1 between the FROM and the JOIN. This lets it drive off inner_tab_1 with an index scan and skip the sequential scan as seen here http://explain.depesz.com/s/pkG Much better at 14ms. That's fine but what has me somewhat confused is if group_id in the WHERE is changed to 1, which does exist in inner_tab_2, we get quite a different plan. http://explain.depesz.com/s/FX4 It's quick too 63ms What I don't understand is why the plan is different just because the group_id = has changed value? Does the planner have some statistical info on the contents of non-indexed rows? I don't quite understand why this plan executes the sequential scan once, whereas the slow one does it 5001 times, which I believe is the main source of the difference. 
Also, if I don't ANALYZE the tables, the original query will run in a few ms instead of 15 seconds; it actually uses the same query plan that swapping the tables creates. So it runs the index scan on inner_tab_1 first. It's a bit surprising that with ANALYZE it picks a plan that is so much worse.

Anyone able to shed some light?

Thanks for your time,
Denis Looby
Re: [GENERAL] Query plan optimization: sorting vs. partitioning
2011/2/2, Tom Lane:
>> I see the database doesn't understand that there are no entries in the
>> main table, so it has to assume the Append data is not ordered. Is
>> there a way to avoid sorting?
>
> No. In existing releases there is no plan type that can produce
> presorted output from an append relation (ie, an inheritance tree).
> 9.1 will be able to do that, but it wasn't exactly a small fix:
> http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=11cad29c91524aac1d0b61e0ea0357398ab79bf8

OK, I hope I'll be able to come up with a stored procedure to query the tables directly, then. Thanks!

--
DoubleF
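Until 9.1's ordered Append, one workaround is to target the relevant child partition directly, since each child is clustered on its time index; a sketch (the timestamp column name is a placeholder, not from the thread):

```sql
-- Querying the child directly lets the per-partition time index satisfy
-- the ORDER BY with no Sort node.
SELECT *
FROM states_20101206
WHERE ts >= '2010-12-06' AND ts < '2010-12-07'
ORDER BY ts;
```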
Re: [GENERAL] Query plan optimization: sorting vs. partitioning
Sergey Zaharchenko writes:
> I need to select some data in the time order. When I query a separate
> smaller table, the index is used and no sorting is needed. However,
> when I query the main table, it occurs:
> ...
> -> Sort ...
>      Sort Key: ...
>      Sort Method: ...
>      -> Result ...
>         -> Append ...
>            -> Seq Scan on states
>                 Filter: ...
>            -> Seq Scan on states_20101206
>                 Filter: ...
> ...
> I see the database doesn't understand that there are no entries in the
> main table, so it has to assume the Append data is not ordered. Is
> there a way to avoid sorting?

No. In existing releases there is no plan type that can produce presorted output from an append relation (ie, an inheritance tree). 9.1 will be able to do that, but it wasn't exactly a small fix:

http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=11cad29c91524aac1d0b61e0ea0357398ab79bf8

regards, tom lane
[GENERAL] Query plan optimization: sorting vs. partitioning
Hello list,

I have a large time-indexed table (states) partitioned into several tables based on the date. The smaller tables are clustered by their time indices. The main table is empty. I need to select some data in the time order. When I query a separate smaller table, the index is used and no sorting is needed. However, when I query the main table, it occurs:

...
-> Sort ...
     Sort Key: ...
     Sort Method: ...
     -> Result ...
        -> Append ...
           -> Seq Scan on states
                Filter: ...
           -> Seq Scan on states_20101206
                Filter: ...
...

I see the database doesn't understand that there are no entries in the main table, so it has to assume the Append data is not ordered. Is there a way to avoid sorting?

Please CC me as I'm not on the list.

Thanks in advance,
--
DoubleF
Re: [GENERAL] Query plan choice issue
On 14/09/2010, at 10:37 AM, Yaroslav Tykhiy wrote:
> On 14/09/2010, at 12:41 AM, Tom Lane wrote:
>> Yaroslav Tykhiy writes:
>> [...]
>> I think the major problem you're having is that the planner is
>> completely clueless about the selectivity of the condition
>>     "substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'
>> If it knew that that would match only one row, instead of several
>> thousand, it would likely pick a different plan.
>>
>> In recent versions of PG you could probably make a noticeable
>> improvement in this if you just dropped the substring() restriction ...
>> do you actually need that?
>>
>> Alternatively, if you don't want to change the query logic at all, I'd
>> try making an index on substring(v.headervalue, 0, 255). I'm not
>> expecting the query to actually *use* the index, mind you. But its
>> existence will prompt ANALYZE to collect stats on the expression's
>> value, and that will help the planner with estimating the ~~* condition.
>
> Well, that substring() and ILIKE combo looked suspicious to me, too.
> However, there already was an index on substring(v.headervalue, 0, 255)
> but the fast query plan didn't seem to use it, it used a different index
> instead:
> [...]
> Meanwhile, a mate of mine lurking on this list pointed out that reducing
> random_page_cost might help here and it did: random_page_cost of 2 made
> the fast query favourable. Can it mean that the default planner
> configuration slightly overfavours seq scans?

Funnily, after a few days of running with random_page_cost=2, exactly the same query became slow again and I had to reduce random_page_cost further to 1.5 to make it fast. Can it be a sign of a problem in the planner?

Thanks!

Yar
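The random_page_cost experiments described in this thread can be run per session before changing postgresql.conf:

```sql
-- Lower random_page_cost makes index scans look cheaper relative to
-- sequential scans; try values per session and compare the plans.
SET random_page_cost = 2;    -- then 1.5, as in the thread
EXPLAIN ANALYZE SELECT ...;  -- placeholder for the problem query
RESET random_page_cost;
```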
Re: [GENERAL] Query plan choice issue
Hi Tom,

On 14/09/2010, at 12:41 AM, Tom Lane wrote:

> Yaroslav Tykhiy writes:
>> -> Bitmap Heap Scan on dbmail_headervalue v (cost=1409.82..221813.70 rows=2805 width=16) (actual time=28543.411..28623.623 rows=1 loops=1)
>>      Recheck Cond: (v.headername_id = n.id)
>>      Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
>>      -> Bitmap Index Scan on dbmail_headervalue_testing (cost=0.00..1409.82 rows=75940 width=0) (actual time=17555.572..17555.572 rows=1877009 loops=1)
>>           Index Cond: (v.headername_id = n.id)
>
> I think the major problem you're having is that the planner is completely
> clueless about the selectivity of the condition
>     "substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'
> If it knew that that would match only one row, instead of several thousand,
> it would likely pick a different plan. In recent versions of PG you could
> probably make a noticeable improvement in this if you just dropped the
> substring() restriction ... do you actually need that?
>
> Alternatively, if you don't want to change the query logic at all, I'd try
> making an index on substring(v.headervalue, 0, 255). I'm not expecting the
> query to actually *use* the index, mind you. But its existence will prompt
> ANALYZE to collect stats on the expression's value, and that will help the
> planner with estimating the ~~* condition.

Well, that substring() and ILIKE combo looked suspicious to me, too.
However, there already was an index on substring(v.headervalue, 0, 255),
but the fast query plan didn't seem to use it; it used a different index
instead:

mail=# \d dbmail_headervalue
                          Table "public.dbmail_headervalue"
     Column     |  Type  |                            Modifiers
----------------+--------+------------------------------------------------------------------
 headername_id  | bigint | not null
 physmessage_id | bigint | not null
 id             | bigint | not null default nextval('dbmail_headervalue_idnr_seq'::regclass)
 headervalue    | text   | not null default ''::text
Indexes:
    "dbmail_headervalue_pkey" PRIMARY KEY, btree (id)
    "dbmail_headervalue_1" UNIQUE, btree (physmessage_id, id)
    "dbmail_headervalue_2" btree (physmessage_id)
    "dbmail_headervalue_3" btree ("substring"(headervalue, 0, 255))

... EXPLAIN ANALYZE ...
   -> Index Scan using dbmail_headervalue_2 on dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual time=0.028..0.029 rows=0 loops=358)
        Index Cond: (v.physmessage_id = m.physmessage_id)
        Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
...

Meanwhile, a mate of mine lurking on this list pointed out that reducing
random_page_cost might help here, and it did: a random_page_cost of 2 made
the fast plan favourable. Can it mean that the default planner
configuration slightly overfavours seq scans?

Thank you all guys!

Yar
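[Editor's aside, not part of the original mail: one way to see whether a given index, such as the substring index above, is ever actually chosen by the planner is the cumulative statistics view pg_stat_user_indexes. A minimal sketch:]

```sql
-- idx_scan counts how many times each index on the table has been used
-- since the statistics were last reset; 0 means the planner never picked it.
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE relname = 'dbmail_headervalue'
ORDER BY idx_scan DESC;
```

An index that never accumulates scans is a candidate for dropping, though here its mere existence still serves Tom's purpose of feeding statistics to ANALYZE.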
Re: [GENERAL] Query plan choice issue
Yaroslav Tykhiy writes:
> -> Bitmap Heap Scan on dbmail_headervalue v (cost=1409.82..221813.70 rows=2805 width=16) (actual time=28543.411..28623.623 rows=1 loops=1)
>      Recheck Cond: (v.headername_id = n.id)
>      Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
>      -> Bitmap Index Scan on dbmail_headervalue_testing (cost=0.00..1409.82 rows=75940 width=0) (actual time=17555.572..17555.572 rows=1877009 loops=1)
>           Index Cond: (v.headername_id = n.id)

I think the major problem you're having is that the planner is completely
clueless about the selectivity of the condition

    "substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'

If it knew that that would match only one row, instead of several thousand,
it would likely pick a different plan. In recent versions of PG you could
probably make a noticeable improvement in this if you just dropped the
substring() restriction ... do you actually need that?

Alternatively, if you don't want to change the query logic at all, I'd try
making an index on substring(v.headervalue, 0, 255). I'm not expecting the
query to actually *use* the index, mind you. But its existence will prompt
ANALYZE to collect stats on the expression's value, and that will help the
planner with estimating the ~~* condition.

			regards, tom lane
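[Editor's aside: a sketch of the advice above. The index name is made up for illustration; in this thread the existing dbmail_headervalue_3 index already plays this role. Note the double parentheses required around an expression in CREATE INDEX.]

```sql
-- An index on the expression prompts ANALYZE to collect statistics on the
-- expression's value, which helps the planner estimate the ~~* condition,
-- even if the query never uses the index itself.
CREATE INDEX dbmail_headervalue_substr_idx        -- hypothetical name
    ON dbmail_headervalue ((substring(headervalue, 0, 255)));
ANALYZE dbmail_headervalue;                       -- refresh the statistics
```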
Re: [GENERAL] Query plan choice issue
Hi Martin,

Thank you for your response!

On 13/09/2010, at 10:49 AM, Martin Gainty wrote:

> A cursory look of the plan details a FTS on dbmail_headername invoked by
> the JOIN clause JOIN dbmail_headername n ON v.headername_id=n.id. You
> would accelerate the seek appreciably by placing indexes on both
> participating columns: v.headername_id and n.id.

Granted, there was no index on v.headername_id, but creating one just
slowed the query down, with a different plan:

CREATE INDEX dbmail_headervalue_testing ON dbmail_headervalue (headername_id);

EXPLAIN ANALYSE ...
                                QUERY PLAN
 Sort (cost=222020.81..222020.81 rows=1 width=8) (actual time=28636.426..28636.426 rows=0 loops=1)
   Sort Key: m.message_idnr
   Sort Method: quicksort Memory: 25kB
   -> Nested Loop (cost=1409.82..222020.80 rows=1 width=8) (actual time=28636.409..28636.409 rows=0 loops=1)
      -> Nested Loop (cost=1409.82..222012.27 rows=1 width=24) (actual time=28636.405..28636.405 rows=0 loops=1)
         -> Nested Loop (cost=1409.82..221959.94 rows=6 width=8) (actual time=28543.441..28624.750 rows=1 loops=1)
            -> Seq Scan on dbmail_headername n (cost=0.00..111.17 rows=1 width=8) (actual time=0.022..1.114 rows=1 loops=1)
                 Filter: ((headername)::text ~~* 'MESSAGE-ID'::text)
            -> Bitmap Heap Scan on dbmail_headervalue v (cost=1409.82..221813.70 rows=2805 width=16) (actual time=28543.411..28623.623 rows=1 loops=1)
                 Recheck Cond: (v.headername_id = n.id)
                 Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
                 -> Bitmap Index Scan on dbmail_headervalue_testing (cost=0.00..1409.82 rows=75940 width=0) (actual time=17555.572..17555.572 rows=1877009 loops=1)
                      Index Cond: (v.headername_id = n.id)
         -> Index Scan using dbmail_messages_physmessage_idx on dbmail_messages m (cost=0.00..8.71 rows=1 width=16) (actual time=11.646..11.646 rows=0 loops=1)
              Index Cond: (m.physmessage_id = v.physmessage_id)
              Filter: ((m.status = ANY ('{0,1}'::integer[])) AND (m.mailbox_idnr = 12345))
      -> Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p (cost=0.00..8.52 rows=1 width=8) (never executed)
           Index Cond: (p.id = m.physmessage_id)
 Total runtime: 28636.517 ms
(19 rows)

> I also see a FTS on dbmail_headervalue invoked by the JOIN clause
> JOIN dbmail_headervalue v ON v.physmessage_id=p.id. Place indexes on both
> columns: v.physmessage_id and p.id.

Both columns are already indexed here:

On public.dbmail_headervalue (alias v):
    "dbmail_headervalue_2" btree (physmessage_id)
On public.dbmail_physmessage (alias p):
    "dbmail_physmessage_pkey" PRIMARY KEY, btree (id)

Perhaps I should provide some data on the table sizes. dbmail_headervalue
is the largest table with respect to its record count: 36 million records.
dbmail_headername is small: 5640 records. dbmail_physmessage and
dbmail_messages are of an average size: ~2 million records each.

Sorry for my cluelessness on this issue. But Postgresql's ability to build
a fast query plan for this query type at least occasionally is
encouraging. :-)

Yar

> The join clause JOIN dbmail_physmessage p ON m.physmessage_id=p.id uses
> indexes for both participants.
>
> Martin
>
> [Disclaimer and confidentiality note: This message is confidential. If you
> are not the intended recipient, we kindly ask to be notified. Any
> unauthorised forwarding or copying is prohibited. This message serves only
> the exchange of information and has no legally binding effect. Given how
> easily e-mail can be manipulated, we accept no liability for its content.]
>
> From: y...@barnet.com.au
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Query plan choice issue
> Date: Mon, 13 Sep 2010 09:36:35 +1000
>
> Hi all,
>
> I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can be
> executed using either of two different query plans, one taking a few
> milliseconds and the other tens of seconds. The work_mem setting doesn't
> seem to affect it -- tried to increase or decrease it by 2 or 4 times, but
> it didn't seem to favour the fast plan choice.
Honestly, I have no idea what affects the plan choice, but I saw Postgresql change it at random. The query in question looks like this -- sorry, it's
Re: [GENERAL] Query plan choice issue
A cursory look of the plan details a FTS on dbmail_headername invoked by
the JOIN clause:

    JOIN dbmail_headername n ON v.headername_id=n.id

You would accelerate the seek appreciably by placing indexes on both
participating columns: v.headername_id and n.id.

I also see a FTS on dbmail_headervalue invoked by the JOIN clause:

    JOIN dbmail_headervalue v ON v.physmessage_id=p.id

Place indexes on both columns: v.physmessage_id and p.id.

The join clause JOIN dbmail_physmessage p ON m.physmessage_id=p.id uses
indexes for both participants.

Martin

> From: y...@barnet.com.au
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Query plan choice issue
> Date: Mon, 13 Sep 2010 09:36:35 +1000
>
> Hi all,
>
> I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can be
> executed using either of two different query plans, one taking a few
> milliseconds and the other tens of seconds. The work_mem setting doesn't
> seem to affect it -- tried to increase or decrease it by 2 or 4 times, but
> it didn't seem to favour the fast plan choice. Honestly, I have no idea
> what affects the plan choice, but I saw Postgresql change it at random.
>
> The query in question looks like this -- sorry, it's rather complex:
>
> SELECT message_idnr
> FROM dbmail_messages m
> JOIN dbmail_physmessage p ON m.physmessage_id=p.id
> JOIN dbmail_headervalue v ON v.physmessage_id=p.id
> JOIN dbmail_headername n ON v.headername_id=n.id
> WHERE mailbox_idnr = 12345 AND status IN (0,1) AND
>   headername ILIKE 'MESSAGE-ID' AND
>   SUBSTRING(headervalue,0,255) ILIKE '%<@mail.gmail.com>%'
> ORDER BY message_idnr;
>
> It comes from DBMail. That said, I don't think DBMail can be blamed here
> because at least sometimes Postgresql is able to do the right thing with
> respect to that query.
>
> Here is the slow plan:
>                                 QUERY PLAN
> --
>  Sort (cost=1234610.11..1234610.11 rows=1 width=8) (actual time=20933.166..20933.168 rows=1 loops=1)
>    Sort Key: m.message_idnr
>    Sort Method: quicksort Memory: 25kB
>    -> Nested Loop (cost=0.00..1234610.10 rows=1 width=8) (actual time=3327.658..20933.122 rows=1 loops=1)
>       -> Nested Loop (cost=0.00..1234601.59 rows=1 width=24) (actual time=3327.599..20933.056 rows=1 loops=1)
>          -> Nested Loop (cost=0.00..1234549.74 rows=6 width=8) (actual time=3327.503..20932.944 rows=1 loops=1)
>               Join Filter: (v.headername_id = n.id)
>             -> Seq Scan on dbmail_headername n (cost=0.00..108.46 rows=1 width=8) (actual time=0.027..0.985 rows=1 loops=1)
>                  Filter: ((headername)::text ~~* 'MESSAGE-ID'::text)
>             -> Seq Scan on dbmail_headervalue v (cost=0.00..1234407.96 rows=2666 width=16) (actual time=3327.465..20931.942 rows=1 loops=1)
>                  Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
>          -> Index Scan using dbmail_messages_physmessage_idx on dbmail_messages m (cost=0.00..8.63 rows=1 width=16) (actual time=0.088..0.100 rows=1 loops=1)
>               Index Cond: (m.physmessage_id = v.physmessage_id)
>               Filter: ((m.status = ANY ('{0,1}'::integer[])) AND (m.mailbox_idnr = 12345))
>       -> Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual time=0.048..0.050 rows=1 loops=1)
>            Index Cond: (p.id = m.physmessage_id)
>  Total runtime: 20933.241 ms
> (17 rows)
>
> And here is the fast plan:
>                                 QUERY PLAN
> ---
>  Sort (cost=1175284.93..1175284.94 rows=1 width=8) (actual time=14.163..14.165 rows=1 loops=1)
>    Sort Key: m.message_idnr
>    Sort Method: quicksort Memory: 25kB
>    -> Nested Loop (cost=0.00..1175284.92 rows=1 width=8) (actual time=4.272..14.152 rows=1 loops=1)
>       -> Nested Loop (cost=0.00..1175276.41 rows=1 width=24) (actual time=4.261..14.135 rows=1 loops=1)
>          -> Nested Loop (cost=0.00..1175268.46 rows=1 w
[GENERAL] Query plan choice issue
Hi all,

I'm seeing a funny behaviour in Postgresql 8.4.4. Namely, a query can be
executed using either of two different query plans, one taking a few
milliseconds and the other tens of seconds. The work_mem setting doesn't
seem to affect it -- I tried to increase or decrease it by 2 or 4 times,
but that didn't seem to favour the fast plan choice. Honestly, I have no
idea what affects the plan choice, but I saw Postgresql change it at
random.

The query in question looks like this -- sorry, it's rather complex:

SELECT message_idnr
FROM dbmail_messages m
JOIN dbmail_physmessage p ON m.physmessage_id=p.id
JOIN dbmail_headervalue v ON v.physmessage_id=p.id
JOIN dbmail_headername n ON v.headername_id=n.id
WHERE mailbox_idnr = 12345 AND status IN (0,1) AND
  headername ILIKE 'MESSAGE-ID' AND
  SUBSTRING(headervalue,0,255) ILIKE '%<@mail.gmail.com>%'
ORDER BY message_idnr;

It comes from DBMail. That said, I don't think DBMail can be blamed here
because at least sometimes Postgresql is able to do the right thing with
respect to that query.
Here is the slow plan:
                                QUERY PLAN
--
 Sort (cost=1234610.11..1234610.11 rows=1 width=8) (actual time=20933.166..20933.168 rows=1 loops=1)
   Sort Key: m.message_idnr
   Sort Method: quicksort Memory: 25kB
   -> Nested Loop (cost=0.00..1234610.10 rows=1 width=8) (actual time=3327.658..20933.122 rows=1 loops=1)
      -> Nested Loop (cost=0.00..1234601.59 rows=1 width=24) (actual time=3327.599..20933.056 rows=1 loops=1)
         -> Nested Loop (cost=0.00..1234549.74 rows=6 width=8) (actual time=3327.503..20932.944 rows=1 loops=1)
              Join Filter: (v.headername_id = n.id)
            -> Seq Scan on dbmail_headername n (cost=0.00..108.46 rows=1 width=8) (actual time=0.027..0.985 rows=1 loops=1)
                 Filter: ((headername)::text ~~* 'MESSAGE-ID'::text)
            -> Seq Scan on dbmail_headervalue v (cost=0.00..1234407.96 rows=2666 width=16) (actual time=3327.465..20931.942 rows=1 loops=1)
                 Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
         -> Index Scan using dbmail_messages_physmessage_idx on dbmail_messages m (cost=0.00..8.63 rows=1 width=16) (actual time=0.088..0.100 rows=1 loops=1)
              Index Cond: (m.physmessage_id = v.physmessage_id)
              Filter: ((m.status = ANY ('{0,1}'::integer[])) AND (m.mailbox_idnr = 12345))
      -> Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual time=0.048..0.050 rows=1 loops=1)
           Index Cond: (p.id = m.physmessage_id)
 Total runtime: 20933.241 ms
(17 rows)

And here is the fast plan:
                                QUERY PLAN
---
 Sort (cost=1175284.93..1175284.94 rows=1 width=8) (actual time=14.163..14.165 rows=1 loops=1)
   Sort Key: m.message_idnr
   Sort Method: quicksort Memory: 25kB
   -> Nested Loop (cost=0.00..1175284.92 rows=1 width=8) (actual time=4.272..14.152 rows=1 loops=1)
      -> Nested Loop (cost=0.00..1175276.41 rows=1 width=24) (actual time=4.261..14.135 rows=1 loops=1)
         -> Nested Loop (cost=0.00..1175268.46 rows=1 width=32) (actual time=4.249..14.117 rows=1 loops=1)
            -> Index Scan using dbmail_messages_mailbox_idx on dbmail_messages m (cost=0.00..4153.35 rows=786 width=16) (actual time=0.043..2.810 rows=358 loops=1)
                 Index Cond: (mailbox_idnr = 12345)
                 Filter: (status = ANY ('{0,1}'::integer[]))
            -> Index Scan using dbmail_headervalue_2 on dbmail_headervalue v (cost=0.00..1489.96 rows=1 width=16) (actual time=0.028..0.029 rows=0 loops=358)
                 Index Cond: (v.physmessage_id = m.physmessage_id)
                 Filter: ("substring"(v.headervalue, 0, 255) ~~* '%<@mail.gmail.com>%'::text)
         -> Index Scan using dbmail_headername_pkey on dbmail_headername n (cost=0.00..7.94 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
              Index Cond: (n.id = v.headername_id)
              Filter: ((n.headername)::text ~~* 'MESSAGE-ID'::text)
      -> Index Scan using dbmail_physmessage_pkey on dbmail_physmessage p (cost=0.00..8.50 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=1)
           Index Cond: (p.id = m.physmessage_id)
 Total runtime: 14.231 ms
(18 rows)
Re: [GENERAL] query plan in pg7.4 vs 8.0.3
Albert Vernon Smith <[EMAIL PROTECTED]> writes:
> I just re-checked, and all the same index definitions are on both tables.

Well, that seems to imply that the 8.0 planner is missing the plan that 7.4
finds ... which is both hard to believe and impossible to investigate with
this much info. Would it be possible to get a copy of the database? Or at
least a schema dump?

			regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [GENERAL] query plan in pg7.4 vs 8.0.3
I just re-checked, and all the same index definitions are on both tables.

Cheers,
-albert

On 17.6.2005, at 14:20, Tom Lane wrote:
> Albert Vernon Smith <[EMAIL PROTECTED]> writes:
>> I am in process of migrating a database from Pg 7.4.3 to Pg 8.0.3. I
>> dumped and reloaded the data, and things look good, until I try to work
>> with it. When I try to make a query to a view based on a large join
>> (select * from 'view' limit 10), the 7.4.3 query works very well, and the
>> indexes are being used well. But, when I go to 8.0.3, there are some very
>> costly joins, and the query is not workable.
>
> Are you sure you have all the same indexes in the 8.0 database? It looks
> like one might be missing on assay2rs.assay_id.
>
> 			regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [GENERAL] query plan in pg7.4 vs 8.0.3
Albert Vernon Smith <[EMAIL PROTECTED]> writes:
> I am in process of migrating a database from Pg 7.4.3 to Pg 8.0.3. I
> dumped and reloaded the data, and things look good, until I try to work
> with it. When I try to make a query to a view based on a large join
> (select * from 'view' limit 10), the 7.4.3 query works very well, and the
> indexes are being used well. But, when I go to 8.0.3, there are some very
> costly joins, and the query is not workable.

Are you sure you have all the same indexes in the 8.0 database? It looks
like one might be missing on assay2rs.assay_id.

			regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your message can get
through to the mailing list cleanly
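[Editor's aside: if the index really is missing after the dump/reload, recreating it is a one-liner. Table and column names come from Tom's reply; the index name here is hypothetical.]

```sql
-- Recreate the possibly-missing index, then refresh planner statistics
-- so the new database can cost its plans the same way 7.4 did.
CREATE INDEX assay2rs_assay_id_idx ON assay2rs (assay_id);
ANALYZE assay2rs;
```

Comparing the output of \di in both databases is a quick way to spot any other indexes that did not survive the migration.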
[GENERAL] query plan in pg7.4 vs 8.0.3
I am in process of migrating a database from Pg 7.4.3 to Pg 8.0.3. I dumped
and reloaded the data, and things look good, until I try to work with it.
When I try to make a query to a view based on a large join (select * from
'view' limit 10), the 7.4.3 query works very well, and the indexes are
being used well. But, when I go to 8.0.3, there are some very costly joins,
and the query is not workable. (I've put the 'EXPLAIN' results below for
the same database, and the same query.) Also, the memory settings have been
made the same for 7.4 vs. 8.0.

I should point out that some of the indexes (and joins) are based on the
type 'citext' (http://gborg.postgresql.org/project/citext/projdisplay.php).
I did compile and add this to my Pg 8 installation, but I am concerned that
this might be the root problem. There seems to have been no update on this
work since sometime in 2003, and it might not be appropriate to use in
8.0.3. But, it seems that indexes are being used when I do searches on the
individual tables. So, I'm a bit stumped at the moment.

So, I'd like to ask advice on my query, and what might be the cause. If
'citext' is the culprit, how can I migrate a very large database which uses
this datatype extensively? Any help is greatly appreciated.
Thanks,
-albert

===

In Pg 8.0.3 --

EXPLAIN select * from snp_quality_overview limit 10;
                                QUERY PLAN
-
 Limit (cost=83734.45..84167.07 rows=10 width=415)
   -> Hash Join (cost=83734.45..10949200.55 rows=251154 width=415)
        Hash Cond: ("outer".panel_id = "inner".panel_id)
      -> Nested Loop (cost=83733.39..10924084.09 rows=251154 width=359)
         -> Nested Loop (cost=83733.39..5685331.85 rows=253240 width=324)
            -> Hash Join (cost=83733.39..3840930.37 rows=423746 width=291)
                 Hash Cond: ("outer".assay_id = "inner".assay_id)
               -> Nested Loop (cost=862.47..3680521.72 rows=602829 width=259)
                  -> Hash Join (cost=862.47..258437.41 rows=602829 width=195)
                       Hash Cond: ("outer".genotype_submission_id = "inner".genotype_submission_id)
                     -> Seq Scan on genotype_set gs (cost=0.00..218413.11 rows=6626211 width=117)
                     -> Hash (cost=861.68..861.68 rows=316 width=94)
                        -> Nested Loop (cost=450.77..861.68 rows=316 width=94)
                             Join Filter: ("inner".lsid = "outer".prot_lsid)
                           -> Merge Join (cost=446.02..671.68 rows=358 width=83)
                                Merge Cond: ("outer".batch_file_id = "inner".batch_file_id)
                              -> Index Scan using batch_file_id3 on genotype_submission gsb (cost=0.00..210.81 rows=3472 width=75)
                              -> Sort (cost=446.02..447.31 rows=517 width=20)
                                   Sort Key: batch_file.batch_file_id
                                 -> Merge Join (cost=23.66..422.72 rows=517 width=20)
                                      Merge Cond: ("outer".monthly_batch_id = "inner".monthly_batch_id)
                                      Join Filter: ("outer".center_name = "inner".center_name)
                                    -> Index Scan using monthly_batch_id on batch_file (cost=0.00..298.59 rows=5024 width=24)
                                    -> Sort (cost=23.66..24.02 rows=145 width=19)
                                         Sort Key: monthly_batch.monthly_batch_id
                                       -> Seq Scan on monthly_batch (cost=0.00..18.45 rows=145 width=19)
                           -> Materialize (cost=4.74..4.97 rows=23 width=72)
                              -> Hash Join (cost=1.10..4.72 rows=23 width=72)
                                   Hash Cond: ("outer".genotyping_platform_id = "inner".genotyping_platform_id)
                                 -> Seq Scan on protocol p (cost=0.00..3.26 rows=26 width=65)
                                 -> Hash (cost=1.08..1.08 rows=8
Re: [GENERAL] Query plan question
Thanks Tom, that works; I still have to test performance.

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Thursday, June 26, 2003 7:36 AM
To: Maksim Likharev
Cc: GENERAL
Subject: Re: [GENERAL] Query plan question

"Maksim Likharev" <[EMAIL PROTECTED]> writes:
> basically I'm complaining that PG does not do what I told it to do or
> was hoping it would do.

Okay, now I get the point: you want to prevent the "pt" sub-select from
being flattened into the outer query. 7.3.1 through 7.3.3 will actually do
what you want (they won't flatten a sub-select that has any sub-selects in
its output list), but we got a lot of flak for that, and 7.4 will go back
to the prior behavior. In most scenarios it's a win for the planner to
flatten wherever possible.

Probably the easiest way to handle it is to insert a DISTINCT or LIMIT
clause in the sub-select; that will unconditionally keep the planner from
flattening the sub-select. For example:

    ...
    FROM prod.t_p AS p
    INNER JOIN t_temp AS t ON p.did = t.did
    LEFT OUTER JOIN prod.t_pinv AS pi ON p.kid = pi.kid AND pi.orderid = 'S'
    -- hack to keep this separate from outer plan:
    OFFSET 0
    ) AS pt
    LEFT OUTER JOIN prod.t_dmp AS pdb ON pt.kid = pdb.kid
    ...

I don't foresee any future planner changes that would be likely to bypass a
LIMIT/OFFSET clause.

			regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [GENERAL] Query plan question
"Maksim Likharev" <[EMAIL PROTECTED]> writes:
> basically I'm complaining that PG does not do what I told it to do or
> was hoping it would do.

Okay, now I get the point: you want to prevent the "pt" sub-select from
being flattened into the outer query. 7.3.1 through 7.3.3 will actually do
what you want (they won't flatten a sub-select that has any sub-selects in
its output list), but we got a lot of flak for that, and 7.4 will go back
to the prior behavior. In most scenarios it's a win for the planner to
flatten wherever possible.

Probably the easiest way to handle it is to insert a DISTINCT or LIMIT
clause in the sub-select; that will unconditionally keep the planner from
flattening the sub-select. For example:

    ...
    FROM prod.t_p AS p
    INNER JOIN t_temp AS t ON p.did = t.did
    LEFT OUTER JOIN prod.t_pinv AS pi ON p.kid = pi.kid AND pi.orderid = 'S'
    -- hack to keep this separate from outer plan:
    OFFSET 0
    ) AS pt
    LEFT OUTER JOIN prod.t_dmp AS pdb ON pt.kid = pdb.kid
    ...

I don't foresee any future planner changes that would be likely to bypass a
LIMIT/OFFSET clause.

			regards, tom lane
Re: [GENERAL] Query plan: varchar vs char indexes
"Richard Huxton" <[EMAIL PROTECTED]> writes:
> With indexed varchar fields the explain changes - performing a seq-scan on
> users rather than using the index.
> Is this because the estimator guesses costs differently for char vs
> varchar or is it because I'm not explicitly casting the id fields?

The estimator has no special ideas about either char or varchar. However,
there are some discrepancies in the sets of available functions for the two
datatypes, so what appears to be the same expression may translate into
different function invocations --- especially if you are doing random
combinations of datatypes and expecting the system to pick an operator for
you. I suspect it is picking a combination that doesn't work out to be
semantically equivalent to the '=' operator in the index's operator class,
so it doesn't think it can use the index.

> And yes - I know I probably shouldn't be joining an int to a varchar.

Not without being pretty darn careful. You didn't actually say what the
datatype of tag_list.id is, however.

			regards, tom lane
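[Editor's aside: a hedged sketch of the casting point. The table names users and tag_list come from the thread; the column types are assumed for illustration. Writing the cast explicitly pins the comparison to one datatype, so it can match the '=' operator in the index's operator class.]

```sql
-- Assuming users.id is integer and tag_list.id is varchar holding digits:
-- an implicit cross-type comparison may resolve to an operator the index
-- cannot serve, while an explicit cast makes both sides the same type.
SELECT u.*
FROM users u
JOIN tag_list t ON u.id = t.id::integer;  -- explicit varchar -> int cast
```

The cleaner long-term fix is, of course, to give both id columns the same declared type so no cast is needed at all.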
Re: [GENERAL] QUERY PLAN:
On Wed, Oct 27, 1999 at 11:03:45AM +0100, Tim Joyce wrote:
> can someone point me at documentation so that i can understand the results
> of an EXPLAIN, eg:
>
> Index Scan using words_id_idx on books_idx (cost=441.19 rows=7644 width=8)
>
> explain select * from books_idx where wrd_id=1;
>
> I am particularly interested in what the rows= figure means.

I've picked up a little info on this from following the hackers list, so I
don't know what docs to point you at. My understanding is that rows=
represents the number of tuples the optimizer estimates will be returned by
that step of the execution plan. In the example you've shown, an index scan
of a field, the estimate is based on the last known number of tuples in the
table (as of your last VACUUM ANALYZE), an estimate of the selectivity of
the operator being applied to this index (<, =, >, etc.), and the
approximate dispersion of the values in that field.

For more detail, I'd suggest checking the archives of the pgsql-hackers
list, and perhaps then asking on the list itself. There are some hints in
the "PostgreSQL Programmer's Guide" in the "Extending SQL" sections on
functions, operators, and interfacing them to indices.

Ross
--
Ross J. Reedstrom, Ph.D., <[EMAIL PROTECTED]>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
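[Editor's aside: a minimal sketch of the point above, using the table from the question. Since rows= is derived from statistics gathered by VACUUM ANALYZE, refreshing those statistics can visibly change the estimate.]

```sql
-- Refresh the planner's statistics for this table, then look at the
-- rows= estimate again; it should now reflect the current data.
VACUUM ANALYZE books_idx;
EXPLAIN SELECT * FROM books_idx WHERE wrd_id = 1;
```

Comparing the rows= estimate against the row count an actual run returns is the usual way to judge how far off the planner's statistics are.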
[GENERAL] QUERY PLAN:
can someone point me at documentation so that i can understand the results
of an EXPLAIN, eg:

Index Scan using words_id_idx on books_idx (cost=441.19 rows=7644 width=8)

explain select * from books_idx where wrd_id=1;

I am particularly interested in what the rows= figure means.

TIA

timj