[SQL] bad plan
Hello, I have an extremely bad plan for one of my colleague's query. Basically PostgreSQL chooses to seq scan instead of index scan. This is on: antabif=# select version(); version -- PostgreSQL 9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc (GCC) 4.2.1 20070719 [FreeBSD], 64-bit The machines has 4GB of RAM with the following config: - shared_buffers: 512MB - effective_cache_size: 2GB - work_mem: 32MB - maintenance_work_mem: 128MB - default_statistics_target: 300 - temp_buffers: 64MB - wal_buffers: 8MB - checkpoint_segments = 15 The tables have been ANALYZE'd. I've put the EXPLAIN ANALYZE on: - http://www.pastie.org/3731956 : with default config - http://www.pastie.org/3731960 : this is with enable_seq_scan = off - http://www.pastie.org/3731962 : I tried to play on the various cost settings but it's doesn't change anything, except setting random_page_cost to 1 (which will lead to bad plans for other queries, so not a solution) - http://www.pastie.org/3732035 : with enable_hashagg and enable_hashjoin to false I'm currently out of idea why PostgreSQL still chooses a bad plan for this query ... any hint :) ? Thank you, Julien -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced. <> -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] bad plan
Julien, It looks like you forgot to post the query. Mario On 2012-04-05 1:38 PM, Julien Cigar wrote: Hello, I have an extremely bad plan for one of my colleague's query. Basically PostgreSQL chooses to seq scan instead of index scan. This is on: antabif=# select version(); version -- PostgreSQL 9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc (GCC) 4.2.1 20070719 [FreeBSD], 64-bit The machines has 4GB of RAM with the following config: - shared_buffers: 512MB - effective_cache_size: 2GB - work_mem: 32MB - maintenance_work_mem: 128MB - default_statistics_target: 300 - temp_buffers: 64MB - wal_buffers: 8MB - checkpoint_segments = 15 The tables have been ANALYZE'd. I've put the EXPLAIN ANALYZE on: - http://www.pastie.org/3731956 : with default config - http://www.pastie.org/3731960 : this is with enable_seq_scan = off - http://www.pastie.org/3731962 : I tried to play on the various cost settings but it's doesn't change anything, except setting random_page_cost to 1 (which will lead to bad plans for other queries, so not a solution) - http://www.pastie.org/3732035 : with enable_hashagg and enable_hashjoin to false I'm currently out of idea why PostgreSQL still chooses a bad plan for this query ... any hint :) ? Thank you, Julien
Re: [SQL] bad plan
Hello Mario, The query is in the pastie.org links, here is a copy: SELECT t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom FROM ( SELECT geo_id , count(*) AS frequence FROM hit.context_to_context_links WHERE NOT geo_id IS NULL ANDtaxon_id= ANY ( SELECT taxon_id FROM rab.ancestors WHERE ancestors.subphylum_id = 18830 ) GROUP BY geo_id ) as t_geo JOIN hit.geo_candidates gc ON gc.id = t_geo.geo_id; Thank you, Julien On 04/05/2012 14:03, Mario Dankoor wrote: Julien, It looks like you forgot to post the query. Mario On 2012-04-05 1:38 PM, Julien Cigar wrote: Hello, I have an extremely bad plan for one of my colleague's query. Basically PostgreSQL chooses to seq scan instead of index scan. This is on: antabif=# select version(); version -- PostgreSQL 9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc (GCC) 4.2.1 20070719 [FreeBSD], 64-bit The machines has 4GB of RAM with the following config: - shared_buffers: 512MB - effective_cache_size: 2GB - work_mem: 32MB - maintenance_work_mem: 128MB - default_statistics_target: 300 - temp_buffers: 64MB - wal_buffers: 8MB - checkpoint_segments = 15 The tables have been ANALYZE'd. I've put the EXPLAIN ANALYZE on: - http://www.pastie.org/3731956 : with default config - http://www.pastie.org/3731960 : this is with enable_seq_scan = off - http://www.pastie.org/3731962 : I tried to play on the various cost settings but it's doesn't change anything, except setting random_page_cost to 1 (which will lead to bad plans for other queries, so not a solution) - http://www.pastie.org/3732035 : with enable_hashagg and enable_hashjoin to false I'm currently out of idea why PostgreSQL still chooses a bad plan for this query ... any hint :) ? Thank you, Julien -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced. <> -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] bad plan
Julien, I generally try avoid NOT predicates. If your geo_id is an integer, try geo_id > -1 or if it's a varchar geo_id > ''. The idea is that geo_id > 0 is false for null values. I don't think query optimizers are fond of NOT predicates. SELECT t_geo.frequence ,ST_SetSRID(gc.geom, 4326) as geom FROM ( SELECT geo_id ,count(*) AS frequence FROM hit.context_to_context_links WHERE geo_id > -1 AND taxon_id= ANY ( SELECT taxon_id FROM rab.ancestors WHERE ancestors.subphylum_id = 18830 ) GROUP BY geo_id ) as t_geo JOIN hit.geo_candidates gc ON gc.id = t_geo.geo_id; On 2012-04-05 2:08 PM, Julien Cigar wrote: SELECT t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom FROM ( SELECT geo_id , count(*) AS frequence FROM hit.context_to_context_links WHERE NOT geo_id IS NULL ANDtaxon_id= ANY ( SELECT taxon_id FROM rab.ancestors WHERE ancestors.subphylum_id = 18830 ) GROUP BY geo_id ) as t_geo JOIN hit.geo_candidates gc ON gc.id = t_geo.geo_id; -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] bad plan
Hi Mario, Setting it to > -1, or even removing the condition doesn't change anything .. the problem is more that the estimate of the join point sucks pretty much .. Julien On 04/05/2012 14:32, Mario Dankoor wrote: Julien, I generally try avoid NOT predicates. If your geo_id is an integer, try geo_id > -1 or if it's a varchar geo_id > ''. The idea is that geo_id > 0 is false for null values. I don't think query optimizers are fond of NOT predicates. SELECT t_geo.frequence ,ST_SetSRID(gc.geom, 4326) as geom FROM ( SELECT geo_id ,count(*) AS frequence FROM hit.context_to_context_links WHERE geo_id > -1 AND taxon_id= ANY ( SELECT taxon_id FROM rab.ancestors WHERE ancestors.subphylum_id = 18830 ) GROUP BY geo_id ) as t_geo JOIN hit.geo_candidates gc ON gc.id = t_geo.geo_id; On 2012-04-05 2:08 PM, Julien Cigar wrote: SELECT t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom FROM ( SELECT geo_id , count(*) AS frequence FROM hit.context_to_context_links WHERE NOT geo_id IS NULL ANDtaxon_id= ANY ( SELECT taxon_id FROM rab.ancestors WHERE ancestors.subphylum_id = 18830 ) GROUP BY geo_id ) as t_geo JOIN hit.geo_candidates gc ON gc.id = t_geo.geo_id; -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced. <> -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] bad plan
FYI, raising cpu_tuple_cost from 0.01 to 0.5 fixed the problem..! On 04/05/2012 14:43, Julien Cigar wrote: Hi Mario, Setting it to > -1, or even removing the condition doesn't change anything .. the problem is more that the estimate of the join point sucks pretty much .. Julien On 04/05/2012 14:32, Mario Dankoor wrote: Julien, I generally try avoid NOT predicates. If your geo_id is an integer, try geo_id > -1 or if it's a varchar geo_id > ''. The idea is that geo_id > 0 is false for null values. I don't think query optimizers are fond of NOT predicates. SELECT t_geo.frequence ,ST_SetSRID(gc.geom, 4326) as geom FROM ( SELECT geo_id ,count(*) AS frequence FROM hit.context_to_context_links WHERE geo_id > -1 AND taxon_id= ANY ( SELECT taxon_id FROM rab.ancestors WHERE ancestors.subphylum_id = 18830 ) GROUP BY geo_id ) as t_geo JOIN hit.geo_candidates gc ON gc.id = t_geo.geo_id; On 2012-04-05 2:08 PM, Julien Cigar wrote: SELECT t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom FROM ( SELECT geo_id , count(*) AS frequence FROM hit.context_to_context_links WHERE NOT geo_id IS NULL ANDtaxon_id= ANY ( SELECT taxon_id FROM rab.ancestors WHERE ancestors.subphylum_id = 18830 ) GROUP BY geo_id ) as t_geo JOIN hit.geo_candidates gc ON gc.id = t_geo.geo_id; -- No trees were killed in the creation of this message. However, many electrons were terribly inconvenienced. <> -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql