[SQL] bad plan

2012-04-05 Thread Julien Cigar

Hello,

I have an extremely bad plan for one of my colleague's query. Basically 
PostgreSQL chooses to seq scan instead of index scan. This is on:


antabif=# select version();
 version
--
 PostgreSQL 9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc (GCC) 
4.2.1 20070719  [FreeBSD], 64-bit


The machines has 4GB of RAM with the following config:
- shared_buffers: 512MB
- effective_cache_size: 2GB
- work_mem: 32MB
- maintenance_work_mem: 128MB
- default_statistics_target: 300
- temp_buffers: 64MB
- wal_buffers: 8MB
- checkpoint_segments = 15

The tables have been ANALYZE'd. I've put the EXPLAIN ANALYZE on:

- http://www.pastie.org/3731956 : with default config
- http://www.pastie.org/3731960 : this is with enable_seq_scan = off
- http://www.pastie.org/3731962 : I tried to play on the various cost 
settings but it's doesn't change anything, except setting 
random_page_cost to 1 (which will lead to bad plans for other queries, 
so not a solution)
- http://www.pastie.org/3732035 : with enable_hashagg and 
enable_hashjoin to false


I'm currently out of idea why PostgreSQL still chooses a bad plan for 
this query ... any hint :) ?


Thank you,
Julien

--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.

<>
-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] bad plan

2012-04-05 Thread Mario Dankoor

Julien,

It looks like you forgot to post the query.

Mario
On 2012-04-05 1:38 PM, Julien Cigar wrote:

Hello,

I have an extremely bad plan for one of my colleague's query. 
Basically PostgreSQL chooses to seq scan instead of index scan. This 
is on:


antabif=# select version();
 version
-- 

 PostgreSQL 9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc 
(GCC) 4.2.1 20070719  [FreeBSD], 64-bit


The machines has 4GB of RAM with the following config:
- shared_buffers: 512MB
- effective_cache_size: 2GB
- work_mem: 32MB
- maintenance_work_mem: 128MB
- default_statistics_target: 300
- temp_buffers: 64MB
- wal_buffers: 8MB
- checkpoint_segments = 15

The tables have been ANALYZE'd. I've put the EXPLAIN ANALYZE on:

- http://www.pastie.org/3731956 : with default config
- http://www.pastie.org/3731960 : this is with enable_seq_scan = off
- http://www.pastie.org/3731962 : I tried to play on the various cost 
settings but it's doesn't change anything, except setting 
random_page_cost to 1 (which will lead to bad plans for other queries, 
so not a solution)
- http://www.pastie.org/3732035 : with enable_hashagg and 
enable_hashjoin to false


I'm currently out of idea why PostgreSQL still chooses a bad plan for 
this query ... any hint :) ?


Thank you,
Julien







Re: [SQL] bad plan

2012-04-05 Thread Julien Cigar

Hello Mario,

The query is in the pastie.org links, here is a copy:

SELECT
  t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom
FROM (
  SELECT
geo_id , count(*) AS frequence
  FROM
hit.context_to_context_links
  WHERE
NOT geo_id IS NULL ANDtaxon_id= ANY (
  SELECT
taxon_id
  FROM
rab.ancestors
  WHERE
ancestors.subphylum_id = 18830
  ) GROUP BY geo_id
) as t_geo
JOIN
  hit.geo_candidates gc ON gc.id = t_geo.geo_id;

Thank you,
Julien

On 04/05/2012 14:03, Mario Dankoor wrote:

Julien,

It looks like you forgot to post the query.

Mario
On 2012-04-05 1:38 PM, Julien Cigar wrote:

Hello,

I have an extremely bad plan for one of my colleague's query. 
Basically PostgreSQL chooses to seq scan instead of index scan. This 
is on:


antabif=# select version();
 version
-- 

 PostgreSQL 9.0.7 on amd64-portbld-freebsd8.2, compiled by GCC cc 
(GCC) 4.2.1 20070719  [FreeBSD], 64-bit


The machines has 4GB of RAM with the following config:
- shared_buffers: 512MB
- effective_cache_size: 2GB
- work_mem: 32MB
- maintenance_work_mem: 128MB
- default_statistics_target: 300
- temp_buffers: 64MB
- wal_buffers: 8MB
- checkpoint_segments = 15

The tables have been ANALYZE'd. I've put the EXPLAIN ANALYZE on:

- http://www.pastie.org/3731956 : with default config
- http://www.pastie.org/3731960 : this is with enable_seq_scan = off
- http://www.pastie.org/3731962 : I tried to play on the various cost 
settings but it's doesn't change anything, except setting 
random_page_cost to 1 (which will lead to bad plans for other 
queries, so not a solution)
- http://www.pastie.org/3732035 : with enable_hashagg and 
enable_hashjoin to false


I'm currently out of idea why PostgreSQL still chooses a bad plan for 
this query ... any hint :) ?


Thank you,
Julien








--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.

<>
-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] bad plan

2012-04-05 Thread Mario Dankoor

Julien,

I generally try avoid NOT predicates.
If your geo_id is an integer,  try geo_id > -1
or if it's a varchar geo_id > ''.
The idea is that geo_id > 0 is false for null values.

I don't think query optimizers are fond of NOT predicates.

SELECT  t_geo.frequence
   ,ST_SetSRID(gc.geom, 4326) as geom
FROM (
   SELECT  geo_id
  ,count(*) AS frequence
   FROM hit.context_to_context_links
   WHERE geo_id > -1
   AND taxon_id= ANY (
  SELECT taxon_id
  FROM rab.ancestors
  WHERE ancestors.subphylum_id = 18830
) GROUP BY geo_id
 ) as t_geo
JOIN  hit.geo_candidates gc
ON gc.id = t_geo.geo_id;









On 2012-04-05 2:08 PM, Julien Cigar wrote:

SELECT
  t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom
FROM (
  SELECT
geo_id , count(*) AS frequence
  FROM
hit.context_to_context_links
  WHERE
NOT geo_id IS NULL ANDtaxon_id= ANY (
  SELECT
taxon_id
  FROM
rab.ancestors
  WHERE
ancestors.subphylum_id = 18830
  ) GROUP BY geo_id
) as t_geo
JOIN
  hit.geo_candidates gc ON gc.id = t_geo.geo_id; 



--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] bad plan

2012-04-05 Thread Julien Cigar

Hi Mario,

Setting it to > -1, or even removing the condition doesn't change 
anything .. the problem is more that the estimate of the join point 
sucks pretty much ..


Julien

On 04/05/2012 14:32, Mario Dankoor wrote:

Julien,

I generally try avoid NOT predicates.
If your geo_id is an integer,  try geo_id > -1
or if it's a varchar geo_id > ''.
The idea is that geo_id > 0 is false for null values.

I don't think query optimizers are fond of NOT predicates.

SELECT  t_geo.frequence
   ,ST_SetSRID(gc.geom, 4326) as geom
FROM (
   SELECT  geo_id
  ,count(*) AS frequence
   FROM hit.context_to_context_links
   WHERE geo_id > -1
   AND taxon_id= ANY (
  SELECT taxon_id
  FROM rab.ancestors
  WHERE ancestors.subphylum_id = 18830
) GROUP BY geo_id
 ) as t_geo
JOIN  hit.geo_candidates gc
ON gc.id = t_geo.geo_id;









On 2012-04-05 2:08 PM, Julien Cigar wrote:

SELECT
  t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom
FROM (
  SELECT
geo_id , count(*) AS frequence
  FROM
hit.context_to_context_links
  WHERE
NOT geo_id IS NULL ANDtaxon_id= ANY (
  SELECT
taxon_id
  FROM
rab.ancestors
  WHERE
ancestors.subphylum_id = 18830
  ) GROUP BY geo_id
) as t_geo
JOIN
  hit.geo_candidates gc ON gc.id = t_geo.geo_id; 






--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.

<>
-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] bad plan

2012-04-05 Thread Julien Cigar

FYI, raising cpu_tuple_cost from 0.01 to 0.5 fixed the problem..!

On 04/05/2012 14:43, Julien Cigar wrote:

Hi Mario,

Setting it to > -1, or even removing the condition doesn't change 
anything .. the problem is more that the estimate of the join point 
sucks pretty much ..


Julien

On 04/05/2012 14:32, Mario Dankoor wrote:

Julien,

I generally try avoid NOT predicates.
If your geo_id is an integer,  try geo_id > -1
or if it's a varchar geo_id > ''.
The idea is that geo_id > 0 is false for null values.

I don't think query optimizers are fond of NOT predicates.

SELECT  t_geo.frequence
   ,ST_SetSRID(gc.geom, 4326) as geom
FROM (
   SELECT  geo_id
  ,count(*) AS frequence
   FROM hit.context_to_context_links
   WHERE geo_id > -1
   AND taxon_id= ANY (
  SELECT taxon_id
  FROM rab.ancestors
  WHERE ancestors.subphylum_id = 18830
) GROUP BY geo_id
 ) as t_geo
JOIN  hit.geo_candidates gc
ON gc.id = t_geo.geo_id;









On 2012-04-05 2:08 PM, Julien Cigar wrote:

SELECT
  t_geo.frequence, ST_SetSRID(gc.geom, 4326) as geom
FROM (
  SELECT
geo_id , count(*) AS frequence
  FROM
hit.context_to_context_links
  WHERE
NOT geo_id IS NULL ANDtaxon_id= ANY (
  SELECT
taxon_id
  FROM
rab.ancestors
  WHERE
ancestors.subphylum_id = 18830
  ) GROUP BY geo_id
) as t_geo
JOIN
  hit.geo_candidates gc ON gc.id = t_geo.geo_id; 











--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.

<>
-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql