Hi John,
You don't use the same 'gene_ref ='-value, so it's not a perfect
comparison. And obviously there is the fact that the data can be in the
disk cache the second time you run it, which would explain the almost
instantaneous result for the second query.
If repeating the query a few times with 200 still makes it do its work
in 15 seconds, and with 800 in less than 100ms, then you might have found
a bug, or it is at least something I don't know how to fix.
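A rough sketch of that repeat test in psql (holding the gene_ref value constant at 500, as in the first plan, so neither the cache nor the chosen value muddies the comparison — the value and targets here are illustrative):

```sql
-- Sketch: time the same query repeatedly under each statistics target,
-- so the disk cache is warm and the gene_ref value is held constant.
\timing on
ALTER TABLE gene_prediction_view ALTER gene_ref SET STATISTICS 200;
ANALYZE gene_prediction_view;
SELECT count(*) FROM gene_prediction_view WHERE gene_ref = 500;
SELECT count(*) FROM gene_prediction_view WHERE gene_ref = 500;  -- repeat: warm cache

ALTER TABLE gene_prediction_view ALTER gene_ref SET STATISTICS 800;
ANALYZE gene_prediction_view;
SELECT count(*) FROM gene_prediction_view WHERE gene_ref = 500;
SELECT count(*) FROM gene_prediction_view WHERE gene_ref = 500;  -- repeat: warm cache
```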
I doubt upping the default for all tables to 1000 is a good idea. The
data collected is used in the query-planning stage, where more data
means more processing time. Obviously there is a tradeoff somewhere
between having more statistics, and thus being able to plan the query
better, versus requiring more time to process those statistics.
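For reference, the all-tables default lives in postgresql.conf as default_statistics_target, while a per-column override confines the extra ANALYZE and planning cost to the one column that needs it (a sketch; the value 1000 is illustrative):

```sql
-- Two ways to raise the statistics target:
-- 1) Globally, for every column ANALYZE touches (postgresql.conf):
--      default_statistics_target = 100
-- 2) Per column, confining the extra work to the filtered column:
ALTER TABLE gene_prediction_view ALTER gene_ref SET STATISTICS 1000;
ANALYZE gene_prediction_view;
```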
Best regards,
Arjen
On 10-4-2008 0:24 John Beaver wrote:
Perfect - thanks Arjen. Using your value of 200 decreased the time to 15
seconds, and using a value of 800 makes it almost instantaneous. I'm
really not concerned about space usage; if having more statistics
increases performance this much, maybe I'll just default it to 1000?
Strangely, the steps taken in the explain analyze are all the same. The
only differences are the predicted costs (and execution times).
explain analyze for a statistics target of 200:
"Aggregate (cost=8831.27..8831.28 rows=1 width=0) (actual
time=15198.407..15198.408 rows=1 loops=1)"
" -> Bitmap Heap Scan on gene_prediction_view (cost=44.16..8825.29
rows=2392 width=0) (actual time=19.719..15191.875 rows=2455 loops=1)"
" Recheck Cond: (gene_ref = 500)"
" -> Bitmap Index Scan on ix_gene_prediction_view_gene_ref
(cost=0.00..43.56 rows=2392 width=0) (actual time=18.871..18.871
rows=2455 loops=1)"
" Index Cond: (gene_ref = 500)"
"Total runtime: 15198.651 ms"
explain analyze for a statistics target of 800:
"Aggregate (cost=8873.75..8873.76 rows=1 width=0) (actual
time=94.473..94.473 rows=1 loops=1)"
" -> Bitmap Heap Scan on gene_prediction_view (cost=44.25..8867.74
rows=2404 width=0) (actual time=39.358..93.733 rows=2455 loops=1)"
" Recheck Cond: (gene_ref = 301)"
" -> Bitmap Index Scan on ix_gene_prediction_view_gene_ref
(cost=0.00..43.65 rows=2404 width=0) (actual time=38.472..38.472
rows=2455 loops=1)"
" Index Cond: (gene_ref = 301)"
"Total runtime: 94.622 ms"
Arjen van der Meijden wrote:
First of all, there is the 'explain analyze' output, which is pretty
helpful in postgresql.
My guess is, postgresql decides to do a table scan for some reason. It
might not have enough statistics for this particular table or column,
to make a sound decision. What you can try is increasing the
statistics target, which is pretty easy to do:
ALTER TABLE gene_prediction_view ALTER gene_ref SET STATISTICS 200;
The valid range is 1 (0?) to 1000; the default is 10, though the
default on my systems is usually 100. For such a large table, I'd go with 200.
After that, you'll need to re-analyze your table and you can try again.
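That re-analyze step can be limited to the one column whose target changed, which is cheaper than a full-table ANALYZE (a sketch using the standard column-list form):

```sql
-- Re-collect statistics for just the altered column:
ANALYZE gene_prediction_view (gene_ref);
```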
Perhaps analyze should try to establish its own best guess as to how many
samples it should take? The default of 10 is rather limited for large
tables.
Best regards,
Arjen
On 9-4-2008 22:58 John Beaver wrote:
Hi, I've started my first project with Postgres (after several years
of using Mysql), and I'm having an odd performance problem that I was
hoping someone might be able to explain the cause of.
----My query----
- select count(*) from gene_prediction_view where gene_ref = 523
- takes 26 seconds to execute, and returns 2400 (out of a total of
15 million records in the table)
---My problem---
Using a single-column index to count 2400 records that all match
a single constant value doesn't sound like something that should
take 26 seconds. What's the slowdown? Any silver bullets that might
fix this?
----Steps I've taken----
- I ran vacuum and analyze
- I upped the shared_buffers to 58384, and I upped some of the
other postgresql.conf values as well. Nothing seemed to help
significantly, but maybe I missed something that would help
specifically for this query type?
- I tried to create a hash index, but gave up after more than 4
hours of waiting for it to finish indexing
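One way to see what the planner currently knows about the filtered column after those steps is the standard pg_stats view (a sketch; the table and column names are the ones from this thread):

```sql
-- Inspect the collected statistics for the column used in the WHERE clause:
SELECT attname, n_distinct,
       array_length(most_common_freqs, 1) AS mcv_entries
FROM pg_stats
WHERE tablename = 'gene_prediction_view'
  AND attname = 'gene_ref';
```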
----Table stats----
- 15 million rows; I'm expecting to have four or five times this
number eventually.
- 1.5 gigs of hard drive usage
----My development environment---
- 2.6ghz dual-core MacBook Pro with 4 gigs of ram and a 7200 rpm
hard drive
- OS X 10.5.2
- Postgres 8.3 (installed via MacPorts)
----My table----
CREATE TABLE gene_prediction_view
(
  id serial NOT NULL,
  gene_ref integer NOT NULL,
  go_id integer NOT NULL,
  go_description character varying(200) NOT NULL,
  go_category character varying(50) NOT NULL,
  function_verified_exactly boolean NOT NULL,
  function_verified_with_parent_go boolean NOT NULL,
  function_verified_with_child_go boolean NOT NULL,
  score numeric(10,2) NOT NULL,
  precision_score numeric(10,2) NOT NULL,
  CONSTRAINT gene_prediction_view_pkey PRIMARY KEY (id),
  CONSTRAINT gene_prediction_view_gene_ref_fkey FOREIGN KEY (gene_ref)
    REFERENCES sgd_annotations (id) MATCH SIMPLE
    ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT gene_prediction_view_go_id_fkey FOREIGN KEY (go_id)
    REFERENCES go_terms (term) MATCH SIMPLE
    ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT gene_prediction_view_gene_ref_key UNIQUE (gene_ref, go_id)
)
WITH (OIDS=FALSE);
ALTER TABLE gene_prediction_view OWNER TO postgres;
CREATE INDEX ix_gene_prediction_view_gene_ref
  ON gene_prediction_view
  USING btree
  (gene_ref);
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance