On Apr 10, 2008, at 9:44 AM, John Beaver wrote:
Thanks a lot, all of you - this is excellent advice. With the data
clustered and the statistics target at a more reasonable value of 100,
the queries now reproducibly take even less time - 20-57 ms each.
After reading the section on "Statistics Used by the Planner" in the
manual, I was a little concerned that, while the higher statistics
target sped up the queries I tried enormously, the speedup was
happening in the most_common_vals array, and values that wouldn't fit
in that array wouldn't be sped up. I couldn't offhand find an example
where this occurred, but the clustering approach seems intuitively
like a much more complete and scalable solution, at least for a
read-only table like this.
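For anyone who wants to check the same thing, the contents of that
array are easy to inspect directly. A minimal sketch, assuming the
table and column are named gene_prediction_view and gene_ref as the
index name below suggests:

    -- Show the most common values/frequencies the planner has recorded
    -- for the column, plus the histogram bounds used for everything else.
    SELECT most_common_vals, most_common_freqs, histogram_bounds
      FROM pg_stats
     WHERE tablename = 'gene_prediction_view'
       AND attname = 'gene_ref';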
As to whether the entire index/table was getting into RAM between my
statistics changes, I don't think this was the case. Here's the
behavior I found:
- With statistics at 10, the query took about 25 seconds no matter
how many different constants I tried. The query plan was the same as
for the 200 and 800 statistics targets below.
- Trying the same constant a second time gave an instantaneous
result, I'm guessing because of query/result caching.
- Immediately after increasing the statistics to 200, the query
reproducibly took less time. I tried about 10 different values.
- Immediately after increasing the statistics to 800, the query
reproducibly took less than a second every time. I tried about 30
different values.
- Decreasing the statistics to 100 and running the cluster command
brought it down to 57 ms per query (the rough command sequence is
sketched after this list).
- The Activity Monitor (OS X) lists the relevant postgres process as
using a little less than 500 MB.
- I didn't try decreasing the statistics back to 10 before I ran the
cluster command, so I can't show the search times going back up
because of that. But I tried killing the 500 MB process. The new
process uses less than 5 MB of RAM, and still reproducibly returns a
result in less than 60 ms. Again, this is with a statistics target of
100 and the data clustered by gene_prediction_view_gene_ref_key.
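For reference, the command sequence behind those last few points was
essentially the following. This is a sketch: the column name gene_ref
is inferred from the index name, and ANALYZE has to be rerun after
changing the target for it to take effect.

    -- Raise/lower the per-column statistics target and refresh the stats.
    ALTER TABLE gene_prediction_view
      ALTER COLUMN gene_ref SET STATISTICS 100;
    ANALYZE gene_prediction_view;

    -- Physically rewrite the table in index order so rows with the same
    -- gene_ref land on neighboring pages (8.3 syntax; older releases use
    -- CLUSTER indexname ON tablename).
    CLUSTER gene_prediction_view USING gene_prediction_view_gene_ref_key;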
And I'll consider the idea of using triggers with an ancillary table
for other purposes; it seems like it could be a useful solution for
something.
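For the archives, that idea usually looks something like the
hypothetical sketch below, which keeps a per-gene_ref row count
current. Every name here is made up, and it ignores concurrency and
UPDATEs that change gene_ref.

    -- Ancillary table holding precomputed per-gene_ref counts.
    CREATE TABLE gene_ref_counts (
        gene_ref integer PRIMARY KEY,
        n        bigint  NOT NULL
    );

    CREATE FUNCTION maintain_gene_ref_counts() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE gene_ref_counts SET n = n + 1
             WHERE gene_ref = NEW.gene_ref;
            IF NOT FOUND THEN
                INSERT INTO gene_ref_counts VALUES (NEW.gene_ref, 1);
            END IF;
        ELSIF TG_OP = 'DELETE' THEN
            UPDATE gene_ref_counts SET n = n - 1
             WHERE gene_ref = OLD.gene_ref;
        END IF;
        RETURN NULL;  -- AFTER trigger, so the return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER gene_ref_counts_trg
        AFTER INSERT OR DELETE ON gene_prediction_view
        FOR EACH ROW EXECUTE PROCEDURE maintain_gene_ref_counts();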
FWIW, killing the backend process responsible for the query won't
necessarily clear the table's data from memory, as that data will be
in the shared buffers. If you really want to flush the data from
memory, you need to read in data from other tables with a total size
greater than your shared_buffers setting.
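Something like this, say (a sketch; the table names are hypothetical):

    -- See how much data has to be read to turn the cache over.
    SHOW shared_buffers;

    -- Sequentially scan other large tables until their combined size
    -- exceeds shared_buffers, pushing the cached pages out.
    SELECT count(*) FROM some_other_large_table;
    SELECT count(*) FROM yet_another_large_table;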
Erik Jones
DBA | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com