I was puzzled as to why my search slowed down when I added columns. The VACUUM did not restore the former speed, which I had obtained before adding the columns. So, I rebuilt the table with only the smiles column and my original speed was again obtained (not surprising). After I added the extra columns, it slowed down again. Finally, I built the table with all the additional columns created during the initial creation of the table. The original speed was obtained! I conclude that the addition of columns after building all the rows of a table somehow makes the table access less efficient. Is this generally true? Is there a more efficient way to add columns to a table after its initial construction?
The secondary issue was one of using an index on the additional columns. This greatly speeds up the overall search, by limiting the number of rows needing to use oe_matches. I am currently working on optimizing the number and nature of these extra columns. However, my initial question still remains. Once I find a good set of columns to use as an index, will I then get even greater speed by defining a new data type and an index method equivalent to my multi-column index?
Here are the data you requested. I think this is less important now that I know I should create all my columns from the beginning. Thanks for the tip on how to compute average time spent in my oe_matches functions. This will be very useful for future optimization.
SELECT count(*) FROM structure 237597
SELECT avg(length(smiles)) FROM structure 37.6528912402092619
VACUUM FULL ANALYZE structure (no output)
EXPLAIN ANALYZE SELECT oe_matches(smiles,'c1ccccc1CC(=O)NC') FROM structure Seq Scan on structure (cost=0.00..7573.96 rows=237597 width=41) (actual time=17.443..15025.974 rows=237597 loops=1) Total runtime: 16786.542 ms
EXPLAIN ANALYZE SELECT smiles FROM structure Seq Scan on structure (cost=0.00..6979.97 rows=237597 width=41) (actual time=0.067..735.884 rows=237597 loops=1) Total runtime: 1200.661 ms
TJ
PFC wrote:
I'm quite happy with the speedup in 3, but puzzled over the slowdown in 2.
Could you provide :
- SELECT count(*) FROM structure; => NRows - SELECT avg(length(smiles)) FROM structure;
Then VACUUM FULL ANALYZE structure Redo your timings and this time post EXPLAIN ANALYZE
Also your query returns 1313 rows, so wan you post :
EXPLAIN ANALYZE SELECT oe_matches(smiles,'c1ccccc1CC(=O)NC') FROM structure;
=> time T1
EXPLAIN ANALYZE SELECT smiles FROM structure;
=> time T2
(T1-T2)/(NRows) will give you an estimate of the time spent in each oe_matches call.
Also note that for postgres (a,b) > (c,d) means ((a>c) and (b>d)), which can be misleading, but I think that's what you wanted.
---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster