I'm working with a third-party plugin that does chemistry. It's very fast.
However, I'm trying to do a sampling query, such as the first 1% of the
database, and I just can't get the planner to create a good plan.  Here is
the full query (the |>| operator does a subgraph match of a molecular
substructure, in this case benzene, to find all molecules that have a
benzene ring in the database):

explain analyze select * from version where smiles |>| 'c1ccccc1';
 ...
 Index Scan using i_version_smiles on version  (cost=3445.75..147094.03
rows=180283 width=36) (actual time=336.493..10015.753
 rows=180973 loops=1)
   Index Cond: (smiles |>| 'c1ccccc1'::molecule)
 Planning time: 1.228 ms
 Execution time: 10371.903 ms


Ten seconds over 263,000 molecules, which is actually good. Now let's limit
it to the first 1% of the rows:

explain analyze select * from version where smiles |>| 'c1ccccc1' and
version_id < 897630;
...
 Index Scan using pk_version on version  (cost=0.42..131940.05 rows=1643
width=36) (actual time=6.122..2816.298 rows=2039 loops=1)
   Index Cond: (version_id < 897630)
   Filter: (smiles |>| 'c1ccccc1'::molecule)
   Rows Removed by Filter: 590
 Planning time: 1.217 ms
 Execution time: 2822.117 ms


Notice that it doesn't use the i_version_smiles index at all, but instead
applies the very expensive filter |>| to all 1% of the database. So instead
of getting a 100x speedup, we only get a 3x speedup, about 30x worse that
what is theoretically possible.

The production database is about 50x larger than this test database.

Maybe I misunderstand what's possible with indexes, but it seems to me that
it could first do the pk_version index scan, and then use the results of
that to do a limited index-scan search using the i_version_smiles index. Is
that not possible? Is each index scan "self contained", that is, it doesn't
take into account the results of another index scan?

Thanks,
Craig

Reply via email to