On Fri, Feb 24, 2012 at 5:44 AM, Greg Landrum <[email protected]> wrote:
>
> But then I realized from that last result that the fingerprint index
> isn't actually being used for queries with qmols; it's always doing a
> sequential scan. As soon as I get done scratching my head I will put
> this in the bug tracker.
>
> Still, even if the index were being used I would not expect the
> performance for SMARTS-based queries to be as good as that for
> SMILES-based queries; the fingerprint just is not going to be as
> effective.
Ok, I have this resolved in svn in some form.
In order to make it work in a reasonable manner, I had to add a new
function "mol_from_smarts()" that gives you a normal mol from a SMARTS
string.
Here are some example queries.
Start with SMILES:
chembl_12=# select count(*) from rdk.mols where
m@>'O=c1ccc2ccccc2o1';
count
-------
9231
(1 row)
Time: 14156.885 ms
The same query as SMARTS takes the same amount of time:
chembl_12=# select count(*) from rdk.mols where
m@>mol_from_smarts('O=c1ccc2ccccc2o1');
count
-------
9231
(1 row)
Time: 13945.446 ms
Adding a query feature slows things down somewhere between dramatically:
chembl_12=# select count(*) from rdk.mols where
m@>mol_from_smarts('[O,N,C]=c1ccc2ccccc2o1');
count
-------
9462
(1 row)
Time: 36799.979 ms
and very dramatically:
chembl_12=# select count(*) from rdk.mols where
m@>mol_from_smarts('O=c1ccc2ccccc2[o,n]1');
count
-------
12441
(1 row)
Time: 53893.576 ms
Note that the degradation of performance caused by query features is
going to seriously affect your queries as well.
For the example you give in your original post, which is a single
atom, the index doesn't help at all:
chembl_12=# select count(*) from rdk.mols where
m@>mol_from_smarts('[!#1;!#6;!#7;!#8;!#9;!#16;!Cl;!Br;!I]');
count
-------
31656
(1 row)
Time: 57572.277 ms
chembl_12=# set enable_indexscan=off;
SET
Time: 0.173 ms
chembl_12=# set enable_bitmapscan=off;
SET
Time: 0.153 ms
chembl_12=# select count(*) from rdk.mols where
m@>mol_from_smarts('[!#1;!#6;!#7;!#8;!#9;!#16;!Cl;!Br;!I]');
count
-------
31656
(1 row)
Time: 55173.435 ms
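To check whether the planner actually picks the index for a given query, EXPLAIN is the quickest test. This is only a sketch, assuming the same rdk.mols table and m column as in the queries above; the plans themselves are not shown here because they will depend on your installation and PostgreSQL version.

```sql
-- Restore the planner settings that were switched off above, so the
-- planner is free to choose an index or bitmap scan again.
reset enable_indexscan;
reset enable_bitmapscan;

-- Show the chosen plan without running the query. An index or bitmap
-- scan node means the fingerprint index is being considered; a
-- "Seq Scan" on the table means it is not.
explain select count(*) from rdk.mols where
  m@>mol_from_smarts('[!#1;!#6;!#7;!#8;!#9;!#16;!Cl;!Br;!I]');
```

Running the same EXPLAIN for the other queries above should make it easier to tell a slow index scan (many candidates passing the fingerprint screen) apart from a query where the index is not used at all.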
Best,
-greg
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss