Dear all,

I'm really excited to announce the most recent contribution from
Novartis to the RDKit: a cartridge for similarity searching in
PostgreSQL. The cartridge supports Tanimoto and Dice similarity for
both bitmap and count-vector fingerprints and currently includes
RDKit, layered, Morgan (ECFP-like), atom-pair, and topological-torsion
fingerprinters. Adding a new fingerprinter (either using the RDKit or
some other C/C++ library) is very straightforward. We make use of the
PostgreSQL index system, so searches are fast. Our benchmarking work
was done with a database of about 4 million drug-like compounds.
Searching this database using drug-like molecules as queries and
binary fingerprints typically takes around 4 seconds on a single CPU
(these queries return tens to thousands of results). I have no doubt
that this can be made faster by tuning database parameters (the
results here are for the standard PostgreSQL settings), but we're
pretty happy with things already.
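
For anyone unfamiliar with the two similarity measures mentioned above,
here is a minimal pure-Python sketch of Tanimoto and Dice for both
bitmap and count-vector fingerprints. It is not the cartridge code: the
function names are made up for illustration, bitmaps are modelled as
sets of on-bit indices, count vectors as Counter objects, and the
count-vector forms use the common min/max generalization, which is an
assumption about the definition rather than a statement of what the
cartridge computes.

```python
from collections import Counter


def tanimoto_bits(a: set, b: set) -> float:
    """Tanimoto for bitmaps given as sets of on-bit indices."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def dice_bits(a: set, b: set) -> float:
    """Dice for bitmaps given as sets of on-bit indices."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def tanimoto_counts(a: Counter, b: Counter) -> float:
    """Tanimoto for count vectors: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    numerator = sum(min(a[k], b[k]) for k in keys)
    denominator = sum(max(a[k], b[k]) for k in keys)
    return numerator / denominator if denominator else 0.0


def dice_counts(a: Counter, b: Counter) -> float:
    """Dice for count vectors: twice the shared counts over the total counts."""
    shared = sum(min(a[k], b[k]) for k in set(a) & set(b))
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 0.0


# Toy fingerprints, invented purely for illustration.
bits_a, bits_b = {1, 2, 3, 4}, {3, 4, 5}
counts_a = Counter({"x": 2, "y": 1})
counts_b = Counter({"x": 1, "z": 1})

print(tanimoto_bits(bits_a, bits_b), dice_bits(bits_a, bits_b))
print(tanimoto_counts(counts_a, counts_b), dice_counts(counts_a, counts_b))
```

Both measures range from 0 (nothing shared) to 1 (identical), and two
empty fingerprints are scored as 0 here by convention.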

The cartridge also allows substructure searching (using SMILES), but
that's not as highly optimized as the similarity searching. We'll
continue to work on this and will add SMARTS support in the future.

The cartridge builds on Linux and the Mac. I don't plan to support it
on Windows myself, but I'd be happy for someone else to get it working
there.

As of this morning, the code is in svn:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/Code/PgSQL/rdkit/
There's a README that gives some build/install instructions (it's
pretty easy), but I'll be adding additional information on the wiki on
pages linked from here:
http://code.google.com/p/rdkit/wiki/PostgresCartridge

Nik Stiefl will be presenting a poster at the upcoming Sheffield
conference that describes some of our work with and on open-source
projects and that includes some details about the cartridge. If you
happen to be in Sheffield, please swing by Nik's poster and check it
out.

This wouldn't have happened without the ideas and support provided by
my colleague Andy Palmer. There are a lot of other people to
acknowledge -- this isn't a project I could have done myself -- but
rather than risk forgetting anyone, I will hold off on listing names
until the real release announcement.

I will be writing more about this in the near future, and we'll
probably be writing a paper about the whole story, but I wanted to
give the community an initial heads-up that the code is there.

Best Regards,
-greg
