Dear all,

I'm really excited to announce the most recent contribution from Novartis to the RDKit: a cartridge for similarity searching in PostgreSQL.

The cartridge supports Tanimoto and Dice similarity for both bitmap and count-vector fingerprints. It currently supports the RDKit, layered, Morgan (ECFP-like), atom-pair, and topological-torsion fingerprinters. Adding a new fingerprinter (either using the RDKit or some other C/C++ library) is very straightforward.

We make use of the PostgreSQL index system, so searches are fast. Our benchmarking work was done with a database of about 4 million drug-like compounds. Searching this database using drug-like molecules as queries and binary fingerprints typically takes around 4 seconds on a single CPU (these queries return tens to thousands of results). I have no doubt that this can be made faster by tuning database parameters (the results here are for the standard PostgreSQL settings), but we're pretty happy with things already.
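As a rough illustration of the two similarity measures mentioned above, here is a minimal pure-Python sketch. It is not the cartridge's code and does not use the RDKit; the function names are hypothetical, bit fingerprints are assumed to be sets of on-bit indices, count vectors are assumed to be dicts mapping a feature id to a non-negative count, and the min/max form used for count vectors is one common generalization rather than a statement of what the cartridge does internally.

```python
from collections import Counter


def tanimoto_bits(a, b):
    """Tanimoto similarity of two bit fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    # Assumption: two empty fingerprints are scored 0.0 rather than raising.
    return len(a & b) / union if union else 0.0


def dice_bits(a, b):
    """Dice similarity of two bit fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def tanimoto_counts(a, b):
    """Tanimoto similarity of two count vectors (feature id -> count)."""
    a, b = Counter(a), Counter(b)
    keys = set(a) | set(b)
    shared = sum(min(a[k], b[k]) for k in keys)    # per-feature overlap
    combined = sum(max(a[k], b[k]) for k in keys)  # per-feature union
    return shared / combined if combined else 0.0


def dice_counts(a, b):
    """Dice similarity of two count vectors (feature id -> count)."""
    a, b = Counter(a), Counter(b)
    shared = sum(min(a[k], b[k]) for k in set(a) | set(b))
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 0.0


if __name__ == "__main__":
    fp1, fp2 = {1, 2, 3}, {2, 3, 4}
    print(tanimoto_bits(fp1, fp2), dice_bits(fp1, fp2))
    cv1, cv2 = {1: 2, 2: 1}, {1: 1, 3: 1}
    print(tanimoto_counts(cv1, cv2), dice_counts(cv1, cv2))
```

Both measures range from 0 (nothing in common) to 1 (identical fingerprints); for the same pair, Dice is never lower than Tanimoto.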
The cartridge also allows substructure searching (using SMILES), but that's not as highly optimized as the similarity searching. We'll continue to work on this and will add SMARTS support in the future.

The cartridge builds on Linux and the Mac. I don't plan to support it on Windows myself, but I'd be happy for someone else to get it working there.

As of this morning, the code is in svn:
http://rdkit.svn.sourceforge.net/viewvc/rdkit/trunk/Code/PgSQL/rdkit/

There's a README that gives some build/install instructions (it's pretty easy), but I'll be adding additional information on the wiki on pages linked from here:
http://code.google.com/p/rdkit/wiki/PostgresCartridge

Nik Stiefl will be presenting a poster at the upcoming Sheffield conference that describes some of our work with and on open-source projects and includes some details about the cartridge. If you happen to be in Sheffield, please swing by Nik's poster and check it out.

This wouldn't have happened without the ideas and support provided by my colleague Andy Palmer. There are a lot of other people to acknowledge (this isn't a project I could have done myself), but rather than risk forgetting anyone, I will hold off on listing names until the real release announcement.

I will be writing more about this in the near future, and we'll probably be writing a paper about the whole story, but I wanted to give the community an initial heads up that the code is there.

Best Regards,
-greg

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss