Hi Thomas,

This does indeed sound like something that would be useful for the contrib
directory.

If you have changes to the Butina code itself in order to make it faster
(or to run in parallel), those could possibly be merged into the core RDKit.
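
For anyone following the thread, here is a minimal pure-Python sketch of the
general idea in the quoted message below: compute one row of similarities at a
time with a bulk function, keep only the neighbors within the cutoff, then do
the Butina assignment. It is not Thomas's code and not the RDKit
implementation. The names tanimoto, bulk_tanimoto, neighbor_row and butina are
hypothetical, fingerprints are assumed to be plain Python ints used as
bitmasks, and storing neighbor lists instead of a full distance matrix is my
assumption. In the RDKit itself, the corresponding calls would be
DataStructs.BulkTanimotoSimilarity and Butina.ClusterData.

```python
from multiprocessing import Pool


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as int bitmasks."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


def bulk_tanimoto(query, others):
    """Stand-in for a bulk similarity call: one query against many."""
    return [tanimoto(query, fp) for fp in others]


def neighbor_row(args):
    """Indices of all fingerprints within `cutoff` distance of fps[i]."""
    i, fps, cutoff = args
    sims = bulk_tanimoto(fps[i], fps)
    return [j for j, s in enumerate(sims) if j != i and 1.0 - s <= cutoff]


def butina(fps, cutoff=0.35, processes=1):
    """Butina-style clustering without neighbor-count reordering."""
    # Rows are independent, so they can be spread over worker processes.
    # Shipping `fps` with every job is wasteful; fine for a sketch only.
    jobs = [(i, fps, cutoff) for i in range(len(fps))]
    if processes > 1:
        with Pool(processes) as pool:
            neighbors = pool.map(neighbor_row, jobs)
    else:
        neighbors = [neighbor_row(job) for job in jobs]

    # Visit candidate centroids from most to fewest neighbors.
    order = sorted(range(len(fps)), key=lambda i: (-len(neighbors[i]), i))
    assigned = [False] * len(fps)
    clusters = []
    for i in order:
        if assigned[i]:
            continue
        # Centroid first, then its still-unassigned neighbors.
        members = [i] + [j for j in neighbors[i] if not assigned[j]]
        for m in members:
            assigned[m] = True
        clusters.append(tuple(members))
    return clusters


if __name__ == "__main__":
    toy_fps = [0b1111, 0b1110, 0b0111, 0b110000, 0b100000]
    print(butina(toy_fps, cutoff=0.35))
```

Each cluster tuple lists the centroid first. Only the neighbor lists are kept
in memory, which is one way to avoid holding the full N x N matrix for a large
set.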

As an aside, Roger's presentation at the last RDKit UGM:
https://github.com/rdkit/UGM_2017/blob/master/Presentations/Sayle_RDKitDiversity_Berlin17.pdf
includes a hint that the new MaxMin diversity picking code he contributed
to the RDKit can be used for Butina clustering. This would be a more
involved project (since it's C++ work), but following that route would
almost certainly be the fastest way to speed up the clustering process
itself.

Best,
-greg


On Wed, Jun 20, 2018 at 5:27 PM Thomas Blaschke <thomas.blasc...@uni-bonn.de>
wrote:

> Hi RDKitlers,
>
> Recently, I had to cluster a large number of compounds and I took a shot at
> the Butina clustering. I modified the RDKit's included Butina clustering to
> utilize the Bulk*Similarity functions and to use multiple Python processes
> to calculate the distance matrix more quickly. Although these are very naive
> improvements, I'm able to read in ChEMBL, generate the Morgan fingerprints
> and cluster 1.3M compounds within 2 hours using 32 cores and 8G of RAM. I
> thought these modifications might come in handy for other people doing the
> same kind of clustering directly within the RDKit and I would like to share
> the code. Would the RDKit contrib folder be the right place for this kind of
> work?
>
> Cheers,
> Thomas
>
>
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
