[Rdkit-discuss] Parallelize Butina clustering

Thomas Blaschke Wed, 20 Jun 2018 08:27:57 -0700

Hi RDKitlers,

Recently, I had to cluster a large number of compounds, and I took a shot at Butina clustering. I modified the Butina clustering included in RDKit to use the Bulk*Similarity functions and to spread the calculation of the distance matrix over multiple Python processes. Although these are very naive improvements, I'm able to read in ChEMBL, generate the Morgan fingerprints and cluster 1.3M compounds within 2 hours using 32 cores and 8 GB of RAM. I thought these modifications might come in handy for other people doing the same kind of clustering directly within RDKit, and I would like to share the code. Would the RDKit Contrib folder be the right place for this kind of work?
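For readers who want to see the idea in code, here is a minimal, dependency-free sketch of the two steps described above. It is not Thomas's code. Assumptions: fingerprints are modeled as plain Python ints used as bitsets instead of RDKit ExplicitBitVect objects, and the helper names (tanimoto, row_distances, lower_triangle_distances, butina) are hypothetical. Row i of the lower-triangle distance matrix depends only on fingerprints 0..i, so rows can be computed independently by a multiprocessing.Pool. In RDKit the per-row work would presumably be DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i]), and the flattened list could be passed to Butina.ClusterData(dists, n, cutoff, isDistData=True) instead of the hand-written butina() here.

```python
from multiprocessing import Pool

# Hypothetical stand-in: fingerprints are Python ints used as bitsets,
# not RDKit ExplicitBitVect objects, so the sketch runs without RDKit.
_FPS = []  # fingerprint list, set in each worker by _init_worker


def tanimoto(a, b):
    """Tanimoto similarity |a & b| / |a | b|; two empty bitsets count as identical."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


def _init_worker(fps):
    global _FPS
    _FPS = fps


def row_distances(i):
    """Distances from fingerprint i to every fingerprint j < i (one matrix row)."""
    fp = _FPS[i]
    return [1.0 - tanimoto(fp, other) for other in _FPS[:i]]


def lower_triangle_distances(fps, n_procs=1):
    """Flattened lower triangle: d(1,0), d(2,0), d(2,1), d(3,0), ...

    Pool.map returns rows in input order, so the layout is preserved."""
    rows = range(1, len(fps))
    if n_procs <= 1:
        _init_worker(fps)
        chunks = [row_distances(i) for i in rows]
    else:
        with Pool(n_procs, initializer=_init_worker, initargs=(fps,)) as pool:
            chunks = pool.map(row_distances, rows, chunksize=64)
    return [d for chunk in chunks for d in chunk]


def butina(dists, n, cutoff):
    """Butina clustering on a flattened lower-triangle distance list.

    Points with the most neighbours (distance <= cutoff) become centroids
    first; each centroid claims all of its still-unassigned neighbours."""
    neighbours = [[] for _ in range(n)]
    k = 0
    for i in range(1, n):
        for j in range(i):
            if dists[k] <= cutoff:
                neighbours[i].append(j)
                neighbours[j].append(i)
            k += 1
    # Stable sort: ties keep index order (RDKit may break ties differently).
    order = sorted(range(n), key=lambda p: len(neighbours[p]), reverse=True)
    assigned = [False] * n
    clusters = []
    for p in order:
        if assigned[p]:
            continue
        members = [p] + [q for q in neighbours[p] if not assigned[q]]
        for q in members:
            assigned[q] = True
        clusters.append(tuple(members))  # centroid first
    return clusters


if __name__ == "__main__":
    fps = [0b1111, 0b1110, 0b1_0000_0000, 0b11_0000_0000]
    dists = lower_triangle_distances(fps, n_procs=2)
    print(butina(dists, len(fps), cutoff=0.4))  # prints [(0, 1), (2,), (3,)]
```

One design caveat: a full lower triangle for 1.3M compounds has about 8.5 × 10^11 entries, so at that scale one would presumably keep only the neighbour indices within the cutoff for each row rather than every distance.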


Cheers,
Thomas

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
