[Rdkit-discuss] Clustering

Chris Swain Sun, 04 Jun 2017 00:09:06 -0700

Hi,

I want to cluster around 4 million structures.


The RDKit Cookbook (http://www.rdkit.org/docs/Cookbook.html) suggests:

"For large sets of molecules (more than 1000-2000), it's most efficient to use 
the Butina clustering algorithm"

However, going from a few thousand to several million structures is quite a 
step up, and I wondered whether anyone has used this algorithm on larger data sets.

As far as I can tell, it is not possible to define the number of clusters. Is 
this correct?
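
To make both questions concrete, here is a minimal pure-Python sketch of a
Butina-style (sphere exclusion) clustering. It is an illustration under stated
assumptions, not RDKit's implementation: the names butina_sketch and n_pairs
are hypothetical, the "structures" are toy 1-D points, and the distance is a
plain absolute difference rather than a fingerprint distance. It shows that the
only tuning input is a distance cutoff, so the number of clusters falls out of
the data, and that an all-pairs comparison grows as n*(n-1)/2.

```python
def n_pairs(n):
    """Number of pairwise distances an all-pairs comparison needs."""
    return n * (n - 1) // 2


def butina_sketch(dist, n, cutoff):
    """Toy Butina-style clustering (hypothetical helper, not the RDKit API).

    dist(i, j) returns the distance between items i and j.
    Returns a list of clusters as tuples of indices, centroid first.
    """
    # Step 1: build neighbor lists of all items within the cutoff.
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if dist(i, j) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)

    # Step 2: visit candidate centroids from most to fewest neighbors.
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)

    # Step 3: each unassigned centroid claims its unassigned neighbors.
    assigned = [False] * n
    clusters = []
    for centroid in order:
        if assigned[centroid]:
            continue
        members = [centroid] + [m for m in neighbors[centroid] if not assigned[m]]
        for m in members:
            assigned[m] = True
        clusters.append(tuple(members))
    return clusters


# Toy data: six points on a line (an assumption made for illustration only).
points = [0, 1, 2, 50, 51, 90]


def toy_dist(i, j):
    return abs(points[i] - points[j])


# The cluster count is a consequence of the cutoff, not an input.
tight = butina_sketch(toy_dist, len(points), 0.5)
medium = butina_sketch(toy_dist, len(points), 3)
loose = butina_sketch(toy_dist, len(points), 100)

# All-pairs cost for roughly 4 million structures.
pairs_for_4m = n_pairs(4_000_000)
```

In this sketch a tighter cutoff gives more, smaller clusters and a looser one
gives fewer, so reaching a target cluster count would mean scanning cutoffs.
For 4 million structures the all-pairs step is about 8 x 10^12 distances, which
is where the real cost sits.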

Cheers,

Chris

_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
