Hi,

I want to cluster around 4 million structures.

The RDKit Cookbook (http://www.rdkit.org/docs/Cookbook.html) suggests:

"For large sets of molecules (more than 1000-2000), it’s most efficient to use 
the Butina clustering algorithm"

However, it is quite a step up from a few thousand to several million. Has 
anyone used this algorithm on data sets that large?

As far as I can tell, it is not possible to specify the number of clusters in 
advance. Is this correct?
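
For context, here is a minimal pure-Python sketch of the Butina procedure as it is commonly described. It is a toy illustration, not RDKit's code (RDKit's own implementation is rdkit.ML.Cluster.Butina.ClusterData, which takes a distance threshold). The names butina_sketch, points and point_distance are hypothetical, and the 1-D points stand in for fingerprint distances. The sketch shows that the only tuning parameter is the distance cutoff, so the number of clusters falls out of the data instead of being chosen directly. It also compares every pair of items, which for 4 million structures would mean roughly 8 x 10^12 distance evaluations.

```python
from typing import Callable, List, Tuple


def butina_sketch(
    n_points: int,
    dist: Callable[[int, int], float],
    cutoff: float,
) -> List[Tuple[int, ...]]:
    """Toy Butina-style clustering; the first index in each tuple is the centroid."""
    # Build neighbour lists: all pairs whose distance is within the cutoff.
    neighbours = [set() for _ in range(n_points)]
    for i in range(n_points):
        for j in range(i):
            if dist(i, j) <= cutoff:
                neighbours[i].add(j)
                neighbours[j].add(i)

    # Visit candidate centroids from most to fewest neighbours (ties by index).
    order = sorted(range(n_points), key=lambda i: (-len(neighbours[i]), i))

    assigned = [False] * n_points
    clusters: List[Tuple[int, ...]] = []
    for centroid in order:
        if assigned[centroid]:
            continue
        # A cluster is the centroid plus its still-unassigned neighbours.
        members = [m for m in sorted(neighbours[centroid]) if not assigned[m]]
        assigned[centroid] = True
        for m in members:
            assigned[m] = True
        clusters.append((centroid, *members))
    return clusters


# Hypothetical 1-D data standing in for molecules; distance is |x_i - x_j|.
points = [0.0, 0.1, 0.2, 1.0, 1.1, 5.0]


def point_distance(i: int, j: int) -> float:
    return abs(points[i] - points[j])


# The cluster count changes with the cutoff; it is never passed in directly.
for cutoff in (0.05, 0.15, 1.5):
    result = butina_sketch(len(points), point_distance, cutoff)
    print(cutoff, len(result), result)
```

In this sketch a tighter cutoff gives more, smaller clusters and a looser one gives fewer, larger clusters, so hitting a target count would mean scanning over cutoff values.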

Cheers,

Chris
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
