Hi,
I want to cluster around 4 million structures.
The RDKit Cookbook (http://www.rdkit.org/docs/Cookbook.html) suggests:
"For large sets of molecules (more than 1000-2000), it's most efficient to use
the Butina clustering algorithm".
However, it is quite a step up from a few thousand to several million, and I
wondered whether anyone has used this algorithm on data sets of that size.
As far as I can tell, it is not possible to define the number of clusters in
advance; is this correct?
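
To show what I mean about the number of clusters, here is a minimal
pure-Python sketch of the Butina procedure as I understand it. It is not
RDKit's implementation; the function name butina_sketch, the toy 1-D points
and the cutoff values are all made up for illustration. The only tunable input
is a distance cutoff, and the cluster count falls out of it.

```python
def butina_sketch(dist, n, cutoff):
    """Sketch of Butina-style clustering (hypothetical helper, not RDKit code).

    dist(i, j) returns the distance between items i and j.
    Returns a list of tuples of item indices, cluster centroid first.
    """
    # Neighbor list for each item: every other item within the cutoff.
    neighbors = [
        [j for j in range(n) if j != i and dist(i, j) <= cutoff]
        for i in range(n)
    ]
    # Visit items with the most neighbors first.
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)

    assigned = [False] * n
    clusters = []
    for i in order:
        if assigned[i]:
            continue
        # The item becomes a centroid and takes its still-unassigned neighbors.
        members = [i] + [j for j in neighbors[i] if not assigned[j]]
        for m in members:
            assigned[m] = True
        clusters.append(tuple(members))
    return clusters


# Toy data: six points on a line, distance = absolute difference.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]


def toy_dist(i, j):
    return abs(points[i] - points[j])


tight = butina_sketch(toy_dist, len(points), cutoff=0.05)
medium = butina_sketch(toy_dist, len(points), cutoff=0.25)
loose = butina_sketch(toy_dist, len(points), cutoff=10.0)
```

In this toy case the number of clusters changes only because the cutoff
changes, which is why I suspect the count cannot be requested directly. The
sketch also compares every pair of items, and for 4 million structures that is
roughly 8 x 10^12 pairs, which is the part that worries me.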
Cheers,
Chris
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss