On 08/23/2015 11:38 AM, Jing Lu wrote:
> Thanks, Andrew!
>
> Yes, I was thinking about using scikit-learn also. But I guess I need to
> use a data structure for sparse matrix and define a function for
> connectivity. I hope the memory issue won't be a problem.
> Most AgglomerativeClustering algorithms have time complexity with N^2. Will
> that be a problem?

The usual programming solutions are:

- Memory: if you don't need the whole matrix in RAM at once, cache it to
  disk. Otherwise, try to split the job into smaller batches.

- Time: Big-O notation describes relative complexity, i.e. how the cost
  grows with N, not how long a run actually takes. In absolute terms, if
  the job finishes overnight and you only intend to run it a handful of
  times, N^2 is not worth worrying about. Otherwise, try to split it into
  smaller batches that you can run in parallel on a cluster of computers.

FWIW
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
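
As a rough illustration of the "cache it to disk" and "smaller batches" suggestions above, here is a minimal sketch that fills an N x N distance matrix one block of rows at a time into a file-backed NumPy array. Everything in it is an assumption made for illustration: the function name `pairwise_distances_on_disk`, the Euclidean metric, the random input, and the batch size are all hypothetical, and real fingerprint data would need its own distance function.

```python
import os
import tempfile

import numpy as np


def pairwise_distances_on_disk(X, path, batch_size=128):
    """Fill an N x N Euclidean distance matrix stored in a file on disk.

    Hypothetical helper for illustration; not part of any library.
    """
    n = X.shape[0]
    # np.memmap backs the array with a file, so the full matrix
    # does not have to fit in RAM at once.
    D = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n))
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        # Differences between one batch of rows and every row:
        # shape (stop - start, n, n_features).
        diff = X[start:stop, None, :] - X[None, :, :]
        D[start:stop] = np.sqrt((diff ** 2).sum(axis=-1))
    D.flush()  # make sure everything is written to the file
    return D


# Made-up data: 500 points with 8 features each.
rng = np.random.default_rng(0)
X = rng.random((500, 8))
path = os.path.join(tempfile.mkdtemp(), "distances.dat")
D = pairwise_distances_on_disk(X, path)
```

Each pass through the loop depends only on its own block of rows, so the same split could in principle be handed out to separate processes or machines, each writing its own rows.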
------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss