On 08/23/2015 11:38 AM, Jing Lu wrote:
> Thanks, Andrew!
> 
> Yes, I was thinking about using scikit-learn also. But I guess I need to
> use a data structure for sparse matrix and define a function for
> connectivity. I hope the memory issue won't be a problem.
> Most AgglomerativeClustering algorithms have time complexity with N^2. Will
> that be a problem?

The usual programming answers:
- Memory: if you don't need the whole matrix in RAM at once, cache it to
disk. Otherwise, try to split the job into smaller batches.
- Time: big-O notation only describes how the cost grows with N, not how
long a run actually takes. In absolute terms, if the job finishes overnight
and you only intend to run it a handful of times, N^2 is not worth
worrying about. Otherwise, split it into smaller batches that you can run
in parallel on a cluster of computers.
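
As a rough illustration of the disk-cache-plus-batches idea, here is a
minimal sketch. It assumes NumPy is available and that the matrix is a dense
pairwise distance matrix. The Euclidean metric, the function name
pairwise_to_disk, the batch size, and the random placeholder data are all
made up for the example and are not from this thread.

```python
import os
import tempfile

import numpy as np


def pairwise_to_disk(points, path, batch_rows=256):
    """Fill an on-disk N x N Euclidean distance matrix one row block at a time.

    Hypothetical helper for illustration only.
    """
    n = points.shape[0]
    # np.memmap backs the array with a file, so the OS pages data in and out
    # instead of holding the whole N x N matrix in RAM.
    dist = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n))
    for start in range(0, n, batch_rows):
        stop = min(start + batch_rows, n)
        # Differences between this block of rows and every point: (b, n, d).
        diff = points[start:stop, None, :] - points[None, :, :]
        dist[start:stop, :] = np.sqrt((diff ** 2).sum(axis=2))
    dist.flush()  # push any pending writes out to the file
    return dist


rng = np.random.default_rng(0)
points = rng.random((1000, 8))  # placeholder data, not real descriptors
path = os.path.join(tempfile.mkdtemp(), "dist.dat")
dist = pairwise_to_disk(points, path)
```

Each row block depends only on the input points, so the same loop body could
be handed to separate worker processes or cluster jobs, each writing its own
block of rows.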

FWIW
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu

------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
