Hi, how about scaffold-based clustering? You extract the scaffold of each molecule, cluster the scaffolds, and then put the compounds that share a scaffold into that scaffold's cluster.
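A rough sketch of the grouping step in Python, in case it helps. It assumes Murcko scaffolds as the scaffold definition (my assumption, any other scaffold function could be swapped in) and uses RDKit's MurckoScaffold.MurckoScaffoldSmiles. The names murcko_scaffold, group_by_scaffold and scaffold_fn are made up for illustration. The unique scaffolds are the dictionary keys, so only those need to be fingerprinted and clustered afterwards, not the full library.

```python
from collections import defaultdict


def murcko_scaffold(smiles):
    """Return the Murcko scaffold of a molecule as a SMILES string.

    RDKit is imported here rather than at module level so that
    group_by_scaffold below can be reused with any other scaffold function.
    """
    from rdkit.Chem.Scaffolds import MurckoScaffold

    # Raises ValueError if the SMILES cannot be parsed.
    return MurckoScaffold.MurckoScaffoldSmiles(smiles=smiles)


def group_by_scaffold(smiles_list, scaffold_fn=murcko_scaffold):
    """Map each scaffold to the list of input SMILES that share it."""
    groups = defaultdict(list)
    for smi in smiles_list:
        groups[scaffold_fn(smi)].append(smi)
    return dict(groups)


# Example usage (requires RDKit):
#   groups = group_by_scaffold(["c1ccccc1CC", "c1ccccc1CO", "C1CCCCC1N"])
#   for scaffold, members in groups.items():
#       print(scaffold, len(members))
```

Acyclic molecules have no ring system, so they should all end up under an empty scaffold and may need separate handling. Once the scaffolds are clustered, each compound simply inherits the cluster of its scaffold.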
Sent from my iPhone

> On Aug 22, 2015, at 8:43 PM, Jing Lu <[email protected]> wrote:
>
> Dear RDKit users,
>
> If I want to cluster more than 1M molecules by ECFP4. How could I do it? If I
> calculate the distance between every pair of molecules, the size of distance
> matrix will be too big. Does RDKit support any heuristic clustering algorithm
> without calculating the distance matrix of the whole library?
>
> Thanks,
> Jing
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Rdkit-discuss mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

