Hi Greg and Thorsten,
> Greg:
>
>> Thorsten:
>> On the other hand, 4000 rows should not take that long in KNIME. How
>> much times does it currently take?
>
> I just did 1000 rows on my macbook. Assuming I'm reading the knime log
> correctly, that took about a minute.

Thanks for testing this out, Greg. I must confess, I didn't wait for the hierarchical clustering to finish for the 4000! Going back and selecting a random 1000-molecule subset, I reproduce your result of ~1 min (I get 67 secs). If I then go to 2000, it takes 520 secs. Doubling the number of rows costs roughly 8x the time, so to me this looks like cubic complexity, which is what the documentation for the node states. This would mean > 1 hr for my original 4000...

For completeness: this result was with the Hierarchical Clustering(DistMatrix) node set with 'Tanimoto' similarity and 'Complete Linkage' for cluster comparison. Changing the comparison to 'Single Linkage' did not reduce the time.

Interestingly, the documentation for the 'standard' Hierarchical Clustering (i.e. non-distance-matrix) node states that it operates with "n-squared complexity". I guess other clustering algorithms available in KNIME must scale better than cubically as well (k-means, fuzzy c-means?), but as far as I can see they don't currently operate on distance matrices (or directly on bit vectors). If they could, then this may be a solution; another would be implementing the Murtagh algorithm (I am guessing its scaling is below cubic, from my recollection of the speeds observed in RDKit).

Kind regards,
James
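P.S. For anyone who wants to check the scaling arithmetic, here is a minimal sketch. It assumes the run time follows a simple power law t ~ n^k, which two data points cannot really confirm, and the function names are just my own for illustration. The only inputs are the two timings quoted above.

```python
import math

# Timings quoted above for the Hierarchical Clustering(DistMatrix) node:
# number of rows -> wall-clock seconds.
timings = {1000: 67.0, 2000: 520.0}


def scaling_exponent(n1, t1, n2, t2):
    """Estimate k in t ~ n**k from two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


def extrapolate(n_ref, t_ref, n_target, exponent):
    """Scale a reference timing to a new size, assuming t ~ n**exponent."""
    return t_ref * (n_target / n_ref) ** exponent


# Apparent exponent from the 1000- and 2000-row runs (comes out just under 3).
k = scaling_exponent(1000, timings[1000], 2000, timings[2000])

# Projected time for 4000 rows if the scaling is exactly cubic.
t_4000_cubic = extrapolate(2000, timings[2000], 4000, 3)

print(f"apparent exponent: {k:.2f}")
print(f"projected 4000-row time: {t_4000_cubic:.0f} s "
      f"({t_4000_cubic / 60:.0f} min)")
```

With exactly cubic scaling, the 4000-row run is projected at 8 x 520 = 4160 secs, i.e. a little over an hour, which matches the estimate in the message. Treat it as a rough guide only.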
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss