Hi Greg and Thorsten,

> Greg:
>
>> Thorsten:
>> On the other hand, 4000 rows should not take that long in KNIME. How
>> much time does it currently take?
>
> I just did 1000 rows on my MacBook. Assuming I'm reading the KNIME log
> correctly, that took about a minute.


Thanks for testing this out, Greg.  I must confess, I didn't wait for
the hierarchical clustering to finish for the 4000!  Going back and
selecting a random 1000-molecule subset, I reproduce your result of ~1
min (I get 67 secs).  If I then go to 2000, it takes 520 secs - so
doubling the input multiplies the time by ~7.8, close to the factor of
2^3 = 8 expected for cubic complexity, which is what the documentation
for the node states (this would mean > 1 hr for my original 4000...)
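The extrapolation can be checked with a couple of lines of arithmetic. This is a minimal sketch, assuming the runtime follows a simple power law t ~ n^k; the helper names are hypothetical and the only inputs are the two timings quoted above:

```python
import math

def scaling_exponent(n1, t1, n2, t2):
    """Estimate k in t ~ n**k from two (size, seconds) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)

def extrapolate(n_known, t_known, n_target, k):
    """Predict the runtime at n_target, assuming the same power law holds."""
    return t_known * (n_target / n_known) ** k

# Timings from the test above: 1000 rows -> 67 s, 2000 rows -> 520 s.
k = scaling_exponent(1000, 67, 2000, 520)
t4000 = extrapolate(2000, 520, 4000, k)
print(f"empirical exponent: {k:.2f}")                   # 2.96
print(f"estimated time for 4000 rows: {t4000 / 60:.0f} min")  # 67 min
```

An exponent of ~2.96 is consistent with cubic scaling, and the 4000-row estimate (about 67 min) agrees with the "> 1 hr" guess.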

For completeness - this result was with the Hierarchical
Clustering(DistMatrix) node set with 'Tanimoto' similarity and 'Complete
Linkage' for cluster comparison.  Changing the comparison to 'Single
Linkage' did not reduce the time.

Interestingly, the documentation for the 'standard' Hierarchical
Clustering node (ie non-distance matrix) states that it operates with
"n-squared complexity".  I guess other clustering algorithms available
in KNIME must also scale better than cubically (k-means, fuzzy
c-means?) - but as far as I can see they don't currently operate on
distance matrices (or directly on bit vectors).  If they could, then
this may be a solution; another option would be implementing the
Murtagh algorithm (from my recollection of the speeds observed in
RDKit, I am guessing it scales below cubic).
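To illustrate that hierarchical clustering directly on bit vectors need not be cubic, here is a minimal sketch of single-linkage clustering via Prim's minimum-spanning-tree algorithm, which needs only O(n^2) Tanimoto comparisons and no stored distance matrix. This is a swapped-in technique, not the KNIME node's code or RDKit's Murtagh implementation; fingerprints are modelled as plain Python ints (one bit per feature), and the function names are hypothetical:

```python
def tanimoto_distance(a: int, b: int) -> float:
    """1 - Tanimoto similarity for two fingerprints stored as int bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # two empty fingerprints: treat as identical
    return 1.0 - bin(a & b).count("1") / union

def single_linkage_heights(fps):
    """Merge heights of single-linkage clustering.

    Single-linkage merge heights equal the edge weights of a minimum
    spanning tree, so Prim's algorithm yields them with each pairwise
    distance computed exactly once: O(n^2) time, O(n) memory.
    """
    n = len(fps)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [float("inf")] * n  # shortest distance from each point to the tree
    best[0] = 0.0
    heights = []
    for step in range(n):
        # Add the closest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            heights.append(best[u])
        # Update distances of the remaining points to the grown tree.
        for v in range(n):
            if not in_tree[v]:
                d = tanimoto_distance(fps[u], fps[v])
                if d < best[v]:
                    best[v] = d
    return sorted(heights)

print(single_linkage_heights([0b1100, 0b1110, 0b0011]))
```

Complete linkage cannot be done with an MST, but the nearest-neighbour-chain approach (related to Murtagh's reciprocal nearest-neighbour work) also reaches O(n^2) for complete, average and Ward linkage.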

Kind regards

James

______________________________________________________________________

The Vernalis Group of Companies
Oakdene Court
613 Reading Road
Winnersh, Berkshire
RG41 5UA.
Tel: +44 118 977 3133

To access trading company registration and address details, please go to the 
Vernalis website at www.vernalis.com and click on the "Company address and 
registration details" link at the bottom of the page.
______________________________________________________________________

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
