Re: [Rdkit-discuss] Beta of RDKit knime nodes available

2010-11-26 Thread James Davidson

Hi Greg and Thorsten,


 Greg:

 Thorsten:
 On the other hand, 4000 rows should not take that long in KNIME. How
 much time does it currently take?

 I just did 1000 rows on my macbook. Assuming I'm reading the knime log
 correctly, that took about a minute.


Thanks for testing this out, Greg.  I must confess, I didn't wait for
the hierarchical clustering to finish for the 4000!  Going back and
selecting a random 1000 molecule subset, I reproduce your result of ~ 1
min (I get 67 secs).  If I then go to 2000, it takes 520 secs - so to me
this looks like cubic complexity - which is what the documentation for
the node states (this would mean roughly an hour for my original 4000...)
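The cubic estimate can be checked directly from the two timings above. Fitting t = c * n^k through (1000 rows, 67 s) and (2000 rows, 520 s) gives k = log(520/67) / log 2. Below is a minimal Python sketch of that two-point fit and the resulting extrapolation to 4000 rows. The helper names are made up for illustration. A two-point fit also ignores fixed overheads, so treat the result as a rough estimate only.

```python
import math

def scaling_exponent(n1: int, t1: float, n2: int, t2: float) -> float:
    """Exponent k of t = c * n**k fitted through two (size, time) points."""
    return math.log(t2 / t1) / math.log(n2 / n1)

def extrapolate(n_known: int, t_known: float, n_target: int, k: float) -> float:
    """Run time predicted at n_target, assuming t = c * n**k."""
    return t_known * (n_target / n_known) ** k

# Timings from the message: 1000 rows in 67 s, 2000 rows in 520 s.
k = scaling_exponent(1000, 67.0, 2000, 520.0)
t_4000 = extrapolate(2000, 520.0, 4000, k)
print(f"fitted exponent: {k:.2f}")
print(f"predicted time for 4000 rows: {t_4000 / 60:.0f} min")
```

The fitted exponent comes out just under 3. The extrapolation for 4000 rows lands a little over an hour, which is consistent with the estimate in the message.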

For completeness - this result was with the Hierarchical
Clustering (DistMatrix) node set with 'Tanimoto' similarity and 'Complete
Linkage' for cluster comparison.  Changing the comparison to 'Single
Linkage' did not reduce the time.
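There is one plausible reason why switching from complete to single linkage made no difference. In a straightforward agglomerative implementation, the linkage only changes how distances are updated after a merge. The dominant cost is searching all remaining pairs for the closest one, and that search runs n - 1 times. Whether the KNIME node is written this way is an assumption. The Python sketch below (hypothetical function name, full symmetric distance matrix as input) just illustrates where the O(n^3) comes from:

```python
from typing import List, Optional, Tuple

Merge = Tuple[int, int, float]

def naive_agglomerative(dist: List[List[float]], linkage: str = "complete") -> List[Merge]:
    """Naive O(n^3) agglomerative clustering on a full symmetric distance matrix.

    Returns merges as (cluster_a, cluster_b, distance); each cluster is
    labelled by the lowest original index it contains.
    """
    if linkage not in ("single", "complete"):
        raise ValueError(f"unsupported linkage: {linkage}")
    pick = min if linkage == "single" else max
    d = [row[:] for row in dist]          # working copy, updated in place
    active = set(range(len(d)))
    merges: List[Merge] = []
    while len(active) > 1:                # n - 1 merges in total ...
        best: Optional[Merge] = None
        for i in active:                  # ... each scanning all O(n^2) pairs
            for j in active:
                if i < j and (best is None or d[i][j] < best[2]):
                    best = (i, j, d[i][j])
        i, j, _ = best
        merges.append(best)
        active.remove(j)                  # cluster j is absorbed into cluster i
        for k in active:
            if k != i:
                # The linkage only changes this O(n) update, not the pair search.
                d[i][k] = d[k][i] = pick(d[i][k], d[j][k])
    return merges

# Four points on a line at 0, 1, 5 and 7.
points = [0, 1, 5, 7]
D = [[abs(a - b) for b in points] for a in points]
print(naive_agglomerative(D, "complete"))  # [(0, 1, 1), (2, 3, 2), (0, 2, 7)]
print(naive_agglomerative(D, "single"))    # [(0, 1, 1), (2, 3, 2), (0, 2, 4)]
```

Both linkages perform exactly the same pair search, so their run times scale identically.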

Interestingly, the documentation for the standard 'Hierarchical
Clustering' (i.e. non-distance-matrix) node states that it operates with
n-squared complexity.  I guess other clustering algorithms available
in knime must scale better than cubically as well (k-means, fuzzy
c-means?) - but as far as I can see they don't currently operate on
distance matrices (or directly on bit vectors).  If they could, then
this may be a solution; the other option would be implementing the
Murtagh algorithm (from my recollection of the speeds observed in
rdkit, I am guessing it scales below cubic).
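Reducible linkages include single, complete, average and Ward. For these, the nearest-neighbour chain algorithm (a standard technique in the hierarchical-clustering literature) brings agglomerative clustering down to O(n^2) time on a stored distance matrix. It follows a chain of nearest neighbours until it finds two clusters that are each other's nearest neighbour. It then merges that pair and carries on from the rest of the chain. Whether RDKit's Murtagh clustering or the hc.f code uses exactly this scheme is an assumption here. The sketch below is a minimal complete-linkage version in Python, with a hypothetical function name:

```python
from typing import List, Tuple

Merge = Tuple[int, int, float]

def nn_chain_complete(dist: List[List[float]]) -> List[Merge]:
    """Complete-linkage clustering via the nearest-neighbour chain, O(n^2) time.

    Takes a full symmetric distance matrix and returns merges as
    (cluster_a, cluster_b, distance), clusters labelled by their lowest
    original index. Merges come out in chain order, not sorted by height.
    """
    d = [row[:] for row in dist]           # working copy, updated in place
    active = set(range(len(d)))
    chain: List[int] = []
    merges: List[Merge] = []
    while len(active) > 1:
        if not chain:
            chain.append(min(active))      # start a new chain anywhere
        a = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Nearest active neighbour of a; on ties prefer prev, which keeps
        # chain distances strictly decreasing and so rules out cycles.
        b = min((k for k in active if k != a),
                key=lambda k: (d[a][k], k != prev))
        if b != prev:
            chain.append(b)                # keep walking down the chain
            continue
        # a and b are reciprocal nearest neighbours: merge them.
        chain.pop()
        chain.pop()
        keep, drop = min(a, b), max(a, b)
        merges.append((keep, drop, d[a][b]))
        active.remove(drop)
        for k in active:
            if k != keep:
                # Complete linkage: distance to the union is the larger one.
                d[keep][k] = d[k][keep] = max(d[a][k], d[b][k])
    return merges

points = [0, 1, 5, 7]
D = [[abs(x - y) for y in points] for x in points]
print(nn_chain_complete(D))  # [(0, 1, 1), (2, 3, 2), (0, 2, 7)]
```

Each loop iteration either extends the chain or performs a merge, and both cost O(n). There are O(n) iterations in total, so the run time is O(n^2), compared with O(n^3) for a naive closest-pair search. Merges are not produced in height order, so sort them by distance before building a dendrogram.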

Kind regards

James

__
PLEASE READ: This email is confidential and may be privileged. It is intended 
for the named addressee(s) only and access to it by anyone else is 
unauthorised. If you are not an addressee, any disclosure or copying of the 
contents of this email or any action taken (or not taken) in reliance on it is 
unauthorised and may be unlawful. If you have received this email in error, 
please notify the sender or postmas...@vernalis.com. Email is not a secure 
method of communication and the Company cannot accept responsibility for the 
accuracy or completeness of this message or any attachment(s). Please check 
this email for virus infection for which the Company accepts no responsibility. 
If verification of this email is sought then please request a hard copy. Unless 
otherwise stated, any views or opinions presented are solely those of the 
author and do not represent those of the Company.

The Vernalis Group of Companies
Oakdene Court
613 Reading Road
Winnersh, Berkshire
RG41 5UA.
Tel: +44 118 977 3133

To access trading company registration and address details, please go to the 
Vernalis website at www.vernalis.com and click on the Company address and 
registration details link at the bottom of the page..
__

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Beta of RDKit knime nodes available

2010-11-24 Thread James Davidson
Dear Greg (and, of course, Thorsten and Bernd!)
 
Great job on the Knime nodes!  I have been giving these a go and am
impressed (and excited about the future development!).  A couple of
observations / comments / questions:
 
1.  I have observed that sometimes the FP node seems to generate blank
fingerprints (it doesn't appear to be just the rendering, e.g. they are
also blank if I swap to the 'Bit Scratch' renderer).  I have mainly been
trying the default Morgan FPs, and find that if I reset the node and
re-run, the FP is still blank.  If, however, I swap the node to e.g.
atompair, run, then swap back to Morgan, it seems to work...  I am
running on knime 2.2.2 on Windows 32-bit.
 
2.  The next point is probably down to cheminformatics / knime naivety,
but I must confess I am struggling a little to cluster compounds based
on the FP...   I have used the 'Distance Matrix Calculate' node (with
Tanimoto similarity) to get a matrix that can be used by the
'Hierarchical Clustering (DistMatrix)' or 'k-Medoids' nodes.  However,
both of these appear to run VERY slowly for a set of ~4000
compounds.  I also attempted to cluster on the fingerprints directly,
using the Neighborgrams nodes, but I must admit I am some way off
understanding what I am doing!  My limited experience of using the RDKit
functionality to cluster compounds and eg select a representative set
(based on the FP Tanimoto distances and the Murtagh clustering) was that
it performed rather rapidly.  Is there the intention to expose this
functionality in knime (or is the functionality already there and I just
don't know how?)
 
3.  Any plans for Windows 64-bit support?
 
4.  I would be interested to know what the team views as the next
priorities - property calcs, 3D conformations, pharmacophores,
rendering?  So much great stuff to choose from!  :-)
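In item 2, the Tanimoto distance matrix is what feeds the clustering nodes, and it is cheap to compute from bit vectors. For fingerprints A and B, similarity is |A&B| / (|A| + |B| - |A&B|), and distance is 1 minus that. Below is a minimal Python sketch. It assumes fingerprints are stored as plain Python integers used as bit sets (RDKit itself has its own bit-vector classes). It stores only the lower triangle, row by row. The function names are hypothetical. Treating two blank fingerprints as maximally distant is an arbitrary choice, and it matters given the blank-fingerprint issue in item 1:

```python
from typing import List

def popcount(x: int) -> int:
    """Number of set bits (bin().count works on any Python 3 version)."""
    return bin(x).count("1")

def tanimoto_distance(a: int, b: int) -> float:
    """1 - |A&B| / (|A| + |B| - |A&B|) for fingerprints stored as int bit sets."""
    common = popcount(a & b)
    union = popcount(a) + popcount(b) - common
    if union == 0:
        # Assumption: two blank fingerprints count as maximally distant.
        return 1.0
    return 1.0 - common / union

def condensed_distance_matrix(fps: List[int]) -> List[float]:
    """Lower triangle of the distance matrix, row by row: (1,0), (2,0), (2,1), ..."""
    dists = []
    for i in range(1, len(fps)):
        for j in range(i):
            dists.append(tanimoto_distance(fps[i], fps[j]))
    return dists

def condensed_index(i: int, j: int) -> int:
    """Position of the (i, j) distance, i != j, in the condensed list."""
    if i < j:
        i, j = j, i
    return i * (i - 1) // 2 + j

# Tiny example with 4-bit fingerprints.
fps = [0b1100, 0b1110, 0b0011]
dists = condensed_distance_matrix(fps)
print(dists[condensed_index(1, 2)])  # 0.75
```

For the ~4000 compounds discussed here, the condensed matrix holds 4000 * 3999 / 2 = 7,998,000 distances. Building it is therefore O(n^2) in both time and memory.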
 
Kind regards
 
James



Re: [Rdkit-discuss] Beta of RDKit knime nodes available

2010-11-24 Thread Thorsten Meinl
 Great job on the Knime nodes!  I have been giving these a go and am
  My limited
  experience of using the RDKit functionality to cluster compounds and eg
  select a representative set (based on the FP Tanimoto distances and the
  Murtagh clustering) was that it performed rather rapidly.  Is there the
  intention to expose this functionality in knime (or is the functionality
  already there and I just don't know how?)
 It's not there yet, but it sure would be useful if the knime
 implementation were faster. I don't think it makes sense to use the
 RDKit implementation directly, but it may be possible to do a port of
 the Murtagh algorithm to java.  Thorsten? What do you think?
I have to confess that I have never heard of the Murtagh algorithm, but
it should be possible to port it to Java.
On the other hand, 4000 rows should not take that long in KNIME. How
much time does it currently take?

Cheers,

Thorsten

-- 
Dr.-Ing. Thorsten Meinl   room: Z815
Nycomed Chair for Bioinformatics  fax: +49 (0)7531 88-5132
and Information Mining  phone: +49 (0)7531 88-5016
Box 712, 78457 Konstanz, Germany



Re: [Rdkit-discuss] Beta of RDKit knime nodes available

2010-11-24 Thread Greg Landrum
Hi Thorsten,

On Wed, Nov 24, 2010 at 9:41 PM, Thorsten Meinl
thorsten.me...@uni-konstanz.de wrote:
 Great job on the Knime nodes!  I have been giving these a go and am
  My limited
  experience of using the RDKit functionality to cluster compounds and eg
  select a representative set (based on the FP Tanimoto distances and the
  Murtagh clustering) was that it performed rather rapidly.  Is there the
  intention to expose this functionality in knime (or is the functionality
  already there and I just don't know how?)
 It's not there yet, but it sure would be useful if the knime
 implementation were faster. I don't think it makes sense to use the
 RDKit implementation directly, but it may be possible to do a port of
 the Murtagh algorithm to java.  Thorsten? What do you think?
 I have to confess that I have never heard of the Murtagh algorithm, but
 it should be possible to port it to Java.

There's a Fortran implementation here:
http://www.classification-society.org/csna/mda-sw/hc.f
It will probably make your eyes burn to read it, but it's at least short. :-)
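Agglomerative programs of this kind typically never recompute distances from the raw data. After merging clusters i and j, they update the distance from every other cluster k using the Lance-Williams recurrence. As an aid to reading such code, here is the general form along with the standard coefficients for single, complete and average linkage, where n_i and n_j are the cluster sizes. Which criteria hc.f actually implements, and in what form, should be checked against its own comments:

```latex
\begin{aligned}
d(k, i \cup j) &= \alpha_i\, d(k,i) + \alpha_j\, d(k,j) + \beta\, d(i,j) + \gamma\,\lvert d(k,i) - d(k,j)\rvert \\
\text{single:}\quad & \alpha_i = \alpha_j = \tfrac{1}{2},\ \beta = 0,\ \gamma = -\tfrac{1}{2} \;\Rightarrow\; d(k, i \cup j) = \min\{d(k,i),\, d(k,j)\} \\
\text{complete:}\quad & \alpha_i = \alpha_j = \tfrac{1}{2},\ \beta = 0,\ \gamma = +\tfrac{1}{2} \;\Rightarrow\; d(k, i \cup j) = \max\{d(k,i),\, d(k,j)\} \\
\text{average:}\quad & \alpha_i = \tfrac{n_i}{n_i + n_j},\ \alpha_j = \tfrac{n_j}{n_i + n_j},\ \beta = \gamma = 0
\end{aligned}
```

Each update costs O(1) per remaining cluster, so one merge costs O(n). In a naive code, the cubic behaviour comes from the search for the closest pair, not from this update.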

 On the other hand, 4000 rows should not take that long in KNIME. How
 much time does it currently take?

I just did 1000 rows on my macbook. Assuming I'm reading the knime log
correctly, that took about a minute.

-greg



[Rdkit-discuss] Beta of RDKit knime nodes available

2010-11-13 Thread Greg Landrum
Dear all,

I announced this at Goslar but just realized I hadn't posted to the
mailing list:
We've recently been doing some work with the guys at knime.com to
develop some RDKit-based nodes that add basic cheminformatics
functionality to knime. A beta version of these nodes is available in
a zipped update site here:
http://labs.knime.org/update/org.rdkit.0.9.0.zip

You can install these directly into knime using its Update Manager.
Note that you do *not* need an RDKit install to use the knime nodes.
They should work out of the box on 32-bit Windows systems, 32- and 64-bit
Linux systems, and 64-bit Mac systems (though for the Mac you will need
to use a beta version of knime).

Current functionality includes:
- Conversion to/from RDKit molecules
- generation of canonical smiles
- fingerprinting
- substructure filtering
- chemical reactions

The plan is to polish these nodes over the next couple of weeks, maybe
add one or two more pieces of key functionality, and have everything
ready for a release in early December.

Please give the nodes a try and let me know what you think or if you
have suggestions for improvements.

Many thanks to Thorsten and Bernd at knime.com who made this all possible.

Best Regards,
-greg
