I agree that there are already plenty of implementations of clustering,
machine learning, etc., and that it would be better for the RDKit developers
to focus on cheminformatics. That said, there are some opportunities for
domain-specific performance enhancement. One of the slow steps in many
clustering algorithms is calculating the distance matrix and identifying
neighbors. If you're clustering fingerprints, I'd recommend looking at Andrew
Dalke's ChemFP <http://chemfp.com/>. Andrew has applied a multitude of
tricks that can make clustering blazingly fast. The ChemFP examples
include an implementation of Taylor-Butina clustering. Even better, ChemFP
works "out of the box" with the RDKit.
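
To make the slow step concrete, here is a minimal pure-Python sketch of
Taylor-Butina clustering. It is not ChemFP's implementation and does not use
the ChemFP or RDKit APIs. The names `tanimoto` and `butina_clusters`, the
use of plain Python integers as bit-vector fingerprints, and the 0.7
threshold are all assumptions made for illustration. The nested loop that
builds the neighbor lists is the O(n^2) part that a specialised library can
speed up.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def butina_clusters(fingerprints, threshold=0.7):
    """Return clusters as lists of indices, centroid first.

    Illustrative sketch only: all-pairs neighbor search, then greedy
    assignment in order of decreasing neighbor count.
    """
    n = len(fingerprints)

    # Step 1 (the slow part): find every pair at or above the threshold.
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto(fingerprints[i], fingerprints[j]) >= threshold:
                neighbors[i].append(j)
                neighbors[j].append(i)

    # Step 2: visit candidates from most to fewest neighbors
    # (ties broken by index so the result is deterministic).
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))

    # Step 3: each unassigned candidate becomes a centroid and claims
    # its still-unassigned neighbors; leftovers end up as singletons.
    assigned = [False] * n
    clusters = []
    for centroid in order:
        if assigned[centroid]:
            continue
        members = [centroid] + [j for j in neighbors[centroid] if not assigned[j]]
        for j in members:
            assigned[j] = True
        clusters.append(members)
    return clusters


# Toy 8-bit fingerprints, made up for this example.
fps = [0b11110000, 0b11110001, 0b11100000, 0b00001111, 0b00001110, 0b10101010]
print(butina_clusters(fps, threshold=0.7))
```

With real fingerprints you would swap the integers for whatever
bit-vector type your toolkit provides and let an optimised search do
step 1.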

Pat



On Mon, Feb 23, 2015 at 7:02 AM, Maciek Wójcikowski <mac...@wojcikowski.pl>
wrote:

> Hello,
>
> If you're interested in clustering in Python, I can recommend, as usual, sklearn:
> http://scikit-learn.org/stable/modules/clustering.html
> It's pretty much all you should need. Have fun!
>
> ----
> Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
>
> 2015-02-23 11:43 GMT+01:00 Anthony Bradley <anthony.brad...@worc.ox.ac.uk>:
>
>>   Hi Anthony,
>>
>>
>>
>> On Sun, Feb 22, 2015 at 11:03 AM, Anthony Bradley <
>> anthony.brad...@worc.ox.ac.uk> wrote:
>>
>> Hi all,
>>
>> I am currently working with RDKit from the Java API (well, Jython
>> actually).
>>
>> As has been discussed, most of the documentation for this is found by
>> trawling:
>>
>> Code/JavaWrappers/gmwrapper/src-test/org/RDKit/
>> and
>> Code/JavaWrappers/gmwrapper/src/org/RDKit/
>>
>> However I'm trying to perform a simple clustering. I can build my
>> distance matrix - but I can't see where the actual clustering algorithms
>> live.
>>
>> It may well be my grepping skills are not what they should be!
>>
>>
>>
>> No need to have any concerns about your skills with grep: the clustering
>> functionality is not exposed via the SWIG wrappers. As currently configured,
>> the code isn't available as a library; it's really only usable from
>> Python. It's a medium-sized amount of work to convert this to a library, so
>> it's doable, but I'm not sure it's worth it.
>>
>>
>>
>> That seems fair enough, and there are definitely other options out there.
>> It was more of a method-consistency thing, so I could be using the same code
>> from the Python / Jython side.
>>
>>
>>
>> I've been assuming that there are high(er) quality replacements available
>> for most of the RDKit "machine learning" functionality. Since it's somewhat
>> removed from the "cheminformatics" focus, I haven't really put any time
>> into that code in the past few years. Does this sound wrong to anyone? Any
>> arguments that the clustering code is worth investing some time in?
>>
>>
>>
>> Unless anybody else is interested – I can see why it would be low
>> priority!
>>
>>
>>
>> -greg
>>
>>
>>
>> Thanks a lot for responding so quickly and effectively!
>>
>>
>>
>> Best,
>>
>>
>>
>> Anthony
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
>> from Actuate! Instantly Supercharge Your Business Reports and Dashboards
>> with Interactivity, Sharing, Native Excel Exports, App Integration & more
>> Get technology previously reserved for billion-dollar corporations, FREE
>>
>> http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>
>