Re: [Cdk-user] How identify douplicate molecules???

Christoph Steinbeck Thu, 02 Jul 2009 07:42:49 -0700

On 2 Jul 2009, at 13:29, suyog wrote:

Hi all,
       I have one .sdf file with thousands of molecules in it.
I want to identify duplicate molecules (having similar structure) in the .sdf
file using CDK.


Do you mean similar or identical (isomorphic graphs)?
The two cases would require different code.

For identical molecules, which I think is what you mean, you could use
various strategies. I believe that the easiest is to iterate over the
molecules with the iterative SDF parser, read each molecule, make a
canonical SMILES, and put that in a set. For any subsequent molecule you
can then check whether its canonical SMILES is already in the set and, if
so, discard that molecule.
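
A minimal sketch of that set-based strategy, using only the Java standard library so it runs without CDK on the classpath. The class name DuplicateFilter, the method uniqueBy and the keyOf parameter are hypothetical names made up for this illustration; they are not part of CDK. In real use the items would be the molecules read from the SDF file and keyOf would return the canonical SMILES that CDK generates for each molecule (the exact reader and SMILES generator classes depend on the CDK version). Here plain strings stand in for the canonical SMILES.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class DuplicateFilter {

    /**
     * Keeps the first item seen for each distinct key and drops every later
     * item whose key is already in the set.
     *
     * ASSUMPTION: keyOf returns a canonical string for the item, e.g. a
     * canonical SMILES produced by CDK. That step is not shown here.
     */
    public static <T> List<T> uniqueBy(Iterable<T> items, Function<T, String> keyOf) {
        Set<String> seen = new HashSet<>();
        List<T> unique = new ArrayList<>();
        for (T item : items) {
            String key = keyOf.apply(item);
            // Set.add returns false when the key was already present,
            // which means this item duplicates an earlier one.
            if (seen.add(key)) {
                unique.add(item);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // Stand-in data: each string plays the role of a canonical SMILES.
        List<String> smiles = Arrays.asList("CCO", "c1ccccc1", "CCO", "CC(=O)O");
        List<String> unique = uniqueBy(smiles, s -> s);
        System.out.println(unique.size() + " unique of " + smiles.size());
    }
}
```

The approach only works if the key really is canonical: two different SMILES strings written for the same structure must first be converted to the same canonical form, otherwise they would be kept as distinct entries.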

How do I do this?
Which classes should I use?
Or if you have any sample code, please paste a link.



The CDK tests are a good start for example code.

http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/trunk/src/test/org/openscience/cdk/
has the tests for the various classes you'll need, with example code.


Cheers,

Chris

--
Dr. Christoph Steinbeck
Head of Chemoinformatics and Metabolism
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD UK
Phone +44 1223 49 2640

What is man but that lofty spirit - that sense of enterprise.
... Kirk, "I, Mudd," stardate 4513.3..
