On 2 Jul 2009, at 13:29, suyog wrote:
Hi all,
I have one .sdf file with thousands of molecules in it.
I want to identify duplicate molecules (having similar structure) in the .sdf
file using CDK.

Do you mean similar or identical (isomorphic graphs)?
The two would require different code.

For identical molecules, which I think is what you mean, you could use various strategies. I believe the easiest is to iterate over the molecules with the iterating SDF reader and, for each molecule read, generate a canonical SMILES and put it in a set. For any subsequent molecule you can then check whether its canonical SMILES is already in the set and, if so, discard that molecule.
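The set-based check can be sketched roughly as follows. This is only a minimal illustration using plain Java collections, not CDK code: the class name `DuplicateFilter`, the method `keepFirstOccurrences`, and the `canonicalKey` parameter are hypothetical names made up for this sketch. In a real program the records would come from CDK's iterating SDF reader and the key function would wrap CDK's `SmilesGenerator`; the exact class and method names for both differ between CDK releases, so they are left out here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not part of CDK: keeps the first record seen for each key.
final class DuplicateFilter {

    private DuplicateFilter() {
    }

    /**
     * Walks the records once and returns those whose key has not been seen before.
     *
     * @param records      the records to scan (assumed: molecules read one by one from the SDF file)
     * @param canonicalKey maps a record to a string that is equal for identical records
     *                     (assumed: a canonical SMILES generator)
     */
    static <T> List<T> keepFirstOccurrences(Iterator<T> records,
                                            Function<T, String> canonicalKey) {
        Set<String> seen = new HashSet<>();
        List<T> unique = new ArrayList<>();
        while (records.hasNext()) {
            T record = records.next();
            String key = canonicalKey.apply(record);
            // Set.add returns false when the key is already in the set,
            // so duplicates are skipped without a separate contains() call.
            if (seen.add(key)) {
                unique.add(record);
            }
        }
        return unique;
    }
}
```

Note that the key has to be canonical for this to work: two different but valid SMILES for the same molecule, such as "CCO" and "OCC", would be treated as distinct if the strings were compared as written. For very large files you would probably write each unique molecule out as you go instead of collecting them in a list; the set of keys is the only thing that has to stay in memory.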

How do I do this?
Which classes should I use?
If you have any sample code, please paste a link.


The CDK tests are a good start for example code.
http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/trunk/src/test/org/openscience/cdk/ has the tests for the various classes you'll need, with example code.

Cheers,

Chris

--
Dr. Christoph Steinbeck
Head of Chemoinformatics and Metabolism
European Bioinformatics Institute (EBI)
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD UK
Phone +44 1223 49 2640

What is man but that lofty spirit - that sense of enterprise.
... Kirk, "I, Mudd," stardate 4513.3..



