Hi John,

thanks a lot for your detailed answer; it helped me understand the CDK a bit better. For now I'll try my luck with option 2).
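For reference, the rewrite John suggests in the quoted message below, c(c[cH])c[cH] to [#6]([#6][#6])[#6][#6], can be sketched at the string level in plain Java. This is only a rough sketch and not CDK API: the class GenericSmarts and its method toGenericSmarts are hypothetical names, the element table covers just ten elements, and everything inside a bracket atom except the element (hydrogen count, charge, isotope, aromaticity) is deliberately thrown away.

```java
import java.util.Map;

// Hypothetical helper, not part of CDK: rewrites each atom of a simple
// SMILES string as an atomic-number SMARTS primitive such as [#6].
final class GenericSmarts {

    // Assumption: only these ten elements are needed; extend as required.
    private static final Map<String, Integer> ATOMIC_NUMBER = Map.of(
            "B", 5, "C", 6, "N", 7, "O", 8, "F", 9,
            "P", 15, "S", 16, "Cl", 17, "Br", 35, "I", 53);

    static String toGenericSmarts(String smiles) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < smiles.length()) {
            char ch = smiles.charAt(i);
            if (ch == '[') {
                // Bracket atom: keep the element only, drop H count, charge, isotope.
                int close = smiles.indexOf(']', i);
                if (close < 0)
                    throw new IllegalArgumentException("unclosed bracket atom: " + smiles);
                out.append(atomToken(elementOf(smiles.substring(i + 1, close))));
                i = close + 1;
            } else if (Character.isLetter(ch)) {
                // Unbracketed atom: Cl and Br are the only two-letter symbols here.
                boolean twoLetter = smiles.startsWith("Cl", i) || smiles.startsWith("Br", i);
                String symbol = smiles.substring(i, i + (twoLetter ? 2 : 1));
                out.append(atomToken(symbol));
                i += symbol.length();
            } else {
                // Bonds, branches and ring-closure digits pass through unchanged.
                out.append(ch);
                i++;
            }
        }
        return out.toString();
    }

    // Extracts the element symbol from the content of a bracket atom, e.g. "cH" -> "c".
    private static String elementOf(String bracket) {
        int p = 0;
        while (p < bracket.length() && Character.isDigit(bracket.charAt(p)))
            p++; // skip a leading isotope number
        if (p == bracket.length() || !Character.isLetter(bracket.charAt(p)))
            throw new IllegalArgumentException("no element in [" + bracket + "]");
        boolean twoLetter = Character.isUpperCase(bracket.charAt(p))
                && p + 1 < bracket.length()
                && Character.isLowerCase(bracket.charAt(p + 1));
        return bracket.substring(p, p + (twoLetter ? 2 : 1));
    }

    // Maps an element symbol (aromatic lower case allowed) to "[#Z]".
    private static String atomToken(String symbol) {
        String key = Character.toUpperCase(symbol.charAt(0)) + symbol.substring(1);
        Integer z = ATOMIC_NUMBER.get(key);
        if (z == null)
            throw new IllegalArgumentException("unsupported element: " + symbol);
        return "[#" + z + "]";
    }

    public static void main(String[] args) {
        // Prints: [#6]([#6][#6])[#6][#6]
        System.out.println(toGenericSmarts("c(c[cH])c[cH]"));
    }
}
```

Because the aromatic/aliphatic distinction is dropped, the resulting pattern matches more broadly than the original fragment; whether that is acceptable depends on how the SMARTS is used afterwards.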

Best regards,
Martin

-- 
Dipl-Inf. Martin Gütlein
Phone:
+49 (0)6131 39 23336 (office)
+49 (0)177 623 9499 (mobile)
Email:
[email protected]

On 02.12.2014 20:47, John May wrote:
> Just to clarify, you can write SMILES in CDK; you’re writing SMILES
> and then interpreting this as SMARTS. CDK doesn’t have the ability
> to write SMARTS. As well as hydrogens you may also have trouble with
> aromaticity, charges, and isotopes.
>
> >>> c(c[cH])c[cH]
>
> Is probably better as [#6]([#6][#6])[#6][#6].
>
> The reason you’re having trouble in CDK 1.5 is that SMILES IO now
> correctly handles the valence.
>
> Anyway, there are a couple of solutions:
>
> 1) reset the hydrogen counts to default (i.e. atom typing). This will
> work for your examples but also means you would lose aromaticity
> flags (i.e. the example above isn’t a ring), and it wouldn’t fix
> nitrogens, which also have H displayed when aromatic. I would not
> recommend this.
> 2) set all hydrogen counts to 0 (not null!) before generating the
> SMILES; you may also want to do charge and mass. Simply loop over the
> MCS and set the implicit H count to 0. removeHydrogens has no effect
> because they’re not explicit -
> http://nextmovesoftware.com/blog/2013/02/27/explicit-and-implicit-hydrogens-taking-liberties-with-valence/
> [3].
> 3) after parsing the SMILES as SMARTS, traverse the expression tree of
> each atom and replace the And(<OtherSmartsAtom>, HydrogenCount) with
> <OtherSmartsAtom>.
> 4) load the SMILES as a SMILES and do a normal subgraph match as
> opposed to SMARTS.
>
> Also
> - make sure you use the new SMSD (not part of CDK); the CDK packages
> are quite old
> - avoid using the DefaultChemObjectBuilder and use
> SilentChemObjectBuilder (the naming is the wrong way round but
> actually Silent is better as it doesn’t fire off events).
> - you’re generating canonical SMILES when this isn’t needed; use
> SmilesGenerator.generic().aromatic() when creating the
> SmilesGenerator.
>
> J
>
> On Dec 2, 2014, at 11:04 AM, Martin Gütlein <[email protected]>
> wrote:
>
>> Hi,
>>
>> any help with this issue would be very much appreciated,
>>
>> Kind regards,
>> Martin
>>
>> -------- Original message --------
>> Subject: Re: how to print SMARTS pattern without hydrogens
>> Date: 02.12.2014 12:00
>> On 30 September 2014 at 09:30, Martin Guetlein
>> <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I am currently migrating from cdk1.4 to 1.5. I am mining the maximum
>>> common subgraph of two compounds and then print the resulting
>>> fragment as SMARTS. This is working in 1.4, however in 1.5 the
>>> SmilesGenerator is adding unwanted Hydrogens. How can I get rid of
>>> the Hydrogens?
>>> See example below.
>>> See also
>>> https://www.mail-archive.com/[email protected]/msg02597.html
>>> [1]
>>>
>>> Thanks and kind regards,
>>> Martin
>>>
>>> The following code prints "mcs: c(c[cH])c[cH]" instead of "mcs:
>>> ccccc"
>>> [[
>>> SmilesParser sp = new SmilesParser(DefaultChemObjectBuilder.getInstance());
>>> IAtomContainer mol1 = sp.parseSmiles("c1ccccc1NC");
>>> IAtomContainer mol2 = sp.parseSmiles("c1cccnc1");
>>> org.openscience.cdk.smsd.Isomorphism mcsFinder = new org.openscience.cdk.smsd.Isomorphism(
>>>     org.openscience.cdk.smsd.interfaces.Algorithm.DEFAULT, true);
>>> mcsFinder.init(mol1, mol2, true, true);
>>> mcsFinder.setChemFilters(true, true, true);
>>>
>>> mol1 = mcsFinder.getReactantMolecule();
>>> IAtomContainer mcsmolecule =
>>>     DefaultChemObjectBuilder.getInstance().newInstance(IAtomContainer.class, mol1);
>>> List<IAtom> atomsToBeRemoved = new ArrayList<IAtom>();
>>> for (IAtom atom : mcsmolecule.atoms())
>>> {
>>>     int index = mcsmolecule.getAtomNumber(atom);
>>>     if (!mcsFinder.getFirstMapping().containsKey(index))
>>>         atomsToBeRemoved.add(atom);
>>> }
>>> for (IAtom atom : atomsToBeRemoved)
>>>     mcsmolecule.removeAtomAndConnectedElectronContainers(atom);
>>>
>>> // has no effect
>>> // mcsmolecule = AtomContainerManipulator.removeHydrogens(mcsmolecule);
>>>
>>> SmilesGenerator g = new SmilesGenerator().aromatic();
>>> System.out.println("mcs: " + g.create(mcsmolecule));
>>> ]]
>>>
>>> --
>>> Dipl-Inf. Martin Gütlein
>>> Phone:
>>> +49 (0)761 203 8442 (office)
>>> +49 (0)177 623 9499 (mobile)
>>> Email:
>>> [email protected]
>>
>> _______________________________________________
>> Cdk-user mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/cdk-user
>
> Links:
> ------
> [1] https://www.mail-archive.com/[email protected]/msg02597.html
> [3] http://nextmovesoftware.com/blog/2013/02/27/explicit-and-implicit-hydrogens-taking-liberties-with-valence/

