Hi Andrew,

First off, for the SMARTS matcher you can turn off the "prepare" step, or use the lower-level APIs, so that matching works on the aromaticity as given in the input.

IChemObjectBuilder bldr = SilentChemObjectBuilder.getInstance();
SmartsPattern pat = SmartsPattern.create("C=CC=N");
pat.setPrepare(false); // turn off auto ring+arom perception
IAtomContainer mol = new SmilesParser(bldr).parseSmiles("OCCO[P+]1(OCCO)n2c3ccc2/C(c2ccccc2)=C2/C=CC(=N2)/C(c2ccccc2)=c2/cc/c(n21)=C(\\c1ccccc1)C1=NC(=C3c2ccccc2)C=C1 CHEMBL2369103");
Cycles.markRingAtomsAndBonds(mol); // we need to do this manually because
System.err.println(pat.matchAll(mol).count());

I'm not sure how you got that output:

Aromaticity.apply(Aromaticity.Model.Daylight, mol);
System.err.println(new SmilesGenerator(SmiFlavor.Default + SmiFlavor.UseAromaticSymbols).create(mol));

Gives me:

OCCO[P+]1(OCCO)n2c3ccc2c(-c4ccccc4)c5C=Cc(n5)c(-c6ccccc6)c7ccc(n71)c(-c8ccccc8)c9nc(c3-c%10ccccc%10)C=C9

On Tue, 24 Jun 2025 at 13:09, Andrew Dalke <da...@dalkescientific.com> wrote:

> Hi all,
>
> Given a molecule, how do I generate a SMILES which reflects the internal
> aromaticity used?
>
> I'm cross-comparing some work using RDKit with CDK. The differences appear
> to be due to differences in aromaticity perception, as expected.
>
> I'm trying to figure out how to verify these differences. Consider the
> following input SMILES:
>
> OCCO[P+]1(OCCO)n2c3ccc2/C(c2ccccc2)=C2/C=CC(=N2)/C(c2ccccc2)=c2/cc/c(n21)=C(\c1ccccc1)C1=NC(=C3c2ccccc2)C=C1
> CHEMBL2369103
>
> and SMARTS:
>
> C=CC=N
>
> While the SMARTS seems like it would match the "C=CC(=N2)" in the SMILES,
> toolkits of course can perceive their own aromaticity.
> Testing with CDK Depict shows CDK perceives all four nitrogens as aromatic.
>
> A SMARTS which does match is C=C-c:n and using "a" for the SMARTS verifies
> that all nitrogens are aromatic.
>
> I wanted to verify this by visual inspection of the SMILES. When I
> generate the SMILES with the default flavor I get, as I should have
> expected, a Kekule form:
>
> C1=CC=C(C=C1)/C/2=C/3\\C=CC(=N3)C(=C4C=CC5=C(C6=CC=CC=C6)C7=NC(=C(C8=CC=CC=C8)C9=CC=C2N9[P+](N45)(OCCO)OCCO)C=C7)C%10=CC=CC=C%10
>
> When I remembered to add UseAromaticSymbols to the flavor I get:
>
> c1ccc(cc1)/C/2=C/3\C=CC(=N3)C(=c4ccc5=C(c6ccccc6)C7=NC(=C(c8ccccc8)c9ccc2n9[P+](n45)(OCCO)OCCO)C=C7)c%10ccccc%10
>
> This shows two aromatic nitrogens and two aliphatic nitrogens, where I
> expected four "n" terms.
>
> This SMILES contains "C=CC(=N3)" which I would expect to match the SMARTS
> "C=CC=N", so I can't use this approach for manual verification.
>
> I didn't see any other relevant flavors to add. Is there something else I
> should do?
>
> Cheers,
>
> Andrew
> da...@dalkescientific.com
>
> _______________________________________________
> Cdk-user mailing list
> Cdk-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/cdk-user
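
For comparing the two aromatic SMILES outputs in this thread without reading them by eye, one option is simply to count the unbracketed nitrogen symbols. The following is only a rough sketch in plain Java with no CDK calls; the class AromaticNitrogenCount and its count method are hypothetical helpers made up for illustration, not part of any toolkit. It assumes every nitrogen is written outside brackets, which holds for these two strings (the only bracket atom is [P+]); a bracketed [nH] or [N+] would not be counted.

```java
public class AromaticNitrogenCount {
    // SMILES from the reply (Daylight model applied, aromatic symbols on).
    static final String REPLY_SMILES =
        "OCCO[P+]1(OCCO)n2c3ccc2c(-c4ccccc4)c5C=Cc(n5)c(-c6ccccc6)c7ccc(n71)"
        + "c(-c8ccccc8)c9nc(c3-c%10ccccc%10)C=C9";

    // SMILES from the original question (UseAromaticSymbols flavor only).
    // The backslash bond symbol is escaped for Java.
    static final String QUESTION_SMILES =
        "c1ccc(cc1)/C/2=C/3\\C=CC(=N3)C(=c4ccc5=C(c6ccccc6)C7=NC(=C(c8ccccc8)"
        + "c9ccc2n9[P+](n45)(OCCO)OCCO)C=C7)c%10ccccc%10";

    // Counts occurrences of `symbol` ('n' or 'N') outside bracket atoms.
    // Outside brackets neither letter can be part of another element symbol,
    // so each hit is one organic-subset nitrogen. Bracket atoms are skipped.
    static int count(String smiles, char symbol) {
        int hits = 0;
        boolean inBracket = false;
        for (char c : smiles.toCharArray()) {
            if (c == '[') {
                inBracket = true;
            } else if (c == ']') {
                inBracket = false;
            } else if (!inBracket && c == symbol) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("reply:    n=" + count(REPLY_SMILES, 'n')
            + " N=" + count(REPLY_SMILES, 'N'));
        System.out.println("question: n=" + count(QUESTION_SMILES, 'n')
            + " N=" + count(QUESTION_SMILES, 'N'));
    }
}
```

On these two strings the count comes out as four "n" and no "N" for the reply's SMILES, and two "n" plus two "N" for the SMILES in the question, which matches what each message describes.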