Dear Peter and Paolo,

Wahoo! Many thanks to both of you for having researched this issue so thoroughly. No worries, I can live with a Kekulé form of my smiles1! I only noticed this strange behaviour when I fragmented a dihydroimidazopyridine derivative of smiles1 and saw that some of its fragments had lost aromaticity and were no longer a substructure match for their parent...
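The substructure mismatch described above can be reproduced with the two SMILES from the original question. This is only a minimal sketch: the derivative itself is not given in the thread, so smiles2 is used here as a stand-in for a ring-opened fragment of smiles1 (an assumption), and `mol_without_aromaticity` is a hypothetical helper name. One possible workaround is to skip aromaticity perception on both molecules, so that both stay in the Kekulé form they were written in.

```python
from rdkit import Chem

PARENT = "N12C=CC=CC1=NCC2"  # smiles1 from the original question
FRAGMENT = "CN=C1C=CC=CN1C"  # smiles2, used as a stand-in fragment (assumption)


def mol_without_aromaticity(smiles):
    """Parse a SMILES and sanitize it, skipping aromaticity perception.

    Hypothetical helper for illustration; not part of RDKit.
    """
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    Chem.SanitizeMol(
        mol,
        sanitizeOps=Chem.SanitizeFlags.SANITIZE_ALL
        ^ Chem.SanitizeFlags.SANITIZE_SETAROMATICITY,
    )
    return mol


# Default parsing: the fragment ring is perceived as aromatic, the parent ring
# is not, so the fragment's aromatic bonds cannot match single/double bonds.
default_match = Chem.MolFromSmiles(PARENT).HasSubstructMatch(
    Chem.MolFromSmiles(FRAGMENT)
)

# Both molecules kept in Kekule form: bond orders are compared directly.
kekule_match = mol_without_aromaticity(PARENT).HasSubstructMatch(
    mol_without_aromaticity(FRAGMENT)
)

print("default aromaticity perception:", default_match)
print("aromaticity perception skipped:", kekule_match)
```

Matching in Kekulé form only works when parent and fragment are written with compatible double-bond placements. That holds here because only one Kekulé structure can be drawn for this ring, but it is not a general solution.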
Best,
Alexis

On Sat, 28 Nov 2020 at 03:49, Peter S. Shenkin <shen...@gmail.com> wrote:

> Yes, I've seen the same phenomenon in multiple SMILES generators.
>
> Even Daylight's (when they had it up on a public web site).
>
> From a chemical perspective, it isn't sensible that the pyridone-like ring in molecule 1 should not be seen as aromatic in the canonical SMILES, especially since the same ring is seen as aromatic in molecule 2. The counter-argument has often been that "only exocyclic substituents are considered". But of course that =N is indeed exocyclic to the ring in question.
>
> In a famous quote, Dave Weininger said:
>
> It is important to remember that the purpose of the SMILES aromaticity detection algorithm is for the purposes of chemical information representation only! To this end, rigorous rules are provided for determining the "aromaticity" of charged, heterocyclic, and electron deficient ring systems. The "aromaticity" designation as used here is not intended to imply anything about the reactivity, magnetic resonance spectra, heat of formation, or odor of substances.
>
> As an example of the utility of this definition, consider o-xylene. You don't want to see the VB structure with a double bond connecting the methyl-attached carbons as different from the form with a single bond in that position. Hence, aromaticity enables SMILES to avoid that issue, since the (canonical) SMILES does not contain any double bonds, but only aromatic bonds within the ring.
>
> And the fact is that there is no ambiguity in any of the structures I've seen (including the one shown as SMILES1) that exhibit the problem. There's only one way to draw the resonance structure, anyway, so you could argue that you don't need to make it aromatic at all.
>
> Of course, if you had the courage of that particular conviction, you wouldn't bother making pyrrole aromatic, either, because there's only one resonance structure you can draw.
> But SMILES does define pyrrole as aromatic.
>
> When I've discussed this with developers who have worked on SMILES systems, they say that looking for cases like exocyclic aromaticity-producing substituents in adjacent non-aromatic rings would slow the SMILES generator down.
>
> But the problem is that when you are using a SMARTS to look for one of these pyridone-like rings that you see in the first structure, you're not going to find it, even though it's there. Chemists do expect an aromatic SMARTS to find an aromatic ring, which is no doubt the secret reason for making pyrrole aromatic.
>
> I've never liked this situation, but it boils down to the fact that Daylight, which produced the original reference SMILES implementation, "done it that-a-way". It has the advantage of *stare decisis*.
>
> -P.
>
> P.S. By the way, if any of you have ever seen a SMILES generator that displays the 6-membered ring as aromatic in the first example, could you please tell us which one that is?
>
> On Fri, Nov 27, 2020 at 1:55 PM Paolo Tosco <paolo.tosco.m...@gmail.com> wrote:
>
>> (Now with link - you can tell it's Friday night)
>>
>> Hi Mark, Alexis,
>>
>> Yes, I was too fast in composing my previous reply and I did not pay enough attention to the molecules.
>> After reading Alexis' reply, I looked more carefully at his original question and at that point I remembered having seen a similar behaviour before from RDKit on condensed ring systems featuring exocyclic bonds, along with related mailing list discussions.
>> So I did a bit of searching and I fished out the (long) thread that deals with exactly this behaviour.
>>
>> https://sourceforge.net/p/rdkit/mailman/rdkit-discuss/thread/CAAsqebGxOwJtH32T5jC%3DoBZN6G1JE_NwsEqKUO8%2BmUCqmABCzQ%40mail.gmail.com/#msg36448625
>>
>> I hope that helps, cheers
>> p.
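Peter's point about aromatic SMARTS missing the pyridone-like ring can be checked directly with the two SMILES from the original question. The sketch below compares an aromatic ring pattern with an aromaticity-agnostic one. Both SMARTS strings are illustrative choices, not patterns taken from the thread, and the result relies on the aromaticity assignments reported in Paolo's session (smiles1 Kekulé, smiles2 aromatic).

```python
from rdkit import Chem

mol1 = Chem.MolFromSmiles("N12C=CC=CC1=NCC2")  # smiles1: fused, 5-ring partly saturated
mol2 = Chem.MolFromSmiles("CN=C1C=CC=CN1C")    # smiles2: same 6-ring, 5-ring opened

# Aromatic SMARTS: lowercase symbols only match atoms flagged as aromatic.
aromatic_ring = Chem.MolFromSmarts("c1ccccn1")

# Aromaticity-agnostic SMARTS (illustrative): element only, any bond type (~).
any_ring = Chem.MolFromSmarts("[#6]1~[#6]~[#6]~[#6]~[#6]~[#7]~1")

results = {}
for name, mol in (("smiles1", mol1), ("smiles2", mol2)):
    results[name] = (
        mol.HasSubstructMatch(aromatic_ring),
        mol.HasSubstructMatch(any_ring),
    )
    print(
        name,
        "| aromatic SMARTS:", results[name][0],
        "| any-bond SMARTS:", results[name][1],
    )
```

The any-bond pattern finds the six-membered ring in both molecules, at the price of also matching fully saturated rings with the same atoms, so it trades specificity for robustness against the aromaticity model.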
>>
>> On Fri, Nov 27, 2020 at 7:31 PM Mark Mackey <m...@cresset-group.com> wrote:
>>
>>> Hi Paolo,
>>>
>>> Hmmm, I think this is displaying a bug (or at the very least unexpected behaviour) in the aromaticity code. The issue isn’t the aromaticity of the imidazole/dihydroimidazole, but the aromaticity of the pyridyl. Alexis’ second molecule is identical to the first except that one bond in the 5-membered ring was broken, and that (to my eyes at least) should not affect whether the 6-membered ring is seen as aromatic.
>>>
>>> Regards,
>>>
>>> Mark.
>>>
>>> *From:* Paolo Tosco <paolo.tosco.m...@gmail.com>
>>> *Sent:* 27 November 2020 17:04
>>> *To:* Alexis Parenty <alexis.parenty.h...@gmail.com>
>>> *Cc:* RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
>>> *Subject:* Re: [Rdkit-discuss] canonicalization of two aromatic molecules returning two different forms (kekule and aromatic)
>>>
>>> Hi Alexis,
>>>
>>> The second molecule (smiles2) is indeed aromatic, but the first (smiles1) is not, as the imidazole ring condensed to the pyridine is partially saturated.
>>>
>>> The smiles1a analogue where I have added a double bond is aromatic, and upon canonicalization it yields an aromatic SMILES as expected.
>>>
>>> Cheers,
>>>
>>> p.
>>>
>>> In [1]:
>>> from rdkit import Chem
>>>
>>> In [2]:
>>> mol1 = Chem.MolFromSmiles("N12C=CC=CC1=NCC2")
>>>
>>> In [3]:
>>> mol1
>>>
>>> Out[3]:
>>> [image: depiction of mol1]
>>>
>>> In [4]:
>>> smiles1 = Chem.MolToSmiles(mol1)
>>>
>>> In [5]:
>>> smiles1
>>>
>>> Out[5]:
>>> 'C1=CC2=NCCN2C=C1'
>>>
>>> In [6]:
>>> mol2 = Chem.MolFromSmiles("CN=C1C=CC=CN1C")
>>>
>>> In [7]:
>>> mol2
>>>
>>> Out[7]:
>>> [image: depiction of mol2]
>>>
>>> In [8]:
>>> smiles2 = Chem.MolToSmiles(mol2)
>>>
>>> In [9]:
>>> smiles2
>>>
>>> Out[9]:
>>> 'CN=c1ccccn1C'
>>>
>>> In [10]:
>>> mol1a = Chem.MolFromSmiles("N12C=CC=CC1=NC=C2")
>>>
>>> In [11]:
>>> mol1a
>>>
>>> Out[11]:
>>> [image: depiction of mol1a]
>>>
>>> In [12]:
>>> smiles1a = Chem.MolToSmiles(mol1a)
>>>
>>> In [13]:
>>> smiles1a
>>>
>>> Out[13]:
>>> 'c1ccn2ccnc2c1'
>>>
>>> On Fri, Nov 27, 2020 at 5:09 PM Alexis Parenty <alexis.parenty.h...@gmail.com> wrote:
>>>
>>> Hi everyone,
>>>
>>> Why is it that when I canonicalize the following smiles1 I get its unexpected Kekulé form, whereas when I canonicalize a similar smiles2, I get its expected aromatic form?
>>>
>>> from rdkit import Chem
>>>
>>> smiles1 = Chem.CanonSmiles("N12C=CC=CC1=NCC2")
>>> smiles1
>>>
>>> ==> 'C1=CC2=NCCN2C=C1'
>>>
>>> smiles2 = Chem.CanonSmiles("CN=C1C=CC=CN1C")
>>> smiles2
>>>
>>> ==> 'CN=c1ccccn1C'
>>>
>>> I would like to get the aromatic form in both cases... Is there a way to force the aromatic form?
>>>
>>> Best,
>>>
>>> Alexis
>>>
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
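On the original question of whether the aromatic form can be forced: the thread does not give a way to do that, but RDKit does ship several built-in aromaticity models, and it is easy to see what each one makes of smiles1. The sketch below is only a way to compare them; it makes no claim that any of the models marks the six-membered ring as aromatic. The helper name `smiles_under_model` is hypothetical.

```python
from rdkit import Chem

SMILES1 = "N12C=CC=CC1=NCC2"  # smiles1 from the original question

# Built-in aromaticity models to compare.
MODELS = {
    "RDKit": Chem.AromaticityModel.AROMATICITY_RDKIT,
    "simple": Chem.AromaticityModel.AROMATICITY_SIMPLE,
    "MDL": Chem.AromaticityModel.AROMATICITY_MDL,
}


def smiles_under_model(smiles, model):
    """Return the canonical SMILES after perceiving aromaticity with `model`.

    Hypothetical helper for illustration; not part of RDKit.
    """
    mol = Chem.MolFromSmiles(smiles)
    # Clear the flags set by the default perception before re-running it.
    Chem.Kekulize(mol, clearAromaticFlags=True)
    Chem.SetAromaticity(mol, model)
    return Chem.MolToSmiles(mol)


smiles_by_model = {
    name: smiles_under_model(SMILES1, model) for name, model in MODELS.items()
}
for name, smi in smiles_by_model.items():
    print(f"{name:>6}: {smi}")
```

If none of the built-in models gives the desired form, the practical options are the ones already discussed in this thread: live with the Kekulé form, or make downstream matching independent of aromaticity.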