Hi Greg, Thanks for the detailed explanation. You are right that this is not a real molecule; it came from applying a user-supplied reaction SMARTS. (The reaction SMARTS was not the best-written perhaps, but that's tangential...). I normally sanitize the products and skip those that fail the sanitization, but in this case I was surprised when the sanitized molecule caused issues later while trying to compute descriptors.
I look forward to a fix, but in the meantime maybe I'll consider running SanitzeMol twice. :-) Best, Ivan On Wed, Oct 31, 2018 at 2:41 AM Greg Landrum <greg.land...@gmail.com> wrote: > Hi Ivan, > > Short answer: I would not normally expect a second sanitization to fail if > the first succeeds, but your input SMILES is very odd and triggers a bug. > > This is an interesting edge case for the sanitization code because it > includes a weird mix of aromatic and aliphatic atoms and bonds, I do hope > this came out of some computational process and isn't a "real" molecule. > You almost couldn't have picked a better example to highlight the situation > that's causing the problem here. Some form of congratulations are in order. > :-) > > Here's an explanation of what's going on with your molecule C1=n(C)-c=Cn1 > The fundamental problem is that atom 1 (the first nitrogen) has a valence > of 4 and is neutral... > If you wrote the SMILES as C1=N(C)C=CN1, which is what the sanitization > process produces, I don't think you'd be surprised that the RDKit > sanitization fails (and your second call to sanitize does fail). > > To understand why it passes the first time, you need to understand the > flow of the sanitization process, described here; > https://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization > Step 3, updatePropertyCache(), is the part that reports valency errors. > There's a special case in this code for aromatic atoms that allows atoms > like the N in Cn1cccc1 to pass sanitization even though they are formally > four-valent (2x1.5 for the aromatic bonds +1 for the C). Your molecule is > triggering that special case because atom 1 is aromatic in the input > SMILES. Incorrect aromatic rings that get through this step normally end up > getting caught later when the molecule is kekulized (step 5). In your case > there are no aromatic bonds to kekulize, so no error is thrown. The > aromaticity perception (step 6) does not consider the ring to be aromatic, > so the final molecule is the equivalent of C1=N(C)C=CN1. > > It ought to be possible to clear this in the sanitization code relatively > easily; I just need to think about it a bit and do a bunch of testing. > > -greg > > > > > > > > > On Tue, Oct 30, 2018 at 10:02 PM Ivan Tubert-Brohman < > ivan.tubert-broh...@schrodinger.com> wrote: > >> Hi, >> >> I was surprised to see that a (dubious) structure that goes through >> SanitizeMol OK can fail a subsequent sanitization call: >> >> print("Start") >> mol = Chem.MolFromSmiles('C1=n(C)-c=Cn1', sanitize=False) >> print("Before first sanitization") >> Chem.SanitizeMol(mol) >> print("Before second sanitization") >> Chem.SanitizeMol(mol) >> print("Done") >> >> >> The output is: >> >> Start >> Before first sanitization >> Before second sanitization >> [16:54:20] Explicit valence for atom # 1 N, 4, is greater than permitted >> Traceback (most recent call last): >> File "./san.py", line 9, in <module> >> Chem.SanitizeMol(mol) >> ValueError: Sanitization error: Explicit valence for atom # 1 N, 4, is >> greater than permitted >> >> >> Is this an unavoidable aspect of the way SanitizeMol works, since it does >> several operations (Kekulize, check valencies, set aromaticity, conjugation >> and hybridization) in a certain order, or should this be considered a bug? >> >> Best, >> Ivan >> _______________________________________________ >> Rdkit-discuss mailing list >> Rdkit-discuss@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss >> >
_______________________________________________ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss