Hi Greg,

Thanks for the detailed explanation. You are right that this is not a real
molecule; it came from applying a user-supplied reaction SMARTS. (The
reaction SMARTS was not the best-written perhaps, but that's
tangential...). I normally sanitize the products and skip those that fail
the sanitization, but in this case I was surprised when the sanitized
molecule caused issues later while trying to compute descriptors.

I look forward to a fix, but in the meantime maybe I'll consider running
SanitzeMol twice. :-)

Best,
Ivan


On Wed, Oct 31, 2018 at 2:41 AM Greg Landrum <greg.land...@gmail.com> wrote:

> Hi Ivan,
>
> Short answer: I would not normally expect a second sanitization to fail if
> the first succeeds, but your input SMILES is very odd and triggers a bug.
>
> This is an interesting edge case for the sanitization code because it
> includes a weird mix of aromatic and aliphatic atoms and bonds, I do hope
> this came out of some computational process and isn't a "real" molecule.
> You almost couldn't have picked a better example to highlight the situation
> that's causing the problem here. Some form of congratulations are in order.
> :-)
>
> Here's an explanation of what's going on with your molecule C1=n(C)-c=Cn1
> The fundamental problem is that atom 1 (the first nitrogen) has a valence
> of 4 and is neutral...
> If you wrote the SMILES as C1=N(C)C=CN1, which is what the sanitization
> process produces, I don't think you'd be surprised that the RDKit
> sanitization fails (and your second call to sanitize does fail).
>
> To understand why it passes the first time, you need to understand the
> flow of the sanitization process, described here;
> https://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization
> Step 3, updatePropertyCache(), is the part that reports valency errors.
> There's a special case in this code for aromatic atoms that allows atoms
> like the N in Cn1cccc1 to pass sanitization even though they are formally
> four-valent (2x1.5 for the aromatic bonds +1 for the C). Your molecule is
> triggering that special case because atom 1 is aromatic in the input
> SMILES. Incorrect aromatic rings that get through this step normally end up
> getting caught later when the molecule is kekulized (step 5). In your case
> there are no aromatic bonds to kekulize, so no error is thrown. The
> aromaticity perception (step 6) does not consider the ring to be aromatic,
> so the final molecule is the equivalent of C1=N(C)C=CN1.
>
> It ought to be possible to clear this in the sanitization code relatively
> easily; I just need to think about it a bit and do a bunch of testing.
>
> -greg
>
>
>
>
>
>
>
>
> On Tue, Oct 30, 2018 at 10:02 PM Ivan Tubert-Brohman <
> ivan.tubert-broh...@schrodinger.com> wrote:
>
>> Hi,
>>
>> I was surprised to see that a (dubious) structure that goes through
>> SanitizeMol OK can fail a subsequent sanitization call:
>>
>> print("Start")
>> mol = Chem.MolFromSmiles('C1=n(C)-c=Cn1', sanitize=False)
>> print("Before first sanitization")
>> Chem.SanitizeMol(mol)
>> print("Before second sanitization")
>> Chem.SanitizeMol(mol)
>> print("Done")
>>
>>
>> The output is:
>>
>> Start
>> Before first sanitization
>> Before second sanitization
>> [16:54:20] Explicit valence for atom # 1 N, 4, is greater than permitted
>> Traceback (most recent call last):
>>   File "./san.py", line 9, in <module>
>>     Chem.SanitizeMol(mol)
>> ValueError: Sanitization error: Explicit valence for atom # 1 N, 4, is
>> greater than permitted
>>
>>
>> Is this an unavoidable aspect of the way SanitizeMol works, since it does
>> several operations (Kekulize, check valencies, set aromaticity, conjugation
>> and hybridization) in a certain order, or should this be considered a bug?
>>
>> Best,
>> Ivan
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

Reply via email to