Paolo,
Exactly what I was looking for. Very helpful. Thank you.
Regards,
Jim Metz
-----Original Message-----
From: Paolo Tosco <[email protected]>
To: James T. Metz <[email protected]>; greg.landrum <[email protected]>;
rdkit-discuss <[email protected]>
Sent: Mon, Sep 11, 2017 2:53 pm
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures
Hi Jim,
you can indeed enumerate all Kekulé structures for a molecule within the
RDKit using Chem.ResonanceMolSupplier():
from rdkit import Chem
mol = Chem.MolFromSmiles('c1ccccc1')
suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)
len(suppl)
2
for i in range(len(suppl)):
    print(Chem.MolToSmiles(suppl[i], kekuleSmiles=True))
C1C=CC=CC=1
C1=CC=CC=C1
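As a minimal sketch of the same approach, the loop above can be wrapped in a small helper that returns one kekulized SMILES per entry in the supplier. The function name `kekule_smiles` is hypothetical (it is not part of the RDKit), and the exact SMILES strings written out may differ between RDKit versions:

```python
from rdkit import Chem


def kekule_smiles(smiles):
    """Return one kekulized SMILES per structure in the resonance supplier.

    Hypothetical helper for illustration only; not an RDKit function.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    # KEKULE_ALL asks the supplier to enumerate every Kekule structure
    suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)
    return [Chem.MolToSmiles(suppl[i], kekuleSmiles=True)
            for i in range(len(suppl))]


benzene_forms = kekule_smiles('c1ccccc1')
for smi in benzene_forms:
    print(smi)
```

Each returned string can be passed back to Chem.MolFromSmiles() if separate molecule objects are needed.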
Best,
Paolo
On 09/11/2017 05:22 PM, James T. Metz via Rdkit-discuss wrote:
Greg,
Thanks! Yes, very helpful. I will need to digest the detailed information
you have provided. I am somewhat familiar with recursive SMARTS. Thanks
again.
Regards,
Jim Metz
-----Original Message-----
From: Greg Landrum <[email protected]>
To: James T. Metz <[email protected]>
Cc: RDKit Discuss <[email protected]>
Sent: Mon, Sep 11, 2017 11:15 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures
On Mon, Sep 11, 2017 at 5:55 PM, James T. Metz
<[email protected]> wrote:
Greg,
I need to be able to use SMARTS patterns to identify substructures in
molecules that can be aromatic, and I need to be able to handle cases where
there can be differences in the way that the molecule was entered or drawn
by a user.
That particular problem is a big part of the reason
that we tend to use the aromatic representation of
things.
For example, consider the following alkenyl-substituted pyridine; there
are two possible Kekule structures:
m1 = 'C=CC1=NC=CC=C1'
m2 = 'C=CC1N=CC=CC1'
Fixing what I assume is a typo for m2, I can do the
following:
In [11]: m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')
In [12]: m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')
In [13]: q1 = Chem.MolFromSmarts('cccc')
In [14]: q2 = Chem.MolFromSmarts('cccn')
In [15]: list(m1.GetSubstructMatch(q1))
Out[15]: [2, 7, 6, 5]
In [16]: list(m1.GetSubstructMatch(q2))
Out[16]: [6, 5, 4, 3]
In [17]: list(m2.GetSubstructMatch(q1))
Out[17]: [2, 7, 6, 5]
In [18]: list(m2.GetSubstructMatch(q2))
Out[18]: [6, 5, 4, 3]
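The two inputs give identical matches because the RDKit perceives aromaticity when a SMILES is parsed, so both Kekule drawings end up as the same molecule. A quick way to check this is sketched below; the variable names (`can1`, `can2`, `n_aromatic`) are only illustrative:

```python
from rdkit import Chem

# Two Kekule drawings of the same vinylpyridine
m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')
m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')

# Canonical SMILES are generated from the sanitized (aromatic) molecules
can1 = Chem.MolToSmiles(m1)
can2 = Chem.MolToSmiles(m2)
print(can1 == can2)

# Count the atoms flagged as aromatic in each molecule
n_aromatic = [sum(1 for atom in m.GetAtoms() if atom.GetIsAromatic())
              for m in (m1, m2)]
print(n_aromatic)
```

If the canonical SMILES agree, aromatic queries such as q1 and q2 cannot depend on how the ring was drawn.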
Those particular queries were going for the aromatic
species and will only match inside the ring, but if you
want to be more generic you could tune your queries like
this:
In [28]: q3 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]-=,:[*])]')
In [29]: q4 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#7;$([#7]-=,:[*])]')
In [30]: list(m1.GetSubstructMatch(q3))
Out[30]: [0, 1, 2, 7]
In [31]: list(m1.GetSubstructMatch(q4))
Out[31]: [0, 1, 2, 3]
In [32]: list(m2.GetSubstructMatch(q3))
Out[32]: [0, 1, 2, 7]
In [33]: list(m2.GetSubstructMatch(q4))
Out[33]: [0, 1, 2, 3]
If you aren't familiar with recursive SMARTS, this
construct: "[#6;$([#6]=,:[*])]" means "a carbon that has
either a double bond or an aromatic bond to another
atom". So you can interpret q3 as "four carbons that
each have either a double or aromatic bond and that are
connected to each other by single, double, or
aromatic bonds".
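To see the recursive piece on its own, here is a small sketch that lists which atoms a single "[#6;$([#6]=,:[*])]" query atom matches. The helper name `unsaturated_carbons` is made up for this illustration; only MolFromSmiles, MolFromSmarts and GetSubstructMatches are RDKit calls:

```python
from rdkit import Chem

# One query atom: a carbon with a double or aromatic bond to any atom
unsat_c = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]')


def unsaturated_carbons(smiles):
    """Return sorted indices of atoms matched by the recursive query.

    Hypothetical helper for illustration only.
    """
    mol = Chem.MolFromSmiles(smiles)
    # Each match is a 1-tuple because the query has a single atom
    return sorted(match[0] for match in mol.GetSubstructMatches(unsat_c))


propene = unsaturated_carbons('CC=C')
benzene = unsaturated_carbons('c1ccccc1')
ethane = unsaturated_carbons('CC')
print(propene, benzene, ethane)
```

The methyl carbon of propene and both carbons of ethane carry only single bonds, so they are not matched.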
Is this starting to approximate what you're looking for?
-greg
Now consider two SMARTS
pattern1 = '[C]=[C]-[C]=[C]'
pattern2 = '[C]=[C]-[C]=[N]'
I need to be able to detect the existence of each pattern in the molecule.
If m1 is the only available generated Kekule structure, then pattern2 will
be recognized. If m2 is the only available generated Kekule structure, then
pattern1 will be recognized.
Hence, I am getting different answers for the same input molecule just
because it was drawn in different Kekule structures.
Regards,
Jim Metz
-----Original Message-----
From: Greg Landrum <[email protected]>
To: James T. Metz <[email protected]>
Cc: RDKit Discuss
<[email protected]>
Sent: Mon, Sep 11, 2017 10:31 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures
Hi Jim,
The code currently has no way to enumerate Kekule structures. I don't recall
this coming up in the past and, to be honest, it doesn't seem all that
generally useful.
Perhaps there's an alternate way to solve the problem; what are you trying
to do?
-greg
On Mon, Sep 11, 2017 at 5:04 PM, James T. Metz via Rdkit-discuss
<[email protected]> wrote:
Hello,
Suppose I read in an aromatic SMILES, e.g., for benzene
c1ccccc1
I would like to generate the major canonical resonance forms and save the
results as two separate molecules. Essentially I am trying to generate
m1 = 'C1=CC=CC=C1'
m2 = 'C1C=CC=CC=1'
Can this be done in RDKit? I have found a KEKULE_ALL option in the detailed
documentation which seems to be what I am trying to do, but I don't
understand how this option is to be used, or the proper syntax.
If it is necessary to somehow renumber the atoms and re-generate Kekule
structures, that is OK. Thank you.
Regards,
Jim Metz
------------------------------------------------------------------------------
Check out the vibrant tech
community on one of the
world's most
engaging tech sites,
Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

