Re: [Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules
Start with your benzene molecule

    from rdkit import Chem
    m = Chem.MolFromSmiles('c1ccccc1')

Make a pattern using Peter's example, with three aromatic atoms connected by two aromatic bonds

    patt = Chem.MolFromSmarts('a:a:a')

and it's a match:

    m.HasSubstructMatch(patt)
    >True

Kekulize your mol, and the pattern no longer matches

    Chem.rdmolops.Kekulize(m)
    m.HasSubstructMatch(patt)
    >False

but if you change the SMARTS pattern to match aromatic atoms connected by kekulized bonds, it matches

    patt2 = Chem.MolFromSmarts('[a]=[a]-[a]')
    m.HasSubstructMatch(patt2)
    >True

Your original SMARTS query doesn't match because C in a SMARTS string is specifically an aliphatic carbon. Change it to c and it will match.

It would work if you had cleared the aromatic flags when kekulizing

    m = Chem.MolFromSmiles('c1ccccc1')
    Chem.rdmolops.Kekulize(m, clearAromaticFlags=True)
    patt = Chem.MolFromSmarts('[C]=[C]-[C]')
    m.HasSubstructMatch(patt)
    >True

So when you kekulize without the clearAromaticFlags option, the aromatic atoms will still match only 'a', not 'A', while the bonds will match only '=' or '-', not ':'. (They will also match '@' or '~', but that's beside the point here.)

As Peter mentions, by default if you read in a kekulized SMILES string, the mol you create will not be kekulized, but it sounds like you are intentionally kekulizing before doing substructure matching.

Jason Biggs

On Fri, Sep 8, 2017 at 5:19 PM, James T. Metz via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Suppose I read in the SMILES of an aromatic molecule, e.g., for
> benzene
>
> c1ccccc1
>
> I then want to convert the molecule to a Kekule representation and
> then perform various SMARTS pattern recognition, e.g.
>
> [C]=[C]-[C]
>
> I have tried various Kekule commands in RDKit, but I cannot figure
> out how (or whether it is possible) to recognize a SMARTS pattern for
> a portion of a molecule which is aromatic, but is currently being
> stored as a Kekule structure.
>
> Also, is it possible to generate and store more than one Kekule
> form in RDKit?
>
> Thank you.
>
> Regards,
> Jim Metz

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
Re: [Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules
Hi,

In SMARTS, 'a' matches an aromatic atom. So you would match your molecule with the pattern 'aaa', or if you wanted to restrict yourself to carbons, 'ccc'. This would match whether you created the molecule from a Kekulized or an aromatic SMILES. Remember that it's the molecular recognition code, not the form of the input SMILES, that determines whether a molecule is aromatic.

-P.

On Fri, Sep 8, 2017 at 6:19 PM, James T. Metz via Rdkit-discuss <rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Suppose I read in the SMILES of an aromatic molecule, e.g., for
> benzene
>
> c1ccccc1
>
> I then want to convert the molecule to a Kekule representation and
> then perform various SMARTS pattern recognition, e.g.
>
> [C]=[C]-[C]
>
> I have tried various Kekule commands in RDKit, but I cannot figure
> out how (or whether it is possible) to recognize a SMARTS pattern for
> a portion of a molecule which is aromatic, but is currently being
> stored as a Kekule structure.
>
> Also, is it possible to generate and store more than one Kekule
> form in RDKit?
>
> Thank you.
>
> Regards,
> Jim Metz
[Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules
Hello,

Suppose I read in the SMILES of an aromatic molecule, e.g., for benzene

    c1ccccc1

I then want to convert the molecule to a Kekule representation and then perform various SMARTS pattern recognition, e.g.

    [C]=[C]-[C]

I have tried various Kekule commands in RDKit, but I cannot figure out how (or whether it is possible) to recognize a SMARTS pattern for a portion of a molecule which is aromatic, but is currently being stored as a Kekule structure.

Also, is it possible to generate and store more than one Kekule form in RDKit?

Thank you.

Regards,
Jim Metz
Re: [Rdkit-discuss] Using Chem.WrapLogs()
Thanks Maciek,

Both of those solutions work on Linux, which is fine for my purposes. Neither works on Windows (let me know if you want me to file a bug).

Regards,
- Noel

On 8 September 2017 at 15:05, Maciek Wójcikowski wrote:

> Hi Noel,
>
> sio.seek(0) before assert or sio.getvalue() instead read().
>
> Best regards,
> Maciek Wójcikowski
> mac...@wojcikowski.pl
>
> 2017-09-08 15:51 GMT+02:00 Noel O'Boyle:
>
>> Hi all,
>>
>> I'd like to capture error messages during SMILES parsing, but am having
>> trouble getting this to work.
>>
>> The following code raises an AssertionError, for example. Is there
>> something here I'm missing? I'm using this from a Windows 7 conda
>> environment, Python 2.7 64-bit, RDKit 2017.03.3, but a similar conda
>> environment is also failing for me on Linux.
>>
>> import sys
>> from rdkit import Chem
>> Chem.WrapLogs()
>> from StringIO import StringIO
>>
>> old_stderr = sys.stderr
>> sio = sys.stderr = StringIO()
>>
>> mol = Chem.MolFromSmiles("c1c")
>> sys.stderr = old_stderr
>>
>> assert sio.read() != ""
>>
>> Regards,
>> - Noel
Re: [Rdkit-discuss] Using Chem.WrapLogs()
Hi Noel,

Use sio.seek(0) before the assert, or sio.getvalue() instead of read().

Best regards,
Maciek Wójcikowski
mac...@wojcikowski.pl

2017-09-08 15:51 GMT+02:00 Noel O'Boyle:

> Hi all,
>
> I'd like to capture error messages during SMILES parsing, but am having
> trouble getting this to work.
>
> The following code raises an AssertionError, for example. Is there
> something here I'm missing? I'm using this from a Windows 7 conda
> environment, Python 2.7 64-bit, RDKit 2017.03.3, but a similar conda
> environment is also failing for me on Linux.
>
> import sys
> from rdkit import Chem
> Chem.WrapLogs()
> from StringIO import StringIO
>
> old_stderr = sys.stderr
> sio = sys.stderr = StringIO()
>
> mol = Chem.MolFromSmiles("c1c")
> sys.stderr = old_stderr
>
> assert sio.read() != ""
>
> Regards,
> - Noel
Re: [Rdkit-discuss] Using Chem.WrapLogs()
On Sep 8, 2017, at 15:51, Noel O'Boyle wrote:

> Hi all,
>
> I'd like to capture error messages during SMILES parsing, but am having
> trouble getting this to work.
...
> assert sio.read() != ""

That should be a sio.getvalue(). The read() starts from the current file position, which is at the end of the previous output. (Or if you really want a read(), do sio.seek(0) first.)

Cheers,

Andrew
da...@dalkescientific.com
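The position behaviour Andrew describes can be seen without RDKit at all. The following is a minimal sketch, assuming Python 3's io.StringIO rather than the Python 2 StringIO module used in the thread; the message text is an invented stand-in, not real RDKit log output.

```python
import io

# Stand-in for a captured log stream; the message text is invented for illustration.
sio = io.StringIO()
sio.write("example error message\n")

# read() continues from the current position, which is now the end of the buffer.
tail = sio.read()

# getvalue() returns the whole buffer regardless of the current position.
everything = sio.getvalue()

# Rewinding first makes read() return the full contents as well.
sio.seek(0)
rewound = sio.read()

print(repr(tail), repr(everything), repr(rewound))
```

Here tail is empty while everything and rewound hold the full text, which is why the original assert on sio.read() failed even when something had been written.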
Re: [Rdkit-discuss] Debian Stretch Python3 Does Not Find RDKit
Apologies for the slow reply; I'm still getting caught up from my vacation.

If you start the system python from the command line, can that find the rdkit? You can test this as follows:

    python -c 'from rdkit import Chem'

If that works, you know that the installation worked and that the problem is with spyder (that's harder for me to help with, but if you google for rdkit and spyder you might find some helpful answers).

If the above doesn't work, then we can start trying to diagnose what went wrong with the install. Please start with:

    which python

to make sure that you are in fact using the system python.

-greg

On Sat, Aug 19, 2017 at 4:05 PM, Stephen P. Molnar wrote:

> I have installed the Debian Stretch distribution Spyder3 and RDKit on my
> 64 bit Linux platform.
>
> There were no warning or error messages during the installation process.
>
> However, when I attempted running a cookbook Python script (file
> attached), I got the following:
>
> Python 3.5.3 (default, Jan 19 2017, 14:11:04)
> Type "copyright", "credits" or "license" for more information.
>
> IPython 6.1.0 -- An enhanced Interactive Python.
>
> runfile('/home/comp/Apps/Python/untitled0.py',
> wdir='/home/comp/Apps/Python')
> Traceback (most recent call last):
>
> File "", line 1, in
> runfile('/home/comp/Apps/Python/untitled0.py',
> wdir='/home/comp/Apps/Python')
>
> File
> "/usr/local/lib/python3.5/dist-packages/spyder/utils/site/sitecustomize.py",
> line 688, in runfile
> execfile(filename, namespace)
>
> File
> "/usr/local/lib/python3.5/dist-packages/spyder/utils/site/sitecustomize.py",
> line 101, in execfile
> exec(compile(f.read(), filename, 'exec'), namespace)
>
> File "/home/comp/Apps/Python/untitled0.py", line 11, in
> from rdkit import Chem
>
> ImportError: No module named 'rdkit'
>
> I would greatly appreciate pointers towards a solution to this problem.
>
> Thanks in advance.
>
> --
> Stephen P. Molnar, Ph.D.
> Life is a fuzzy set
> Stochastic and multivariate
> www.molecular-modeling.net
> (614)312-7528 (c)
> Skype: smolnar1
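The two checks greg suggests (can this interpreter import rdkit, and which interpreter is it) can also be made from inside Python, which lets the same script be run both from the command line and inside Spyder and the results compared. This is only a sketch: describe_module is a hypothetical helper name, and the traceback above shows Python 3.5.3, so the command-line run would need to use that same interpreter for the comparison to mean anything.

```python
import importlib.util
import sys


def describe_module(name):
    """Return (interpreter path, module file or None) for a top-level module name.

    Hypothetical helper for illustration; it only looks the module up and
    does not import it.
    """
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else None
    return sys.executable, location


if __name__ == "__main__":
    exe, where = describe_module("rdkit")
    print("interpreter:", exe)
    print("rdkit found at:", where)
```

If the interpreter paths differ between the two runs, or one run reports None for rdkit, the package was installed for a different Python than the one Spyder is using.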
[Rdkit-discuss] Using Chem.WrapLogs()
Hi all,

I'd like to capture error messages during SMILES parsing, but am having trouble getting this to work.

The following code raises an AssertionError, for example. Is there something here I'm missing? I'm using this from a Windows 7 conda environment, Python 2.7 64-bit, RDKit 2017.03.3, but a similar conda environment is also failing for me on Linux.

    import sys
    from rdkit import Chem
    Chem.WrapLogs()
    from StringIO import StringIO

    old_stderr = sys.stderr
    sio = sys.stderr = StringIO()

    mol = Chem.MolFromSmiles("c1c")
    sys.stderr = old_stderr

    assert sio.read() != ""

Regards,
- Noel
Re: [Rdkit-discuss] GetConformerRMS() vs GetBestRMS()
Hi Anikó,

Both functions do an alignment. The big difference here arises because GetBestRMS() looks at all 2D-identical alignments of the molecules to each other, while GetConformerRMS() only does the alignment once, using the atom numbers.

Practically speaking, what does that mean for your molecule? Here's a 2D sketch without the Hs:

[image: Inline image 1]

By 2D symmetry atoms 8 and 9 are equivalent, as are atoms 4 and 5. So there are four possible 2D isomorphisms between those molecules:

    8->8, 9->9, 4->4, 5->5 (all others the same)
    8->9, 9->8, 4->4, 5->5 (all others the same)
    8->8, 9->9, 4->5, 5->4 (all others the same)
    8->9, 9->8, 4->5, 5->4 (all others the same)

GetBestRMS() does alignments for all of these and takes the one that provides the lowest RMS value. GetConformerRMS() only does the first alignment and uses that RMS.

In general you want to always use GetBestRMS() for symmetric molecules.

Does that help?

-greg

p.s. Adding the Hs leads to additional mappings, which just makes the overall problem worse.

On Fri, Sep 8, 2017 at 9:26 AM, Udvarhelyi, Aniko <aniko.udvarhe...@novartis.com> wrote:

> Dear All,
>
> I would like to compute RMS values between conformers of the same molecule
> that are not aligned. Unfortunately, I can't get along very well with the
> GetConformerRMS() function; it gives far too high RMS values even for
> conformers that are clearly (near-)identical as judged by visual inspection
> after alignment. I attach one example of 2 conformers of a molecule that
> are near-identical.
>
> GetConformerRMS() returns an RMS value of 1.32 (with hydrogens) and 0.70
> (disregarding hydrogens).
>
> GetBestRMS() returns an RMS value of 0.03 (with hydrogens) and 0.02
> (disregarding hydrogens).
>
> Clearly, the GetBestRMS() result is the one I'd expect (I am interested
> in the all-atom RMSDs with hydrogens). I guess GetConformerRMS() cannot
> align the two conformers properly, hence the high RMS value. My question is
> why not? The atom ordering and all bonds are exactly the same in both
> conformers. Why do I need the GetBestRMS() alignment of all possible
> permutations of matching atom orders in both conformers to get the
> alignment correct? I would like to avoid using GetBestRMS() as it is far
> too slow for my purposes (processing many molecules with many conformers).
>
> Many thanks for any hints,
>
> Anikó
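The effect of trying every symmetry-equivalent mapping can be illustrated on toy coordinates. This is a rough sketch in plain Python, not RDKit's implementation: it assumes the two conformers are already superimposed (so it skips the alignment that both RDKit functions perform), the equivalent pairs are hard-coded rather than found by substructure matching, and the function names and coordinates are invented for illustration.

```python
import itertools
import math


def rms(coords_a, coords_b, mapping):
    """RMS distance between atom i of A and atom mapping[i] of B (no re-alignment)."""
    total = 0.0
    for i, j in enumerate(mapping):
        total += sum((p - q) ** 2 for p, q in zip(coords_a[i], coords_b[j]))
    return math.sqrt(total / len(mapping))


def symmetry_mappings(n_atoms, equivalent_pairs):
    """Yield every mapping obtained by optionally swapping each equivalent pair."""
    for swaps in itertools.product((False, True), repeat=len(equivalent_pairs)):
        mapping = list(range(n_atoms))
        for (a, b), swap in zip(equivalent_pairs, swaps):
            if swap:
                mapping[a], mapping[b] = b, a
        yield mapping


# Toy data: six atoms; atoms 2/3 and atoms 4/5 are equivalent by symmetry.
conf_a = [(0, 0, 0), (1, 0, 0), (2, 1, 0), (2, -1, 0), (-1, 1, 0), (-1, -1, 0)]
# Same geometry, but the positions of atoms 2 and 3 are exchanged.
conf_b = [(0, 0, 0), (1, 0, 0), (2, -1, 0), (2, 1, 0), (-1, 1, 0), (-1, -1, 0)]
pairs = [(2, 3), (4, 5)]

mappings = list(symmetry_mappings(len(conf_a), pairs))
identity_rms = rms(conf_a, conf_b, mappings[0])  # atom numbers only
best_rms = min(rms(conf_a, conf_b, m) for m in mappings)  # best of all four

print(len(mappings), identity_rms, best_rms)
```

With two equivalent pairs there are four mappings, matching the four isomorphisms listed in the reply. The identity mapping reports a large RMS for what is really the same geometry, and the minimum over all mappings recovers zero. The number of mappings doubles with each additional equivalent pair, which is consistent with the remark that adding hydrogens makes the problem worse.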
[Rdkit-discuss] GetConformerRMS() vs GetBestRMS()
Dear All,

I would like to compute RMS values between conformers of the same molecule that are not aligned. Unfortunately, I can't get along very well with the GetConformerRMS() function; it gives far too high RMS values even for conformers that are clearly (near-)identical as judged by visual inspection after alignment. I attach one example of 2 conformers of a molecule that are near-identical.

GetConformerRMS() returns an RMS value of 1.32 (with hydrogens) and 0.70 (disregarding hydrogens).

GetBestRMS() returns an RMS value of 0.03 (with hydrogens) and 0.02 (disregarding hydrogens).

Clearly, the GetBestRMS() result is the one I'd expect (I am interested in the all-atom RMSDs with hydrogens). I guess GetConformerRMS() cannot align the two conformers properly, hence the high RMS value. My question is why not? The atom ordering and all bonds are exactly the same in both conformers. Why do I need the GetBestRMS() alignment of all possible permutations of matching atom orders in both conformers to get the alignment correct? I would like to avoid using GetBestRMS() as it is far too slow for my purposes (processing many molecules with many conformers).

Many thanks for any hints,

Anikó

Attachment: confs.sdf