Re: [Rdkit-discuss] Canonical smiles for medium and large rings?
James,

On Tue, Jan 4, 2011 at 6:29 PM, James Davidson wrote:
>
>> It would be *really* useful to have some more real-world
>> cases like this one to use as tests. So if you happen to have
>> others you can send I would be quite happy to have them.
>
> On that note, I have added a comment to the bug tracker
> (https://sourceforge.net/tracker/?func=detail&aid=3139534&group_id=160139&atid=814650)
> - but was not sure how to attach a file (eg sdf) there,
> so apologies for it ending up on more lines than I intended... Also, I
> logged in with my google account, but it looks like it may not be clear
> who it is!

Thanks for these. I just added a couple of initial tests based on them. I will try to find the time to make them a bit more comprehensive in the next couple of days.

> The first two examples are two marine natural products that only differ
> in the geometry of the double bond in the medium ring. The final
> example is a cis- analogue that I synthesised during my PhD for which a
> crystal structure was also obtained. The stereochemistry in these
> systems is 'challenging' to say the least, so I thought they would make
> reasonable test cases. I should say that even for the cis- double bond
> cases, RDKit does a rather ugly job of the 2D depiction - but I am not
> sure if other depictors will perform much better...

Yeah, I'm afraid it's not going to do a reasonable job with the depiction of natural products. Most depictors (including many human ones) have trouble getting these rendered well.

> On a related note, I was keen to manually double-check the
> stereochemistry that had been assigned to each of the chiral centres
> (particularly the ones involving the 9-5 ring connections - as these are
> potentially troublesome), and found myself wishing there was a way to
> easily label a 2D depiction of the molecules with the atom ID. What I
> ended up doing was the following:
>
> 1. Getting the R/S info + atomIdx back from RDKit (example output):
>    >>> Chem.FindMolChiralCenters(mol)
>    [(3, 'R'), (7, 'R'), (8, 'S'), (9, 'R'), (11, 'R'), (18, 'R'), (24, 'R')]
> 2. Opening the molfile in a program where I know how to label with atom
>    IDs (pymol)
> 3. Check which atom is which manually (had to add 1 to the RDKit
>    atomIdx values as they start at 0) then double-check with reference
>    values.
>
> RDKit performed admirably - but I presume this is dependent on the
> quality of the wedge info coming in from the SDF(?)

If the data are read from an SDF, yes: the initial stereochem information comes from the SDF. If you have a 3D SD file, you can also have the RDKit ignore bond wedging and assign chirality based purely on coordinates. R/S assignments are done in a later step; it's always nice to hear that those are correct.

For what it's worth: I tend to use Marvin Sketch for the "drawing molecules with atom indices to check up on stereochemistry" task. It will also assign absolute stereochem to atoms and bonds (usually correctly), so it's a useful check there too.

Best regards,
-greg

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
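The manual labelling workflow James describes (list the chiral centres, then add 1 to each RDKit atom index to match a 1-based viewer) can be sketched in RDKit itself. This is only a minimal sketch and not something from the thread: it assumes a recent RDKit release whose drawing code displays the `atomNote` atom property, and the helper names `one_based` and `annotate_and_list_centers` are hypothetical.

```python
def one_based(centers):
    """Shift RDKit's 0-based (atom_idx, label) pairs to 1-based atom numbers."""
    return [(idx + 1, label) for idx, label in centers]


def annotate_and_list_centers(mol):
    """Tag every atom with its 1-based number and return the chiral centres.

    Assumption: the installed RDKit draws the 'atomNote' property next to
    each atom when the molecule is depicted.
    """
    from rdkit import Chem  # imported lazily so one_based() works without RDKit

    for atom in mol.GetAtoms():
        atom.SetProp("atomNote", str(atom.GetIdx() + 1))
    return one_based(Chem.FindMolChiralCenters(mol))


# The example output quoted in the thread, renumbered to start at 1.
print(one_based([(3, 'R'), (7, 'R'), (8, 'S'), (9, 'R'),
                 (11, 'R'), (18, 'R'), (24, 'R')]))
```

Depicting the annotated molecule afterwards (for example with `rdkit.Chem.Draw.MolToImage`) should then show the numbers next to the atoms, which would remove the round trip through a second program.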
Re: [Rdkit-discuss] Canonical smiles for medium and large rings?
Hi Greg,

> On Sat, Dec 18, 2010 at 6:27 AM, Greg Landrum wrote:
>
> I just checked in a set of changes that should get this
> (mostly) working correctly. Here's a demonstration with Geldanamycin:
>
> In [7]: smi=r'NC(=O)o...@h]1c(/C)=C/[...@h](C)[C@@H](O)[C@@H](OC)c...@h](C)C\C2=C(/OC)C(=O)\C=C(\NC(=O)C(\C)=C\C=C/[C@@H]1OC)C2=O'
>
> In [8]: print Chem.CanonSmiles(smi)
> COC1=C2C[C@@H](C)c...@h](OC)[...@h](O)[C@@H](C)/C=C(\C)[...@h](OC(N)=O)[C@@H](OC)/C=C\C=C(/C)C(=O)NC(=CC1=O)C2=O

Thanks for looking into this so quickly!

> It would be *really* useful to have some more real-world
> cases like this one to use as tests. So if you happen to have
> others you can send I would be quite happy to have them.

On that note, I have added a comment to the bug tracker (https://sourceforge.net/tracker/?func=detail&aid=3139534&group_id=160139&atid=814650) - but was not sure how to attach a file (eg sdf) there, so apologies for it ending up on more lines than I intended... Also, I logged in with my google account, but it looks like it may not be clear who it is!

The first two examples are two marine natural products that only differ in the geometry of the double bond in the medium ring. The final example is a cis- analogue that I synthesised during my PhD for which a crystal structure was also obtained. The stereochemistry in these systems is 'challenging' to say the least, so I thought they would make reasonable test cases. I should say that even for the cis- double bond cases, RDKit does a rather ugly job of the 2D depiction - but I am not sure if other depictors will perform much better...

On a related note, I was keen to manually double-check the stereochemistry that had been assigned to each of the chiral centres (particularly the ones involving the 9-5 ring connections - as these are potentially troublesome), and found myself wishing there was a way to easily label a 2D depiction of the molecules with the atom ID. What I ended up doing was the following:

1. Getting the R/S info + atomIdx back from RDKit (example output):
   >>> Chem.FindMolChiralCenters(mol)
   [(3, 'R'), (7, 'R'), (8, 'S'), (9, 'R'), (11, 'R'), (18, 'R'), (24, 'R')]
2. Opening the molfile in a program where I know how to label with atom IDs (pymol)
3. Check which atom is which manually (had to add 1 to the RDKit atomIdx values as they start at 0) then double-check with reference values.

RDKit performed admirably - but I presume this is dependent on the quality of the wedge info coming in from the SDF(?)

Kind regards

James

__
PLEASE READ: This email is confidential and may be privileged. It is intended for the named addressee(s) only and access to it by anyone else is unauthorised. If you are not an addressee, any disclosure or copying of the contents of this email or any action taken (or not taken) in reliance on it is unauthorised and may be unlawful. If you have received this email in error, please notify the sender or postmas...@vernalis.com. Email is not a secure method of communication and the Company cannot accept responsibility for the accuracy or completeness of this message or any attachment(s). Please check this email for virus infection for which the Company accepts no responsibility. If verification of this email is sought then please request a hard copy. Unless otherwise stated, any views or opinions presented are solely those of the author and do not represent those of the Company.

The Vernalis Group of Companies
Oakdene Court
613 Reading Road
Winnersh, Berkshire
RG41 5UA.
Tel: +44 118 977 3133

To access trading company registration and address details, please go to the Vernalis website at www.vernalis.com and click on the "Company address and registration details" link at the bottom of the page.
Re: [Rdkit-discuss] Canonical smiles for medium and large rings?
On Sat, Dec 18, 2010 at 6:27 AM, Greg Landrum wrote:
>
>> For 'classic' aliphatic systems, double-bonds in
>> 3-7-membered rings can only sensibly exist in the cis orientation, so
>> 'ignoring' them would be ok. However, for 8-membered and above, cis or
>> trans are certainly both possible, so it becomes more important to keep
>> track - particularly if canonical smiles are being used to check for
>> unique structures, as my colleague was doing with the geldanamycin
>> example above.
>
> yeah, that's clear: for larger ring systems the information should be
> preserved. That's very easy to do. The more difficult part is going to
> be making sure the output is actually canonical. I've entered a bug
> for this
> (https://sourceforge.net/tracker/?func=detail&aid=3139534&group_id=160139&atid=814650)
> and I'll take a look to try and get it fixed (and correct).

I just checked in a set of changes that should get this (mostly) working correctly. Here's a demonstration with Geldanamycin:

In [7]: smi=r'NC(=O)o...@h]1c(/C)=C/[...@h](C)[C@@H](O)[C@@H](OC)c...@h](C)C\C2=C(/OC)C(=O)\C=C(\NC(=O)C(\C)=C\C=C/[C@@H]1OC)C2=O'

In [8]: print Chem.CanonSmiles(smi)
COC1=C2C[C@@H](C)c...@h](OC)[...@h](O)[C@@H](C)/C=C(\C)[...@h](OC(N)=O)[C@@H](OC)/C=C\C=C(/C)C(=O)NC(=CC1=O)C2=O

At least according to Marvin, those two structures are the same.

One very important caveat: I have not modified the depiction code to generate correct coordinates for trans bonds in cycles. All coordinates for ring systems still have all cis bonds. This has an impact if you write an SD (or mol) file: the stereochemistry captured in that file will be incorrect. I've entered a bug report for this (https://sourceforge.net/tracker/?func=detail&aid=3147014&group_id=160139&atid=814650) so that it doesn't get lost, but I suspect this is going to be a tough one to fix and I am not at all sure when it will be done.

It would be *really* useful to have some more real-world cases like this one to use as tests. So if you happen to have others you can send I would be quite happy to have them.

Best Regards,
-greg
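The use case behind this thread, using canonical SMILES to decide whether two inputs are the same structure, can be sketched as a small grouping helper. This is an illustrative sketch under stated assumptions rather than anything posted in the thread: the function name `group_by_canonical` and its `canon` parameter are hypothetical, and `Chem.CanonSmiles` (the call used above) is only the default.

```python
from collections import defaultdict


def group_by_canonical(smiles_list, canon=None):
    """Group input SMILES by their canonical form.

    `canon` is any callable mapping a SMILES string to a canonical string.
    If it is omitted, RDKit's Chem.CanonSmiles is used.
    """
    if canon is None:
        from rdkit import Chem  # lazy import: only needed for the default
        canon = Chem.CanonSmiles

    groups = defaultdict(list)
    for smi in smiles_list:
        groups[canon(smi)].append(smi)
    return dict(groups)


def duplicates(groups):
    """Return only the groups that contain more than one input SMILES."""
    return {key: members for key, members in groups.items() if len(members) > 1}
```

With a canonicaliser that drops ring double-bond stereochemistry, cis and trans isomers of a large ring would fall into the same group, which is exactly the false "duplicate" the original report was about.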
Re: [Rdkit-discuss] Canonical smiles for medium and large rings?
Dear James,

On Fri, Dec 17, 2010 at 5:35 PM, James Davidson wrote:
>
> I have been investigating an issue that a colleague of mine identified.
> He was working with the RDKit Canon Smiles node in Knime, and found that
> for the natural product, Geldanamycin, the double-bond geometry
> information was being lost during canonicalisation. I repeated this
> result outside of knime:
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> >>> smi = r'NC(=O)o...@h]1c(/C)=C/[...@h](C)[C@@H](O)[C@@H](OC)c...@h](C)C\C2=C(/OC)C(=O)\C=C(\NC(=O)C(\C)=C\C=C/[C@@H]1OC)C2=O'
> >>> AllChem.CanonSmiles(smi)
> 'COC1=C2C[C@@H](C)c...@h](OC)[...@h](O)[C@@H](C)C=C(C)[...@h](OC(N)=O)[C@@H](OC)C=CC=C(C)C(=O)NC(=CC1=O)C2=O'
>
> The simpler example below may be better:
>
> >>> smi1 = r'O1CC/C=C\1' # cyclic ether
> >>> smi2 = r'OCC/C=C\' # corresponding acyclic alcohol
> >>> AllChem.CanonSmiles(smi1)
> 'C1C=CCCOCCC1' -> stereochemistry lost
> >>> AllChem.CanonSmiles(smi2)
> '/C=C\\CCO' -> stereochemistry retained
>
> So, I am guessing that double-bonds in rings are being 'ignored'(?) by
> the canonicaliser?

It's actually being done by the molecule cleanup code that is run when a molecule is read. The result is, as far as you're concerned, the same though: there's no stereochemistry on ring double bonds.

> For 'classic' aliphatic systems, double-bonds in
> 3-7-membered rings can only sensibly exist in the cis orientation, so
> 'ignoring' them would be ok. However, for 8-membered and above, cis or
> trans are certainly both possible, so it becomes more important to keep
> track - particularly if canonical smiles are being used to check for
> unique structures, as my colleague was doing with the geldanamycin
> example above.

yeah, that's clear: for larger ring systems the information should be preserved. That's very easy to do. The more difficult part is going to be making sure the output is actually canonical. I've entered a bug for this (https://sourceforge.net/tracker/?func=detail&aid=3139534&group_id=160139&atid=814650) and I'll take a look to try and get it fixed (and correct).

It would be helpful to have some additional test cases; I will generate some, but if you have some examples you could send (or attach to the bug report) it would be quite helpful.

Thanks for the report,
-greg
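The ring-size rule discussed above (3-7-membered rings are effectively cis only, while 8-membered and larger rings can be cis or trans) can be written down and checked against what RDKit reports for a parsed molecule. A minimal sketch, with hypothetical helper names; the cut-off of 8 is taken from James's description, and whether a given RDKit version applies exactly the same cut-off is an assumption.

```python
def may_carry_cis_trans(ring_size):
    """Rule of thumb from the thread: only rings of 8 or more atoms can
    sensibly hold either a cis or a trans double bond."""
    return ring_size >= 8


def ring_double_bonds(smiles):
    """Return (smallest ring size, stereo label) for each ring double bond.

    The stereo label is whatever RDKit kept after its cleanup on reading,
    e.g. 'STEREONONE' when the information was dropped.
    """
    from rdkit import Chem  # lazy import; may_carry_cis_trans() needs no RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)

    rings = mol.GetRingInfo()
    found = []
    for bond in mol.GetBonds():
        if bond.GetBondType() == Chem.BondType.DOUBLE and bond.IsInRing():
            size = rings.MinBondRingSize(bond.GetIdx())
            found.append((size, str(bond.GetStereo())))
    return found
```

Comparing `may_carry_cis_trans(size)` with the reported label for each bond gives a quick way to spot ring double bonds whose geometry ought to have been preserved but was not.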
[Rdkit-discuss] Canonical smiles for medium and large rings?
Dear All,

I have been investigating an issue that a colleague of mine identified. He was working with the RDKit Canon Smiles node in Knime, and found that for the natural product, Geldanamycin, the double-bond geometry information was being lost during canonicalisation. I repeated this result outside of knime:

from rdkit import Chem
from rdkit.Chem import AllChem
>>> smi = r'NC(=O)o...@h]1c(/C)=C/[...@h](C)[C@@H](O)[C@@H](OC)c...@h](C)C\C2=C(/OC)C(=O)\C=C(\NC(=O)C(\C)=C\C=C/[C@@H]1OC)C2=O'
>>> AllChem.CanonSmiles(smi)
'COC1=C2C[C@@H](C)c...@h](OC)[...@h](O)[C@@H](C)C=C(C)[...@h](OC(N)=O)[C@@H](OC)C=CC=C(C)C(=O)NC(=CC1=O)C2=O'

The simpler example below may be better:

>>> smi1 = r'O1CC/C=C\1' # cyclic ether
>>> smi2 = r'OCC/C=C\' # corresponding acyclic alcohol
>>> AllChem.CanonSmiles(smi1)
'C1C=CCCOCCC1' -> stereochemistry lost
>>> AllChem.CanonSmiles(smi2)
'/C=C\\CCO' -> stereochemistry retained

So, I am guessing that double-bonds in rings are being 'ignored'(?) by the canonicaliser? For 'classic' aliphatic systems, double-bonds in 3-7-membered rings can only sensibly exist in the cis orientation, so 'ignoring' them would be ok. However, for 8-membered and above, cis or trans are certainly both possible, so it becomes more important to keep track - particularly if canonical smiles are being used to check for unique structures, as my colleague was doing with the geldanamycin example above.

Any thoughts / suggestions are much appreciated as always!

Kind regards

James