Re: [Rdkit-discuss] Canonical smiles for medium and large rings?

2010-12-17 Thread Greg Landrum
Dear James,

On Fri, Dec 17, 2010 at 5:35 PM, James Davidson wrote:
>
> I have been investigating an issue that a colleague of mine identified.
> He was working with the RDKit Canon Smiles node in Knime, and found that
> for the natural product, Geldanamycin, the double-bond geometry
> information was being lost during canonicalisation.  I repeated this
> result outside of knime:
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
>
> >>> smi = r'NC(=O)o...@h]1c(/C)=C/[...@h](C)[C@@H](O)[C@@H](OC)c...@h](C)C\C2=C(/OC)C(=O)\C=C(\NC(=O)C(\C)=C\C=C/[C@@H]1OC)C2=O'
> >>> AllChem.CanonSmiles(smi)
>
> 'COC1=C2C[C@@H](C)c...@h](OC)[...@h](O)[C@@H](C)C=C(C)[...@h](OC(N)=O)[C@@H](OC)C=CC=C(C)C(=O)NC(=CC1=O)C2=O'
>
>
> The simpler example below may be better:
>
> >>> smi1 = r'O1CC/C=C\1' # cyclic ether
> >>> smi2 = r'OCC/C=C\' # corresponding acyclic alcohol
>
> >>> AllChem.CanonSmiles(smi1)
> 'C1C=CCCOCCC1' -> stereochemistry lost
> >>> AllChem.CanonSmiles(smi2)
> '/C=C\\CCO' -> stereochemistry retained
>
> So, I am guessing that double-bonds in rings are being 'ignored'(?) by
> the canonicaliser?

It's actually being done by the molecule cleanup code that is run when
a molecule is read. The result is, as far as you're concerned, the
same though: there's no stereochemistry on ring double bonds.
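To see the effect of this read-time cleanup, you can parse a SMILES and ask each double bond what stereo it ended up with. The sketch below uses standard RDKit calls (`Chem.MolFromSmiles`, `Bond.GetBondType`, `Bond.IsInRing`, `Bond.GetStereo`); the helper name `double_bond_stereo` is hypothetical. How ring double bonds are treated depends on the RDKit release, so treat this as a way to check your own installation rather than a statement about any particular version.

```python
from rdkit import Chem


def double_bond_stereo(smiles):
    """Hypothetical helper: report the stereo RDKit kept on each double bond.

    Returns a list of (begin_atom_idx, end_atom_idx, in_ring, stereo) tuples,
    where stereo is an rdkit.Chem.BondStereo value; STEREONONE means the
    cis/trans information did not survive parsing and cleanup.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    report = []
    for bond in mol.GetBonds():
        if bond.GetBondType() != Chem.BondType.DOUBLE:
            continue  # only double bonds can carry cis/trans stereo here
        report.append((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(),
                       bond.IsInRing(), bond.GetStereo()))
    return report


# An acyclic cis alkene versus a cis double bond written inside a six-membered ring.
for smi in (r'C/C=C\CCO', r'C1CC/C=C\CC1'):
    print(smi, double_bond_stereo(smi))
```

Running the same comparison on a medium or large ring shows directly whether its double-bond geometry survived the read step in the RDKit version you have installed.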

>  For 'classic' aliphatic systems, double-bonds in
> 3-7-membered rings can only sensibly exist in the cis orientation, so
> 'ignoring' them would be ok.  However, for 8-membered and above, cis or
> trans are certainly both possible, so it becomes more important to keep
> track - particularly if canonical smiles are being used to check for
> unique structures, as my colleague was doing with the geldanamycin
> example above.

yeah, that's clear: for larger ring systems the information should be
preserved. That's very easy to do. The more difficult part is going to
be making sure the output is actually canonical. I've entered a bug
for this 
(https://sourceforge.net/tracker/?func=detail&aid=3139534&group_id=160139&atid=814650)
and I'll take a look to try and get it fixed (and correct).
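Lost geometry is a problem for uniqueness checks, and this is easy to demonstrate on an acyclic case, where RDKit does keep the stereo. The sketch below uses `Chem.CanonSmiles` as a deduplication key; the helper name `group_by_canonical_smiles` is hypothetical. Two different SMILES for cis pent-3-en-1-ol collapse to one key and the trans isomer gets its own key. Switching off isomeric output (`useChiral=0`) mimics the stereo being discarded and merges all three into a single "unique" structure.

```python
from rdkit import Chem


def group_by_canonical_smiles(smiles_list, isomeric=True):
    """Hypothetical helper: bucket input SMILES by their RDKit canonical SMILES.

    With isomeric=False the canonical string carries no cis/trans (or chiral)
    information, which mimics what happens when that information is dropped.
    """
    groups = {}
    for smi in smiles_list:
        key = Chem.CanonSmiles(smi, useChiral=int(isomeric))
        groups.setdefault(key, []).append(smi)
    return groups


inputs = [
    r'C/C=C\CCO',   # cis (Z) pent-3-en-1-ol
    r'OCC\C=C/C',   # the same cis isomer, written from the other end
    r'C/C=C/CCO',   # trans (E) pent-3-en-1-ol
]

print(len(group_by_canonical_smiles(inputs)))                  # 2 distinct structures
print(len(group_by_canonical_smiles(inputs, isomeric=False)))  # geometry dropped: 1
```

The same collapse happens for any pair of ring isomers whose double-bond geometry is removed when the molecule is read, which is why the canonical output has to carry that geometry for large rings.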

It would be helpful to have some additional test cases; I will
generate some, but if you have some examples you could send (or attach
to the bug report) it would be quite helpful.

Thanks for the report,
-greg

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Canonical smiles for medium and large rings?

2010-12-17 Thread James Davidson
Dear All,
 
I have been investigating an issue that a colleague of mine identified.
He was working with the RDKit Canon Smiles node in Knime, and found that
for the natural product, Geldanamycin, the double-bond geometry
information was being lost during canonicalisation.  I repeated this
result outside of knime:
 
from rdkit import Chem
from rdkit.Chem import AllChem

>>> smi = r'NC(=O)o...@h]1c(/C)=C/[...@h](C)[C@@H](O)[C@@H](OC)c...@h](C)C\C2=C(/OC)C(=O)\C=C(\NC(=O)C(\C)=C\C=C/[C@@H]1OC)C2=O'
>>> AllChem.CanonSmiles(smi)

'COC1=C2C[C@@H](C)c...@h](OC)[...@h](O)[C@@H](C)C=C(C)[...@h](OC(N)=O)[C@@H](OC)C=CC=C(C)C(=O)NC(=CC1=O)C2=O'


The simpler example below may be better:

>>> smi1 = r'O1CC/C=C\1' # cyclic ether
>>> smi2 = r'OCC/C=C\' # corresponding acyclic alcohol

>>> AllChem.CanonSmiles(smi1)
'C1C=CCCOCCC1' -> stereochemistry lost
>>> AllChem.CanonSmiles(smi2)
'/C=C\\CCO' -> stereochemistry retained


So, I am guessing that double-bonds in rings are being 'ignored'(?) by
the canonicaliser?  For 'classic' aliphatic systems, double-bonds in
3-7-membered rings can only sensibly exist in the cis orientation, so
'ignoring' them would be ok.  However, for 8-membered and above, cis or
trans are certainly both possible, so it becomes more important to keep
track - particularly if canonical smiles are being used to check for
unique structures, as my colleague was doing with the geldanamycin
example above.
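The size rule proposed above can be made concrete as a check on the smallest ring that contains a given double bond. The sketch below is pure Python on a hypothetical adjacency-list graph (atom index mapped to a set of neighbour indices). The helper names and the `min_ring=8` default, taken from the 8-membered cut-off suggested here, are assumptions for illustration and not RDKit's API or implementation. The smallest ring through a bond a-b is found by a breadth-first search from a to b that may not use the a-b bond itself; the ring size is the path length plus one.

```python
from collections import deque


def smallest_ring_through_bond(adj, a, b):
    """Size of the smallest ring containing bond a-b, or None if the bond is acyclic.

    adj maps each atom index to a set of neighbour indices (hydrogen-suppressed).
    """
    dist = {a: 0}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if u == a and v == b:
                continue  # the bond under test may not close its own ring
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == b:
                    return dist[v] + 1  # path of k bonds from a to b spans k + 1 ring atoms
                queue.append(v)
    return None  # no alternative path: the bond is not in any ring


def keep_double_bond_stereo(adj, a, b, min_ring=8):
    """Keep cis/trans for bond a-b if it is acyclic or its smallest ring has >= min_ring atoms."""
    size = smallest_ring_through_bond(adj, a, b)
    return size is None or size >= min_ring


def cycle(n):
    """Adjacency list for a simple n-membered ring of atoms 0..n-1."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}  # acyclic 4-atom chain, e.g. but-2-ene
for name, adj in (("chain", chain), ("6-ring", cycle(6)), ("8-ring", cycle(8)), ("9-ring", cycle(9))):
    print(name, smallest_ring_through_bond(adj, 1, 2), keep_double_bond_stereo(adj, 1, 2))
```

The ring in the canonical output 'C1C=CCCOCCC1' has nine atoms, so under this rule its double-bond geometry would be kept. Using the smallest ring through the bond, rather than any ring it belongs to, means a double bond shared between a small ring and a large fused ring is still treated as small-ring.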
 
Any thoughts / suggestions are much appreciated as always!

Kind regards

James


The Vernalis Group of Companies
Oakdene Court
613 Reading Road
Winnersh, Berkshire
RG41 5UA.
Tel: +44 118 977 3133

