Re: [Rdkit-discuss] Canonical smiles for medium and large rings?

2011-01-04 Thread James Davidson
Hi Greg,

> On Sat, Dec 18, 2010 at 6:27 AM, Greg Landrum wrote:
> 
> I just checked in a set of changes that should get this 
> (mostly) working correctly. Here's a demonstration with Geldanamycin:
> 
> In [7]: 
> smi=r'NC(=O)o...@h]1c(/C)=C/[...@h](C)[C@@H](O)[C@@H](OC)c...@h](C
> )C\C2=C(/OC)C(=O)\C=C(\NC(=O)C(\C)=C\C=C/[C@@H]1OC)C2=O'
> 
> In [8]: print Chem.CanonSmiles(smi)
> COC1=C2C[C@@H](C)c...@h](OC)[...@h](O)[C@@H](C)/C=C(\C)[...@h](OC(N
> )=O)[C@@H](OC)/C=C\C=C(/C)C(=O)NC(=CC1=O)C2=O

Thanks for looking into this so quickly!

> It would be *really* useful to have some more real-world 
> cases like this one to use as tests. So if you happen to have 
> others you can send I would be quite happy to have them.

On that note, I have added a comment to the bug tracker
(https://sourceforge.net/tracker/?func=detail&aid=3139534&group_id=160139&atid=814650)
- but was not sure how to attach a file (e.g. an SDF) there, so
apologies for it ending up on more lines than I intended...  Also, I
logged in with my Google account, so it may not be clear who the
comment is from!

The first two examples are two marine natural products that only differ
in the geometry of the double bond in the medium ring.  The final
example is a cis- analogue that I synthesised during my PhD for which a
crystal structure was also obtained.  The stereochemistry in these
systems is 'challenging' to say the least, so I thought they would make
reasonable test cases.  I should say that even for the cis- double bond
cases, RDKit does a rather ugly job of the 2D depiction - but I am not
sure if other depictors will perform much better...

On a related note, I was keen to manually double-check the
stereochemistry that had been assigned to each of the chiral centres
(particularly the ones involving the 9-5 ring connections - as these are
potentially troublesome), and found myself wishing there was a way to
easily label a 2D depiction of the molecules with the atom ID.  What I
ended up doing was the following:

1.  Getting the R/S info + atomIdx back from RDKit (example output):
>>> Chem.FindMolChiralCenters(mol)
[(3, 'R'), (7, 'R'), (8, 'S'), (9, 'R'), (11, 'R'), (18, 'R'), (24, 'R')]
2.  Opening the molfile in a program where I know how to label with atom
IDs (PyMOL)
3.  Checking which atom is which manually (I had to add 1 to the RDKit
atomIdx values as they start at 0), then double-checking against
reference values.
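
The index shift in step 3 is easy to script. What follows is only a
minimal sketch: it assumes the list returned by
Chem.FindMolChiralCenters(mol) shown above, and it assumes the viewer
numbers atoms from 1 in molfile order. The helper name to_one_based is
made up for illustration and is not part of RDKit.

```python
# Example output copied from the message above (0-based RDKit atom indices).
chiral_centers = [(3, 'R'), (7, 'R'), (8, 'S'), (9, 'R'),
                  (11, 'R'), (18, 'R'), (24, 'R')]


def to_one_based(centers):
    """Return (atom_id, label) pairs using 1-based atom numbering.

    Hypothetical helper: RDKit counts atoms from 0, whereas atom IDs in a
    molfile-based viewer are assumed here to start at 1.
    """
    return [(idx + 1, label) for idx, label in centers]


one_based = to_one_based(chiral_centers)
for atom_id, label in one_based:
    print(f"atom {atom_id}: {label}")
```

The printed IDs can then be compared directly with the labels shown in
the viewer.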

RDKit performed admirably - but I presume this is dependent on the
quality of the wedge info coming in from the SDF(?)

Kind regards

James


___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Canonical smiles for medium and large rings?

2011-01-04 Thread Greg Landrum
James,

On Tue, Jan 4, 2011 at 6:29 PM, James Davidson wrote:
>
>> It would be *really* useful to have some more real-world
>> cases like this one to use as tests. So if you happen to have
>> others you can send I would be quite happy to have them.
>
> On that note, I have added a comment to the bug tracker
> (https://sourceforge.net/tracker/?func=detail&aid=3139534&group_id=160139&atid=814650)
> - but was not sure how to attach a file (eg sdf) there,
> so apologies for it ending up on more lines than I intended...  Also, I
> logged in with my google account, but it looks like it may not be clear
> who it is!

Thanks for these. I just added a couple of initial tests based on
them. I will try to find the time to make them a bit more
comprehensive in the next couple of days.

> The first two examples are two marine natural products that only differ
> in the geometry of the double bond in the medium ring.  The final
> example is a cis- analogue that I synthesised during my PhD for which a
> crystal structure was also obtained.  The stereochemistry in these
> systems is 'challenging' to say the least, so I thought they would make
> reasonable test cases.  I should say that even for the cis- double bond
> cases, RDKit does a rather ugly job of the 2D depiction - but I am not
> sure if other depictors will perform much better...

Yeah, I'm afraid it's not going to do a reasonable job with the
depiction of natural products. Most depictors (including many human
ones) have trouble getting these rendered well.

> On a related note, I was keen to manually double-check the
> stereochemistry that had been assigned to each of the chiral centres
> (particularly the ones involving the 9-5 ring connections - as these are
> potentially troublesome), and found myself wishing there was a way to
> easily label a 2D depiction of the molecules with the atom ID.  What I
> ended-up doing was the following:
>
> 1.  Getting the R/S info + atomIdx back from RDKit (example output):
> >>> Chem.FindMolChiralCenters(mol)
> [(3, 'R'), (7, 'R'), (8, 'S'), (9, 'R'), (11, 'R'), (18, 'R'), (24,
> 'R')]
> 2.  Opening the molfile in a program where I know how to label with atom
> IDs (pymol)
> 3.  Check which atom is which manually (had to add 1 to the RDKit
> atomIdx values as they start at 0) then double-check with reference
> values.
>
> RDKit performed admirably - but I presume this is dependent on the
> quality of the wedge info coming in from the SDF(?)

If the data are read from an SDF, yes: the initial stereochem
information comes from the SDF. If you have a 3D SD file, you can also
have the RDKit ignore bond wedging and assign chirality based purely
on coordinates.
R/S assignments are done in a later step; it's always nice to hear
that those are correct.
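
To illustrate what "based purely on coordinates" can mean geometrically:
one common way to get the handedness of a centre from 3D positions is
the sign of the signed volume spanned by three of its neighbours. The
sketch below shows that principle in plain Python. It is not the RDKit
implementation, and chirality_sign is a hypothetical name. Turning the
sign into an R/S label additionally needs the neighbour priorities,
which is the later step mentioned above.

```python
def chirality_sign(center, a, b, c):
    """Sign of the scalar triple product (a-center) . ((b-center) x (c-center)).

    Returns +1 or -1 for the two mirror-image arrangements of the
    neighbours a, b, c around center, and 0 if the four points are coplanar.
    """
    ux, uy, uz = (a[i] - center[i] for i in range(3))
    vx, vy, vz = (b[i] - center[i] for i in range(3))
    wx, wy, wz = (c[i] - center[i] for i in range(3))
    # Cross product v x w, then dot product with u.
    volume = (ux * (vy * wz - vz * wy)
              + uy * (vz * wx - vx * wz)
              + uz * (vx * wy - vy * wx))
    return (volume > 0) - (volume < 0)


# Made-up coordinates: a centre at the origin with three neighbours.
origin = (0.0, 0.0, 0.0)
n1, n2, n3 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
n3_mirrored = (0.0, 0.0, -1.0)  # reflect the third neighbour through z = 0

sign_original = chirality_sign(origin, n1, n2, n3)
sign_mirrored = chirality_sign(origin, n1, n2, n3_mirrored)
print(sign_original, sign_mirrored)
```

Mirroring the structure flips the sign, which is why 3D coordinates
alone are enough to tell the two forms apart without any wedge
information.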

For what it's worth: I tend to use Marvin Sketch for the "drawing
molecules with atom indices to check up on stereochemistry" task. It
will also assign absolute stereochem to atoms and bonds (usually
correctly), so it's a useful check there too.

Best regards,
-greg
