Re: [Rdkit-discuss] A couple of questions regarding ReactionFromSmarts

2010-06-03 Thread Greg Landrum
Dear James,

On Thu, Jun 3, 2010 at 7:51 PM, James Davidson wrote:
>
> First of all, I'd like to start by saying how much I've been enjoying
> exploring the functionality of RDKit - great job, Greg!

Thanks!

> I have a couple of questions regarding
> 'rdkit.Chem.AllChem.ReactionFromSmarts':
>
> (1)  I see that the reaction objects can be created from MDL Reaction
> Files/Blocks - is there a way to do the reverse, and save a reaction object
> in MDL .rxn format?  I tried investigating the rxn.ToBinary()
> method, but didn't get very far...  The reason I wanted to do this is
> that I was trying to figure out how to generate a form of the reaction
> object (generated from reaction SMARTS) that was suitable for converting
> into a 2D depiction of the transformation.

At the moment the reactions are essentially input-only. There's really
no way to get them out in any format that could be used elsewhere.
This is a sadly missing feature: it would be really nice to be able to
generate either .rxn files (or at least reaction smarts) from
reactions. I will add a feature request for this, but it may take a
while to happen.[1]

A workaround that kind of works is to paste the reaction smarts into
something like Marvin Sketch. It will normally display something that
at least gives some idea of what the reaction is.

> (2) I know that reaction SMARTS isn't SMIRKS, but I have noticed some
> behaviour that I would not expect - however, this could be down to my
> SMARTS-naivety; my SMIRKS-naivety; or both!

Anytime reactions behave in ways you don't expect, it's probably best
to just blame me for coming up with yet another way of expressing them
that is slightly incompatible with the existing ones. :-)

> I initially tried the
> following:
>
> from rdkit import Chem
> from rdkit.Chem import AllChem
> rxn_smarts = '[!#1:1]-[NH:2]-[C:3](=[O:4])-[C,c:5]>>[!#1:1]-[C:3](=[O:4])-[NH:2]-[C,c:5]'
> sm = Chem.MolFromSmiles('CC(=O)NC')
> rxn = AllChem.ReactionFromSmarts(rxn_smarts)
> prods = rxn.RunReactants((sm,))
> prod = Chem.MolToSmiles(prods[0][0])
>
>
> This gives me prod = '[H]C(=O)NC'

There's a discussion of this kind of case in the "RDKit Book"
($RDBASE/Docs/Book/RDKit_Book.pdf) starting on page 3. The short
answer is that if you have a query feature (atom list, recursive
smarts, etc.) in the reactants and you would like the matching atom to
be copied into the products you should include a dummy for that atom
in the products. A working form of your example is then:

[11]>>> rxn_smarts = '[!#1:1]-[NH:2]-[C:3](=[O:4])-[C,c:5]>>[*:1]-[C:3](=[O:4])-[NH:2]-[*:5]'

[12]>>> rxn = AllChem.ReactionFromSmarts(rxn_smarts)

[13]>>> prods = rxn.RunReactants((Chem.MolFromSmiles('c1ccccc1C(=O)NCC1CC1'),))

[14]>>> Chem.MolToSmiles(prods[0][0])
Out[14] 'O=C(CC1CC1)Nc1ccccc1'

As an aside, in SMARTS it's shorter (and I think clearer) to write
[C,c] as [#6]. It also produces a query that runs a bit quicker, but
you probably won't notice that difference in most cases.

> If I replace with '[!H:1]-[NH:2]-[C:3](=[O:4])-[C,c:5]>>[!H:1]-[C:3](=[O:4])-[NH:2]-[C,c:5]',
> I get the behaviour I want - with prod = 'CNC(=O)C'.  So I think I can get
> the behaviour I want, but was curious if I am using the SMARTS ! operator
> incorrectly in conjunction with atomic numbers, or whether this may be a
> bug?

Not really a bug. The behavior when you have queries in the products
is undocumented: depending on the details of the query it will
sometimes do the right thing, sometimes not. It's much safer to just
use "*". What I probably should do is add a warning message if the
reaction contains a query in the products; I will think about this.
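
As a rough illustration of what such a warning check could look at, here
is a minimal sketch in plain Python. It is not part of RDKit: the helper
names bracket_atoms and query_atoms_in_products are hypothetical, and the
definition of a "plain" product atom (a "*", an element symbol or "#n",
optionally with an H count, a charge and a map number) is an assumption
made for this example only.

```python
import re

# Assumption for this sketch: a product atom is "plain" if it is "*", "#n" or
# an element symbol, optionally followed by an H count, a charge and a map number.
_PLAIN = re.compile(r"^(\*|#\d+|[A-Z][a-z]?|[a-z])(H\d*)?([+-]\d*)?(:\d+)?$")


def bracket_atoms(smarts):
    """Return the top-level [...] atom expressions, tolerating nested brackets."""
    atoms, depth, start = [], 0, 0
    for i, ch in enumerate(smarts):
        if ch == "[":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                atoms.append(smarts[start:i + 1])
    return atoms


def query_atoms_in_products(rxn_smarts):
    """List product-side atoms that look like queries rather than plain atoms."""
    _, sep, products = rxn_smarts.rpartition(">")
    if not sep:
        raise ValueError("not a reaction SMARTS: no '>' separator found")
    return [a for a in bracket_atoms(products) if not _PLAIN.match(a[1:-1])]


original = ('[!#1:1]-[NH:2]-[C:3](=[O:4])-[C,c:5]'
            '>>[!#1:1]-[C:3](=[O:4])-[NH:2]-[C,c:5]')
fixed = ('[!#1:1]-[NH:2]-[C:3](=[O:4])-[C,c:5]'
         '>>[*:1]-[C:3](=[O:4])-[NH:2]-[*:5]')

print(query_atoms_in_products(original))  # product atoms that deserve a warning
print(query_atoms_in_products(fixed))     # nothing to warn about
```

The check is purely textual, so it only flags suspicious product atoms;
it says nothing about how a given query would actually behave when the
reaction is run.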

Best Regards,
-greg

[1] The underlying problem isn't actually generating the rxn files
themselves; they are just a collection of mol blocks with a bit of
extra verbiage sprinkled around. The problem is generating reasonable
mol blocks for molecules with query features. I already have a feature
request in for that one
(http://sourceforge.net/tracker/?group_id=160139&atid=814653), but it
turns out to not be quite as easy as it sounds.
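
To make the "collection of mol blocks" point concrete, here is a minimal
sketch of how an rxn block could be assembled once suitable mol blocks
exist. The function name assemble_rxn_block is hypothetical, the layout
follows my reading of the V2000 rxn format (a "$RXN" header, a name
line, a program/date line, a comment line, a counts line, then one
"$MOL" section per component), and the program/date line is simply left
blank here. It does not address the hard part, which is producing the
mol blocks themselves.

```python
def assemble_rxn_block(reactant_blocks, product_blocks, name="", comment=""):
    """Wrap already-generated mol blocks in a V2000-style rxn block (sketch)."""
    lines = [
        "$RXN",
        name,      # reaction name line
        "",        # program/date line, left blank in this sketch
        comment,   # comment line
        # counts line: number of reactants and products, 3 characters each
        f"{len(reactant_blocks):3d}{len(product_blocks):3d}",
    ]
    for block in list(reactant_blocks) + list(product_blocks):
        lines.append("$MOL")
        lines.append(block.rstrip("\n"))
    return "\n".join(lines) + "\n"


# Placeholder strings stand in for real mol blocks in this usage example.
rxn_block = assemble_rxn_block(["<mol block A>", "<mol block B>"],
                               ["<mol block C>"])
print(rxn_block)
```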

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] A couple of questions regarding ReactionFromSmarts

2010-06-03 Thread James Davidson
Hi,
 
First of all, I'd like to start by saying how much I've been enjoying
exploring the functionality of RDKit - great job, Greg!
I have a couple of questions regarding
'rdkit.Chem.AllChem.ReactionFromSmarts':
 
(1)  I see that the reaction objects can be created from MDL Reaction
Files/Blocks - is there a way to do the reverse, and save a reaction
object in MDL .rxn format?  I tried investigating the
rxn.ToBinary() method, but didn't get very far...  The reason I
wanted to do this is that I was trying to figure out how to generate a
form of the reaction object (generated from reaction SMARTS) that was
suitable for converting into a 2D depiction of the transformation.
 
(2) I know that reaction SMARTS isn't SMIRKS, but I have noticed some
behaviour that I would not expect - however, this could be down to my
SMARTS-naivety; my SMIRKS-naivety; or both!  I initially tried the
following:
 
from rdkit import Chem
from rdkit.Chem import AllChem
rxn_smarts = '[!#1:1]-[NH:2]-[C:3](=[O:4])-[C,c:5]>>[!#1:1]-[C:3](=[O:4])-[NH:2]-[C,c:5]'
sm = Chem.MolFromSmiles('CC(=O)NC')
rxn = AllChem.ReactionFromSmarts(rxn_smarts)
prods = rxn.RunReactants((sm,))
prod = Chem.MolToSmiles(prods[0][0])
 
 
This gives me prod = '[H]C(=O)NC'
 
If I replace with
'[!H:1]-[NH:2]-[C:3](=[O:4])-[C,c:5]>>[!H:1]-[C:3](=[O:4])-[NH:2]-[C,c:5]',
I get the behaviour I want - with prod = 'CNC(=O)C'.  So I think I
can get the behaviour I want, but was curious if I am using the SMARTS !
operator incorrectly in conjunction with atomic numbers, or whether this
may be a bug?
 
Kind regards
 
James


The Vernalis Group of Companies
Oakdene Court
613 Reading Road
Winnersh, Berkshire
RG41 5UA.
Tel: +44 118 977 3133