Greg,

You hit the nail on the head!  I did attempt to use it for a substructure 
search with the intention of restricting valence states, but I screwed up and 
forgot the Chem.MergeQueryHs() command!

I did end up taking a different approach, described in an older RDKit blog 
post [1], that uses dummy atoms, since this seems to be an easier way to define 
substitutions on carbon atoms.
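Here is a rough sketch of the dummy-atom idea. It is an illustration, not the blog post's exact recipe. It assumes RDKit's Chem.AdjustQueryProperties with Chem.AdjustQueryParameters: makeDummiesQueries turns unlabeled '*' atoms into "any atom" queries, and adjustDegree pins the degree of the remaining ring atoms. By default, dummies and chain atoms are skipped. The query and the three target SMILES are hypothetical examples chosen for this sketch:

```python
from rdkit import Chem

# Hypothetical query: 5-aminopyrazole that must carry a substituent ('*')
# on the ring carbon next to the unprotonated ring nitrogen.
raw_query = Chem.MolFromSmiles('c1c(*)n[nH]c1N')

params = Chem.AdjustQueryParameters()
params.makeDummiesQueries = True  # '*' becomes an "any atom" query
params.adjustDegree = True        # ring atoms get degree queries; default flags skip dummies and chain atoms
query = Chem.AdjustQueryProperties(raw_query, params)

# Hypothetical targets for illustration
targets = {
    'substituted at the dummy position': 'Nc1cc(C)n[nH]1',
    'unsubstituted (m1)': 'c1cn[nH]c1N',
    'substituted at the other ring carbon': 'Nc1[nH]ncc1C',
}
for label, smi in targets.items():
    print(label, Chem.MolFromSmiles(smi).HasSubstructMatch(query))
```

The dummy atom still has to map onto a real neighbor, so this query demands a substituent at that position, and the unsubstituted aminopyrazole should not match. The degree queries on the other ring atoms block extra substituents elsewhere on the ring. The exocyclic NH2 is a chain atom, so it is left unconstrained under the default flags.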

Either way, I’ve just started to work on my Python skills and have to say that 
I’m beginning to appreciate what a powerful tool RDKit is, so thanks for that!!

Markus

[1] http://rdkit.blogspot.com/2016/07/tuning-substructure-queries-ii.html

From: Greg Landrum <greg.land...@gmail.com>
Sent: Tuesday, November 5, 2019 11:26 PM
To: Markus Heller <mhel...@admarebio.com>
Cc: rdkit-discuss (rdkit-discuss@lists.sourceforge.net) 
<rdkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] Explicit H in substructure searches

Paolo's answer was completely correct, but there's an additional point that's 
worth mentioning here.
Hs are often included in query molecules with the intent of restricting 
possible valence states of atoms, not because the user is actually interested 
in matching Hs. In this case you can use the function Chem.MergeQueryHs() to 
remove the H atoms in your query molecule and add/adjust H count queries on the 
heavy atoms they are connected to.

Here's how that works in your example:
In [6]: params = Chem.SmilesParserParams()
   ...: params.removeHs=False
   ...: query = Chem.MolFromSmiles('c1cn[nH]c(N([H])([H]))1', params)

In [7]: m1 = Chem.MolFromSmiles('c1cn[nH]c1N')
   ...: m2 = Chem.MolFromSmiles('CNc1ccn[nH]1')
   ...: m3 = Chem.MolFromSmiles('Nc1ccnn(C)1')

In [8]: m1.HasSubstructMatch(query)
Out[8]: False

In [15]: q2 = Chem.MergeQueryHs(query)

In [16]: m1.HasSubstructMatch(q2)
Out[16]: True

In [17]: m2.HasSubstructMatch(q2)
Out[17]: False

In [18]: m3.HasSubstructMatch(q2)
Out[18]: True

You can see what has happened by calling MolToSmarts:
In [19]: Chem.MolToSmarts(q2)
Out[19]: '[#6]1:[#6]:[#7]:[#7H]:[#6]:1-[#7&!H0&!H1]'

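Notice that the N atom now has query features attached to it: !H0&!H1 means 
"not zero Hs and not one H", i.e. at least two Hs. That is why m2, whose 
exocyclic N carries only one H, does not match.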
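Out[18] shows that m3 still matches, although its ring nitrogen carries a methyl group instead of an H. Only the exocyclic N had explicit H atoms to merge. The ring [nH] is an ordinary atom, and its H count does not appear to be checked during matching. Here is a minimal sketch, assuming the goal is to require H on both nitrogens: write the ring H as an explicit atom too, so that MergeQueryHs turns it into an H-count query as well. The modified query SMILES is an assumption made for this illustration:

```python
from rdkit import Chem

params = Chem.SmilesParserParams()
params.removeHs = False  # keep the [H] atoms so MergeQueryHs can see them

# Same scaffold, but the ring NH is now written with an explicit H atom
query = Chem.MolFromSmiles('c1cnn([H])c1N([H])[H]', params)
q2 = Chem.MergeQueryHs(query)  # H atoms become H-count queries on their neighbors
print(Chem.MolToSmarts(q2))

m1 = Chem.MolFromSmiles('c1cn[nH]c1N')
m2 = Chem.MolFromSmiles('CNc1ccn[nH]1')
m3 = Chem.MolFromSmiles('Nc1ccnn(C)1')
for name, m in (('m1', m1), ('m2', m2), ('m3', m3)):
    print(name, m.HasSubstructMatch(q2))
```

With the ring N now required to carry at least one H, m3 (N-methylated) should no longer match. MergeQueryHs produces "at least n Hs" queries. If exactly one H is wanted, that would need a different query.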

I hope this helps,
-greg


On Tue, Nov 5, 2019 at 7:53 PM Markus Heller 
<mhel...@admarebio.com> wrote:
Hi,

I’m trying to understand how to properly use explicit hydrogens in substructure 
searches.  Below is an example.  I would like to find all molecules that 
contain my query with hydrogens at the nitrogens, and I thought I was on the 
right track …  Why does the first query with the explicit H not match m1?

Thanks
Markus

<code>
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import rdDepictor

rdDepictor.SetPreferCoordGen(True)
IPythonConsole.ipython_useSVG = True

m1 = Chem.MolFromSmiles('c1cn[nH]c1N')
m2 = Chem.MolFromSmiles('CNc1ccn[nH]1')
m3 = Chem.MolFromSmiles('Nc1ccnn(C)1')

# do not remove explicit H
params = Chem.SmilesParserParams()
params.removeHs=False

query = Chem.MolFromSmiles('c1cn[nH]c(N([H])([H]))1', params)

# first should be True, but all are False
for m in (m1, m2, m3):
    print(m.HasSubstructMatch(query))

# rebuild query with explicit H removed, not what I want
query = Chem.MolFromSmiles('c1cn[nH]c(N([H])([H]))1')

for m in (m1, m2, m3):
    print(m.HasSubstructMatch(query))

</code>
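One way to see why the first query fails: with removeHs=False, the two [H] atoms stay in the query as real atoms. They can therefore only match real H atoms in the target, and m1 built from SMILES has only implicit Hs. The sketch below adds explicit Hs to the target with Chem.AddHs. This is for illustration only, since it enlarges every molecule searched; merging the Hs on the query side is usually the more practical route.

```python
from rdkit import Chem

params = Chem.SmilesParserParams()
params.removeHs = False
query = Chem.MolFromSmiles('c1cn[nH]c(N([H])([H]))1', params)

m1 = Chem.MolFromSmiles('c1cn[nH]c1N')

print(query.GetNumAtoms())                      # six heavy atoms plus the two explicit H atoms
print(m1.HasSubstructMatch(query))              # m1 has no H atoms for the query Hs to map onto
print(Chem.AddHs(m1).HasSubstructMatch(query))  # Hs are now real atoms in the target
```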
--
Markus Heller, PhD
Senior Scientist
Direct: 604.827.1122   Main: 604.827.1147

2405 Wesbrook Mall, 4th Floor, Vancouver, BC V6T 1Z3

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss