Re: [Rdkit-discuss] Lipinski HBD count

Greg Landrum Fri, 30 Sep 2011 00:17:04 -0700

James,

On Fri, Sep 30, 2011 at 8:48 AM, James Davidson <j.david...@vernalis.com> wrote:
>
> Greg wrote:
>> You actually don't need to add the Hs:
>> >>> p1 = Chem.MolFromSmarts('[#7,#8;H1]')
>> >>> p2 = Chem.MolFromSmarts('[#7,#8;H2]')
>> >>> p3 = Chem.MolFromSmarts('[#7,#8;H3]')
>> >>> m = Chem.MolFromSmiles('CC(=O)N')
>> >>> m2 = Chem.MolFromSmiles('OCC(=O)N')
>> >>> def NHOHCount(mol):
>> ...     return (len(mol.GetSubstructMatches(p1))
>> ...             + 2*len(mol.GetSubstructMatches(p2))
>> ...             + 3*len(mol.GetSubstructMatches(p3)))
>> ...
>> >>> NHOHCount(m)
>> 2
>> >>> NHOHCount(m2)
>> 3
>
> I think this system works well in almost all cases : )  However, I had a
> nagging concern over a couple of 'edge' cases - namely water, and
> ammonia (and for that matter, the oxonium and ammonium ions).


You're exactly right. I showed the SMARTS-based version as a simple
illustration. The version that's actually checked in uses a different
method: it loops over all O and N atoms and counts the number of Hs
connected to each.
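
A minimal sketch of that loop-based approach, for illustration only. This
is not the checked-in RDKit code; the function name nhoh_count_sketch is
made up here, and it assumes the total H count on each N or O atom
(implicit plus explicit) is what should be summed:

```python
from rdkit import Chem


def nhoh_count_sketch(mol):
    """Sum the hydrogens attached to every N and O atom in mol.

    Hypothetical helper, not the RDKit implementation.
    """
    total = 0
    for atom in mol.GetAtoms():
        # Atomic numbers 7 and 8 are nitrogen and oxygen.
        if atom.GetAtomicNum() in (7, 8):
            # GetTotalNumHs() counts implicit and explicit Hs on the atom.
            total += atom.GetTotalNumHs()
    return total


for smi in ('CC(=O)N', 'OCC(=O)N', 'O', 'N', '[OH3+]', '[NH4+]'):
    print(smi, nhoh_count_sketch(Chem.MolFromSmiles(smi)))
```

Because it counts per atom rather than matching H1/H2/H3 patterns, this
handles water, ammonia, oxonium and ammonium without a special case.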

> I guess the simple inclusion of P4 = Chem.MolFromSmarts('[#8;H4]') would
> make sure all cases were covered(?).
>
> Out of interest, I decided to compile a small list of 'normal' and
> 'edge' case SMILES, and ran it through the MOE descriptor node in KNIME.
> For all these cases, lip_don behaves as I would expect (tab-separated
> output included below)

Some comments on this below.

>
> "SMILES"        "a_acc" "a_don" "lip_acc"       "lip_don"
> "CO"    1.0     1.0     1.0     1.0
> "C(=O)N"        1.0     1.0     2.0     2.0
> "O"     1.0     1.0     1.0     2.0
> "CN"    1.0     1.0     1.0     2.0
> "[O+]"  1.0     0.0     1.0     3.0
> "C[O+]" 1.0     0.0     1.0     2.0
> "[N+]"  0.0     0.0     1.0     4.0
> "C[N+]" 0.0     0.0     1.0     3.0
> "[N-]"  0.0     1.0     1.0     2.0
> "[O-]"  0.0     1.0     1.0     1.0
> "C(=O)[N-]"     0.0     1.0     2.0     1.0

For what it's worth: the results here are definitely not correct for
the SMILES as provided. Atoms in SMILES that are in square brackets
have no implicit Hs, so [N+] actually has zero hydrogens. I guess you
actually provided the molecules to MOE in some other form.
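
A quick way to see the bracket-atom behaviour, as a small sketch. The
helper name total_hs is made up for this example; it just reports the
total H count RDKit assigns to each atom of a SMILES:

```python
from rdkit import Chem


def total_hs(smiles):
    """Return the total H count (implicit + explicit) for each atom."""
    mol = Chem.MolFromSmiles(smiles)
    return [atom.GetTotalNumHs() for atom in mol.GetAtoms()]


# Unbracketed N: Hs are filled in from the normal valence.
print('N', total_hs('N'))
# Bracketed N with no H count given: no Hs are added.
print('[N+]', total_hs('[N+]'))
# Bracketed N with an explicit H count: exactly that many Hs.
print('[NH4+]', total_hs('[NH4+]'))
```

So an ammonium ion has to be written as [NH4+], not [N+], for the H-based
counts to come out as intended.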

Sample script using your data (with corrected SMILES):
# -------------------
from rdkit import Chem
from rdkit.Chem import Lipinski

# Columns after the SMILES are the MOE values quoted above:
# a_acc, a_don, lip_acc, lip_don (kept for reference, not used below)
d = [
    ["CO",         1.0, 1.0, 1.0, 1.0],
    ["C(=O)N",     1.0, 1.0, 2.0, 2.0],
    ["O",          1.0, 1.0, 1.0, 2.0],
    ["CN",         1.0, 1.0, 1.0, 2.0],
    ["[OH3+]",     1.0, 0.0, 1.0, 3.0],
    ["C[OH2+]",    1.0, 0.0, 1.0, 2.0],
    ["[NH4+]",     0.0, 0.0, 1.0, 4.0],
    ["C[NH3+]",    0.0, 0.0, 1.0, 3.0],
    ["[NH2-]",     0.0, 1.0, 1.0, 2.0],
    ["[OH-]",      0.0, 1.0, 1.0, 1.0],
    ["C(=O)[NH-]", 0.0, 1.0, 2.0, 1.0],
]

print('Smiles NOCount NHOHCount')
for row in d:
    m = Chem.MolFromSmiles(row[0])
    hba = Lipinski.NOCount(m)    # Lipinski acceptor count: number of N and O atoms
    hbd = Lipinski.NHOHCount(m)  # Lipinski donor count: number of N-H and O-H
    print(row[0], hba, hbd)
#-----------------------------------

Output with the SVN version of the RDKit:

#------------------
Smiles NOCount NHOHCount
CO 1 1
C(=O)N 2 2
O 1 2
CN 1 2
[OH3+] 1 3
C[OH2+] 1 2
[NH4+] 1 4
C[NH3+] 1 3
[NH2-] 1 2
[OH-] 1 1
C(=O)[NH-] 2 1
#-----------------


Best,
-greg
