Re: [Rdkit-discuss] Lipinski HBD count

2011-09-30 Thread James Davidson
Hi Greg,

Greg wrote: 
> For what it's worth: the results here are definitely not 
> correct for the SMILES as provided. Atoms in SMILES that are 
> in square brackets have no implicit Hs, so [N+] actually has 
> zero hydrogens. I guess you actually provided the molecules 
> to MOE in some other form.

Oops - you're quite right - I converted them to MOL format with ChemAxon
MolConverter.  However, the point about implicit hydrogens for atoms in
square brackets had completely passed me by - thanks!

> Output with the SVN version of the RDKit:
> 
> #--
> Smiles NOCount NHOHCount
> CO 1 1
> C(=O)N 2 2
> O 1 2
> CN 1 2
> [OH3+] 1 3
> C[OH2+] 1 2
> [NH4+] 1 4
> C[NH3+] 1 3
> [NH2-] 1 2
> [OH-] 1 1
> C(=O)[NH-] 2 1
> #-


Looks great!

Kind regards

James

__
PLEASE READ: This email is confidential and may be privileged. It is intended 
for the named addressee(s) only and access to it by anyone else is 
unauthorised. If you are not an addressee, any disclosure or copying of the 
contents of this email or any action taken (or not taken) in reliance on it is 
unauthorised and may be unlawful. If you have received this email in error, 
please notify the sender or postmas...@vernalis.com. Email is not a secure 
method of communication and the Company cannot accept responsibility for the 
accuracy or completeness of this message or any attachment(s). Please check 
this email for virus infection for which the Company accepts no responsibility. 
If verification of this email is sought then please request a hard copy. Unless 
otherwise stated, any views or opinions presented are solely those of the 
author and do not represent those of the Company.

The Vernalis Group of Companies
Oakdene Court
613 Reading Road
Winnersh, Berkshire
RG41 5UA.
Tel: +44 118 977 3133

To access trading company registration and address details, please go to the 
Vernalis website at www.vernalis.com and click on the "Company address and 
registration details" link at the bottom of the page..
__

--
All of the data generated in your IT infrastructure is seriously valuable.
Why? It contains a definitive record of application performance, security
threats, fraudulent activity, and more. Splunk takes this data and makes
sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-d2dcopy2
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Lipinski HBD count

2011-09-30 Thread Greg Landrum
James,

On Fri, Sep 30, 2011 at 8:48 AM, James Davidson  wrote:
>
> Greg wrote:
>> You actually don't need to add the Hs:
>> >>> p1 = Chem.MolFromSmarts('[#7,#8;H1]')
>> >>> p2 = Chem.MolFromSmarts('[#7,#8;H2]')
>> >>> p3 = Chem.MolFromSmarts('[#7,#8;H3]')
>> >>> m = Chem.MolFromSmiles('CC(=O)N')
>> >>> m2 = Chem.MolFromSmiles('OCC(=O)N')
>> >>> def NHOHCount(mol):
>> ...     return (len(mol.GetSubstructMatches(p1))
>> ...             + 2*len(mol.GetSubstructMatches(p2))
>> ...             + 3*len(mol.GetSubstructMatches(p3)))
>> ...
>> >>> NHOHCount(m)
>> 2
>> >>> NHOHCount(m2)
>> 3
>
> I think this system works well in almost all cases : )  However, I had a
> nagging concern over a couple of 'edge' cases - namely water, and
> ammonia (and for that matter, the oxonium and ammonium ions).

You're exactly right. I showed the SMARTS-based version as a simple
illustration. The version that's actually checked in is using a
different method (it loops over all O and N atoms and counts the
number of Hs connected to each).
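
A minimal sketch of why the per-atom loop covers the edge cases while the three-pattern SMARTS sum does not. This is not the checked-in RDKit code: each molecule is reduced to a hypothetical list of (element symbol, attached H count) pairs, and both function names are made up for illustration. In RDKit itself the per-atom hydrogen count would come from the atom object rather than from a hand-written list.

```python
# Toy model (an assumption for illustration): a molecule is a list of
# (element symbol, number of attached hydrogens) pairs.
ACETAMIDE = [("C", 3), ("C", 0), ("O", 0), ("N", 2)]   # CC(=O)N
WATER = [("O", 2)]                                      # O
AMMONIUM = [("N", 4)]                                   # [NH4+]


def pattern_sum(atoms, max_h=3):
    """Mimic the SMARTS version: k * (number of N/O atoms with exactly k Hs),
    for k = 1..max_h only."""
    total = 0
    for k in range(1, max_h + 1):
        total += k * sum(1 for sym, h in atoms if sym in ("N", "O") and h == k)
    return total


def per_atom_sum(atoms):
    """Mimic the loop described above: add up the Hs on every N and O atom."""
    return sum(h for sym, h in atoms if sym in ("N", "O"))


for name, mol in [("acetamide", ACETAMIDE), ("water", WATER), ("ammonium", AMMONIUM)]:
    print(name, pattern_sum(mol), per_atom_sum(mol))
```

Only the ammonium case differs (0 versus 4), since no pattern up to H3 matches an atom carrying four hydrogens. The per-atom sum needs no extra pattern for that case.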

> I guess the simple inclusion of P4 = Chem.MolFromSmarts('[#8;H4]') would
> make sure all cases were covered(?).
>
> Out of interest, I decided to compile a small list of 'normal' and
> 'edge' case SMILES, and ran it through the MOE descriptor node in KNIME.
> For all these cases, lip_don behaves as I would expect (tab-separated
> output included below)

Some comments on this below.

>
> "SMILES"        "a_acc" "a_don" "lip_acc"       "lip_don"
> "CO"    1.0     1.0     1.0     1.0
> "C(=O)N"        1.0     1.0     2.0     2.0
> "O"     1.0     1.0     1.0     2.0
> "CN"    1.0     1.0     1.0     2.0
> "[O+]"  1.0     0.0     1.0     3.0
> "C[O+]" 1.0     0.0     1.0     2.0
> "[N+]"  0.0     0.0     1.0     4.0
> "C[N+]" 0.0     0.0     1.0     3.0
> "[N-]"  0.0     1.0     1.0     2.0
> "[O-]"  0.0     1.0     1.0     1.0
> "C(=O)[N-]"     0.0     1.0     2.0     1.0

For what it's worth: the results here are definitely not correct for
the SMILES as provided. Atoms in SMILES that are in square brackets
have no implicit Hs, so [N+] actually has zero hydrogens. I guess you
actually provided the molecules to MOE in some other form.
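
To make the bracket-atom rule concrete, here is a small sketch that reads the hydrogen count straight out of a bracket atom token. The regular expression and the function name `bracket_h_count` are assumptions made for illustration; they cover only simple bracket atoms (optional isotope, element symbol, optional `@`/`@@`, H count, charge, atom class) and are not how RDKit parses SMILES.

```python
import re

# Simplified bracket-atom grammar (illustrative; not a full SMILES parser):
# [isotope? symbol chirality? Hcount? charge? class?]
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?"
    r"(?P<symbol>[A-Z][a-z]?|[a-z]{1,2})"
    r"(?P<chiral>@{1,2})?"
    r"(?P<hcount>H\d*)?"
    r"(?P<charge>[+-]+\d*)?"
    r"(?P<cls>:\d+)?\]"
)


def bracket_h_count(token):
    """Return the number of hydrogens written inside a bracket atom.

    A bracket atom without an H term has zero hydrogens; a bare 'H' means one.
    """
    match = BRACKET_ATOM.fullmatch(token)
    if match is None:
        raise ValueError("not a simple bracket atom: %r" % token)
    hcount = match.group("hcount")
    if hcount is None:
        return 0
    return int(hcount[1:]) if len(hcount) > 1 else 1


for token in ["[N+]", "[NH4+]", "[O+]", "[OH3+]", "[OH-]", "[NH2-]"]:
    print(token, bracket_h_count(token))
```

Under this reading `[N+]` and `[O+]` carry no hydrogens at all, which is why the hydrogens have to be written out, as in `[NH4+]` and `[OH3+]`, for the counts to come out as intended.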

Sample script using your data (with corrected SMILES):
# ---
from rdkit import Chem
from rdkit.Chem import Lipinski

# Each row: SMILES, a_acc, a_don, lip_acc, lip_don (MOE values from above)
d = [
    ["CO", 1.0, 1.0, 1.0, 1.0],
    ["C(=O)N", 1.0, 1.0, 2.0, 2.0],
    ["O", 1.0, 1.0, 1.0, 2.0],
    ["CN", 1.0, 1.0, 1.0, 2.0],
    ["[OH3+]", 1.0, 0.0, 1.0, 3.0],
    ["C[OH2+]", 1.0, 0.0, 1.0, 2.0],
    ["[NH4+]", 0.0, 0.0, 1.0, 4.0],
    ["C[NH3+]", 0.0, 0.0, 1.0, 3.0],
    ["[NH2-]", 0.0, 1.0, 1.0, 2.0],
    ["[OH-]", 0.0, 1.0, 1.0, 1.0],
    ["C(=O)[NH-]", 0.0, 1.0, 2.0, 1.0],
]

print('Smiles NOCount NHOHCount')
for row in d:
    m = Chem.MolFromSmiles(row[0])
    hba = Lipinski.NOCount(m)    # number of N and O atoms
    hbd = Lipinski.NHOHCount(m)  # number of N-H and O-H bonds
    print('%s %d %d' % (row[0], hba, hbd))
#---

Output with the SVN version of the RDKit:

#--
Smiles NOCount NHOHCount
CO 1 1
C(=O)N 2 2
O 1 2
CN 1 2
[OH3+] 1 3
C[OH2+] 1 2
[NH4+] 1 4
C[NH3+] 1 3
[NH2-] 1 2
[OH-] 1 1
C(=O)[NH-] 2 1
#-


Best,
-greg



Re: [Rdkit-discuss] Lipinski HBD count

2011-09-29 Thread James Davidson
Hi Greg,

Greg wrote: 
> You actually don't need to add the Hs:
> >>> p1 = Chem.MolFromSmarts('[#7,#8;H1]')
> >>> p2 = Chem.MolFromSmarts('[#7,#8;H2]')
> >>> p3 = Chem.MolFromSmarts('[#7,#8;H3]')
> >>> m = Chem.MolFromSmiles('CC(=O)N')
> >>> m2 = Chem.MolFromSmiles('OCC(=O)N')
> >>> def NHOHCount(mol):
> ...     return (len(mol.GetSubstructMatches(p1))
> ...             + 2*len(mol.GetSubstructMatches(p2))
> ...             + 3*len(mol.GetSubstructMatches(p3)))
> ...
> >>> NHOHCount(m)
> 2
> >>> NHOHCount(m2)
> 3

I think this system works well in almost all cases : )  However, I had a
nagging concern over a couple of 'edge' cases - namely water, and
ammonia (and for that matter, the oxonium and ammonium ions).

I guess the simple inclusion of P4 = Chem.MolFromSmarts('[#8;H4]') would
make sure all cases were covered(?).

Out of interest, I decided to compile a small list of 'normal' and
'edge' case SMILES, and ran it through the MOE descriptor node in KNIME.
For all these cases, lip_don behaves as I would expect (tab-separated
output included below)

Kind regards

James

"SMILES"      "a_acc"  "a_don"  "lip_acc"  "lip_don"
"CO"          1.0      1.0      1.0        1.0
"C(=O)N"      1.0      1.0      2.0        2.0
"O"           1.0      1.0      1.0        2.0
"CN"          1.0      1.0      1.0        2.0
"[O+]"        1.0      0.0      1.0        3.0
"C[O+]"       1.0      0.0      1.0        2.0
"[N+]"        0.0      0.0      1.0        4.0
"C[N+]"       0.0      0.0      1.0        3.0
"[N-]"        0.0      1.0      1.0        2.0
"[O-]"        0.0      1.0      1.0        1.0
"C(=O)[N-]"   0.0      1.0      2.0        1.0



Re: [Rdkit-discuss] Lipinski HBD count

2011-09-29 Thread Greg Landrum
On Fri, Sep 30, 2011 at 4:44 AM, Greg Landrum  wrote:
>
>> The MOE descriptor lip_don seems to exactly reproduce these 'bond count'
>> numbers for my set of compounds.  So I guess my question is - shouldn't
>> we be counting the NH and OH bonds for Lipinski-like counting? (and I
>> guess this is what MOE's lip_don is for)
>
> I think we should be, yes. I believe that this is a bug in the current
> Lipinski.NHOHCount() function and I will go ahead and fix it. Thanks
> for pointing it out.

The bug report:
https://sourceforge.net/tracker/?func=detail&aid=3415534&group_id=160139&atid=814650
The fix is now checked in and will be in the next RDKit release (coming soon).

With the fix I also bumped the version number of that descriptor to 2.0.0.

-greg



Re: [Rdkit-discuss] Lipinski HBD count

2011-09-29 Thread Greg Landrum
Dear James,

On Wed, Sep 28, 2011 at 8:04 PM, James Davidson  wrote:
>
> Apologies for posting on a rather well-trodden (and tedious?) topic...
> I have just spent some time counting H-bond donors in a variety of ways
> for a 4000 compound data set - to see how our calculations could best
> match the results coming from a collaborator (it's as fun as it sounds).

Yeah that sounds like a blast. :-S

>
> As it turns out, Descriptors.NumHDonors is pretty much on the money! (i.e.
> counting the number of N and O atoms that have at least one H attached)
> However, during this process I ended up going back to a (the?) Lipinski
> paper -
>
> Christopher A Lipinski, Franco Lombardo, Beryl W Dominy, Paul J Feeney,
> Experimental and computational approaches to estimate solubility and
> permeability in drug discovery and development settings, Advanced Drug
> Delivery Reviews, Volume 46, Issues 1-3, 1 March 2001, Pages 3-26, ISSN
> 0169-409X, 10.1016/S0169-409X(00)00129-0.
>
> - to see what the Lipinski definition of Hydrogen Bond Donors was.  I
> read the following:
>
> "We found that simply adding the number of NH bonds and OH bonds does
> remarkably well as an index of H bond donor character. Importantly, this
> parameter has direct structural relevance to the chemist."

Interesting. Thanks for actually reading the paper.

> As far as I can tell, this would require explicit addition of Hs to the
> molecule, followed by counting the number of matches for an NH or OH
> BOND; something like the following:
>
> >>> from rdkit import Chem
> >>> from rdkit.Chem import Descriptors
> >>>
> >>> smarts = Chem.MolFromSmarts("[#7,#8]-[#1]")
> >>> mol = Chem.MolFromSmiles("CC(=O)N")
> >>> mol = Chem.AddHs(mol)
> >>> matches = mol.GetSubstructMatches(smarts)
> >>> print(len(matches))
> 2

You actually don't need to add the Hs:
>>> p1 = Chem.MolFromSmarts('[#7,#8;H1]')
>>> p2 = Chem.MolFromSmarts('[#7,#8;H2]')
>>> p3 = Chem.MolFromSmarts('[#7,#8;H3]')
>>> m = Chem.MolFromSmiles('CC(=O)N')
>>> m2 = Chem.MolFromSmiles('OCC(=O)N')
>>> def NHOHCount(mol):
...     return (len(mol.GetSubstructMatches(p1))
...             + 2*len(mol.GetSubstructMatches(p2))
...             + 3*len(mol.GetSubstructMatches(p3)))
...
>>> NHOHCount(m)
2
>>> NHOHCount(m2)
3

> The MOE descriptor lip_don seems to exactly reproduce these 'bond count'
> numbers for my set of compounds.  So I guess my question is - shouldn't
> we be counting the NH and OH bonds for Lipinski-like counting? (and I
> guess this is what MOE's lip_don is for)

I think we should be, yes. I believe that this is a bug in the current
Lipinski.NHOHCount() function and I will go ahead and fix it. Thanks
for pointing it out.
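
The distinction behind the bug is atoms versus bonds: a donor-atom count adds one per N or O that carries at least one hydrogen, while the count quoted from the Lipinski paper adds one per N-H or O-H bond. A minimal sketch of the two tallies, using a hypothetical (element symbol, attached H count) list in place of a real molecule object; the function names are illustrative and not part of RDKit:

```python
# Toy molecules (hand-written for illustration): (element, attached Hs) pairs.
ACETAMIDE = [("C", 3), ("C", 0), ("O", 0), ("N", 2)]                   # CC(=O)N
HYDROXYACETAMIDE = [("O", 1), ("C", 2), ("C", 0), ("O", 0), ("N", 2)]  # OCC(=O)N


def donor_atom_count(atoms):
    """Count N and O atoms that carry at least one hydrogen."""
    return sum(1 for sym, h in atoms if sym in ("N", "O") and h > 0)


def nh_oh_bond_count(atoms):
    """Count N-H and O-H bonds, i.e. sum the hydrogens on N and O atoms."""
    return sum(h for sym, h in atoms if sym in ("N", "O"))


for name, mol in [("CC(=O)N", ACETAMIDE), ("OCC(=O)N", HYDROXYACETAMIDE)]:
    print(name, donor_atom_count(mol), nh_oh_bond_count(mol))
```

The bond counts (2 and 3) match the NHOHCount values in the session above; the atom counts (1 and 2) are lower because the primary amide nitrogen is counted once despite having two N-H bonds.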

Best,
-greg
