Re: [Rdkit-discuss] Lipinski HBD count
Hi Greg,

Greg wrote:
> You actually don't need to add the Hs:
> p1 = Chem.MolFromSmarts('[#7,#8;H1]')
> p2 = Chem.MolFromSmarts('[#7,#8;H2]')
> p3 = Chem.MolFromSmarts('[#7,#8;H3]')
> m = Chem.MolFromSmiles('CC(=O)N')
> m2 = Chem.MolFromSmiles('OCC(=O)N')
> def NHOHCount(mol):
>     return len(mol.GetSubstructMatches(p1))+2*len(mol.GetSubstructMatches(p2))+3*len(mol.GetSubstructMatches(p3))
> ...
> NHOHCount(m)
> 2
> NHOHCount(m2)
> 3

I think this system works well in almost all cases : )
However, I had a nagging concern over a couple of 'edge' cases - namely water, and ammonia (and for that matter, the oxonium and ammonium ions). I guess the simple inclusion of

P4 = Chem.MolFromSmarts('[#8;H4]')

would make sure all cases were covered(?).

Out of interest, I decided to compile a small list of 'normal' and 'edge' case SMILES, and ran it through the MOE descriptor node in KNIME. For all these cases, lip_don behaves as I would expect (tab-separated output included below):

SMILES      a_acc  a_don  lip_acc  lip_don
CO          1.0    1.0    1.0      1.0
C(=O)N      1.0    1.0    2.0      2.0
O           1.0    1.0    1.0      2.0
CN          1.0    1.0    1.0      2.0
[O+]        1.0    0.0    1.0      3.0
C[O+]       1.0    0.0    1.0      2.0
[N+]        0.0    0.0    1.0      4.0
C[N+]       0.0    0.0    1.0      3.0
[N-]        0.0    1.0    1.0      2.0
[O-]        0.0    1.0    1.0      1.0
C(=O)[N-]   0.0    1.0    2.0      1.0

Kind regards
James

__
PLEASE READ: This email is confidential and may be privileged. It is intended for the named addressee(s) only and access to it by anyone else is unauthorised. If you are not an addressee, any disclosure or copying of the contents of this email or any action taken (or not taken) in reliance on it is unauthorised and may be unlawful. If you have received this email in error, please notify the sender or postmas...@vernalis.com. Email is not a secure method of communication and the Company cannot accept responsibility for the accuracy or completeness of this message or any attachment(s). Please check this email for virus infection for which the Company accepts no responsibility. If verification of this email is sought then please request a hard copy.
Unless otherwise stated, any views or opinions presented are solely those of the author and do not represent those of the Company. The Vernalis Group of Companies Oakdene Court 613 Reading Road Winnersh, Berkshire RG41 5UA. Tel: +44 118 977 3133 To access trading company registration and address details, please go to the Vernalis website at www.vernalis.com and click on the Company address and registration details link at the bottom of the page.. __ -- All of the data generated in your IT infrastructure is seriously valuable. Why? It contains a definitive record of application performance, security threats, fraudulent activity, and more. Splunk takes this data and makes sense of it. IT sense. And common sense. http://p.sf.net/sfu/splunk-d2dcopy2 ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
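A minimal sketch of the SMARTS-based count quoted above, extended with a fourth pattern for atoms carrying four hydrogens. It assumes a current RDKit under Python 3; the name `smarts_nhoh_count` and the `[#7,#8;H4]` pattern are illustrative, not part of the RDKit. In SMARTS, `H<n>` tests the total hydrogen count (implicit Hs included), so water and ammonia are already caught by the H2 and H3 patterns. Of the ions in the list, only ammonium, `[NH4+]`, has four Hs on one atom, and that atom is nitrogen, so this sketch assumes the extra pattern should cover `#7` rather than only `#8` as in the proposed P4.

```python
from rdkit import Chem

# Hypothetical helper: one SMARTS query per hydrogen count (1 to 4).
# H<n> in SMARTS is the total H count, so implicit Hs from SMILES are included.
HN_PATTERNS = [(n, Chem.MolFromSmarts('[#7,#8;H%d]' % n)) for n in (1, 2, 3, 4)]


def smarts_nhoh_count(mol):
    """Sum of hydrogens on N and O: n times the number of N/O atoms with exactly n Hs."""
    return sum(n * len(mol.GetSubstructMatches(patt)) for n, patt in HN_PATTERNS)


for smi in ('CC(=O)N', 'OCC(=O)N', 'O', 'N', '[OH3+]', '[NH4+]'):
    print(smi, smarts_nhoh_count(Chem.MolFromSmiles(smi)))
```

Without the H4 entry, `[NH4+]` would score 0, since none of the H1 to H3 queries match an atom with four hydrogens.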
Re: [Rdkit-discuss] Lipinski HBD count
James,

On Fri, Sep 30, 2011 at 8:48 AM, James Davidson j.david...@vernalis.com wrote:
> Greg wrote:
>> You actually don't need to add the Hs:
>> p1 = Chem.MolFromSmarts('[#7,#8;H1]')
>> p2 = Chem.MolFromSmarts('[#7,#8;H2]')
>> p3 = Chem.MolFromSmarts('[#7,#8;H3]')
>> m = Chem.MolFromSmiles('CC(=O)N')
>> m2 = Chem.MolFromSmiles('OCC(=O)N')
>> def NHOHCount(mol):
>>     return len(mol.GetSubstructMatches(p1))+2*len(mol.GetSubstructMatches(p2))+3*len(mol.GetSubstructMatches(p3))
>> ...
>> NHOHCount(m)
>> 2
>> NHOHCount(m2)
>> 3
>
> I think this system works well in almost all cases : )
> However, I had a nagging concern over a couple of 'edge' cases - namely water, and ammonia (and for that matter, the oxonium and ammonium ions).

You're exactly right. I showed the SMARTS-based version as a simple illustration. The version that's actually checked in uses a different method (it loops over all O and N atoms and counts the number of Hs connected to each).

> I guess the simple inclusion of
> P4 = Chem.MolFromSmarts('[#8;H4]')
> would make sure all cases were covered(?).
>
> Out of interest, I decided to compile a small list of 'normal' and 'edge' case SMILES, and ran it through the MOE descriptor node in KNIME. For all these cases, lip_don behaves as I would expect (tab-separated output included below):

Some comments on this below.

> SMILES      a_acc  a_don  lip_acc  lip_don
> CO          1.0    1.0    1.0      1.0
> C(=O)N      1.0    1.0    2.0      2.0
> O           1.0    1.0    1.0      2.0
> CN          1.0    1.0    1.0      2.0
> [O+]        1.0    0.0    1.0      3.0
> C[O+]       1.0    0.0    1.0      2.0
> [N+]        0.0    0.0    1.0      4.0
> C[N+]       0.0    0.0    1.0      3.0
> [N-]        0.0    1.0    1.0      2.0
> [O-]        0.0    1.0    1.0      1.0
> C(=O)[N-]   0.0    1.0    2.0      1.0

For what it's worth: the results here are definitely not correct for the SMILES as provided. Atoms in SMILES that are in square brackets have no implicit Hs, so [N+] actually has zero hydrogens. I guess you actually provided the molecules to MOE in some other form.
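The point about square brackets is easy to check: a bracket atom gets exactly the hydrogens written inside the brackets and no implicit ones. A small sketch, assuming a current RDKit; the helper `total_hs` is hypothetical and simply reports `Atom.GetTotalNumHs()` for each heavy atom.

```python
from rdkit import Chem


def total_hs(smiles):
    """Hypothetical helper: total H count on each heavy atom of a SMILES string."""
    mol = Chem.MolFromSmiles(smiles)
    return [atom.GetTotalNumHs() for atom in mol.GetAtoms()]


# Bracket atoms without an explicit H count get no implicit Hs,
# so [N+] and the O in C[O+] carry zero hydrogens.
for smi in ('[N+]', '[NH4+]', 'C[O+]', 'C[OH2+]'):
    print(smi, total_hs(smi))
```

The unbracketed carbon in `C[O+]` still picks up its three implicit Hs; only the bracketed oxygen is affected.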
Sample script using your data (with corrected SMILES):

# ---
from rdkit import Chem
from rdkit.Chem import Lipinski

# Columns: SMILES, a_acc, a_don, lip_acc, lip_don (MOE values from your table)
d = [
    ['CO',         1.0, 1.0, 1.0, 1.0],
    ['C(=O)N',     1.0, 1.0, 2.0, 2.0],
    ['O',          1.0, 1.0, 1.0, 2.0],
    ['CN',         1.0, 1.0, 1.0, 2.0],
    ['[OH3+]',     1.0, 0.0, 1.0, 3.0],
    ['C[OH2+]',    1.0, 0.0, 1.0, 2.0],
    ['[NH4+]',     0.0, 0.0, 1.0, 4.0],
    ['C[NH3+]',    0.0, 0.0, 1.0, 3.0],
    ['[NH2-]',     0.0, 1.0, 1.0, 2.0],
    ['[OH-]',      0.0, 1.0, 1.0, 1.0],
    ['C(=O)[NH-]', 0.0, 1.0, 2.0, 1.0],
]

print('Smiles NOCount NHOHCount')
for row in d:
    m = Chem.MolFromSmiles(row[0])
    hba = Lipinski.NOCount(m)    # number of N and O atoms
    hbd = Lipinski.NHOHCount(m)  # number of Hs attached to N and O
    print(row[0], hba, hbd)
#---

Output with the SVN version of the RDKit:

#--
Smiles NOCount NHOHCount
CO 1 1
C(=O)N 2 2
O 1 2
CN 1 2
[OH3+] 1 3
C[OH2+] 1 2
[NH4+] 1 4
C[NH3+] 1 3
[NH2-] 1 2
[OH-] 1 1
C(=O)[NH-] 2 1
#-

Best,
-greg
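Greg describes the checked-in NHOHCount as a loop over the O and N atoms that adds up the Hs on each. The function below is a minimal sketch of that idea, not the RDKit source; `loop_nhoh_count` is a hypothetical name. Passing `includeNeighbors=True` to `GetTotalNumHs()` also counts hydrogens stored as explicit atoms, for example after `Chem.AddHs()`.

```python
from rdkit import Chem


def loop_nhoh_count(mol):
    """Hypothetical sketch: sum the hydrogens attached to every N and O atom."""
    return sum(atom.GetTotalNumHs(includeNeighbors=True)
               for atom in mol.GetAtoms()
               if atom.GetAtomicNum() in (7, 8))


# NHOHCount values from the SVN output in the message above.
expected = {'CO': 1, 'C(=O)N': 2, 'O': 2, 'CN': 2, '[OH3+]': 3, 'C[OH2+]': 2,
            '[NH4+]': 4, 'C[NH3+]': 3, '[NH2-]': 2, '[OH-]': 1, 'C(=O)[NH-]': 1}

for smi, count in expected.items():
    mol = Chem.MolFromSmiles(smi)
    print(smi, loop_nhoh_count(mol), count)
```

Unlike the pattern-per-H-count approach, a loop like this needs no special case for atoms with four (or more) hydrogens.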
Re: [Rdkit-discuss] Lipinski HBD count
Hi Greg,

Greg wrote:
> For what it's worth: the results here are definitely not correct for the SMILES as provided. Atoms in SMILES that are in square brackets have no implicit Hs, so [N+] actually has zero hydrogens. I guess you actually provided the molecules to MOE in some other form.

Oops - you're quite right - I converted them to MOL format with ChemAxon MolConverter. However, the point about implicit hydrogens for atoms in square brackets had completely passed me by - thanks!

> Output with the SVN version of the RDKit:
> #--
> Smiles NOCount NHOHCount
> CO 1 1
> C(=O)N 2 2
> O 1 2
> CN 1 2
> [OH3+] 1 3
> C[OH2+] 1 2
> [NH4+] 1 4
> C[NH3+] 1 3
> [NH2-] 1 2
> [OH-] 1 1
> C(=O)[NH-] 2 1
> #-

Looks great!

Kind regards
James
Re: [Rdkit-discuss] KNIME - Treatment of H in 2.0.0.1061 nodes
So the 2.0.0.1088 nodes now generate 636 matches and only 2 false positives:

WEHI-0054407  S=C1N(C(=C(C2=C1CN(C(C2)(C)C)C)C#N)N)C
  [#6](-[#1])(-[#1])-[#7]([#6]:[#6])~[#6][#6]=,:[#6]-[#6]~[#6][#7]
  dyes5A(27)

WEHI-0063070  N1C(=NC=C(C1=O)C)NN=Cc2ccc(cc2)N(C)C
  [#6](-[#1])(-[#1])-[#7](-[#6](-[#1])-[#1])-c:1:c(:c(:c(:c(:c:1-[#1])-[#1])-[#6](-[#1])=[#7]-[#7]-[$([#6](=[#8])-[#6](-[#1])(-[#1])-[#16]-[#6]:[#7]),$([#6](=[#8])-[#6](-[#1])(-[#1])-[!#1]:[!#1]:[#7]),$([#6](=[#8])-[#6]:[#6]-[#8]-[#1]),$([#6]:[#7]),$([#6](-[#1])(-[#1])-[#6](-[#1])-[#8]-[#1])])-[#1])-[#1]
  hzone_anil_di_alk(35)

Cheers,
Simon
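Both patterns above spell hydrogens out as `[#1]` atoms, which is why the handling of H in the nodes matters: in RDKit such a query only matches hydrogens that exist as explicit atoms in the molecule graph. A minimal sketch, assuming a current RDKit, using a fragment taken from the start of the dyes5A pattern and trimethylamine as a test molecule (both chosen for illustration):

```python
from rdkit import Chem

# Fragment from the start of the dyes5A pattern: a carbon with two H atoms bonded to N.
patt = Chem.MolFromSmarts('[#6](-[#1])(-[#1])-[#7]')
mol = Chem.MolFromSmiles('CN(C)C')  # trimethylamine; its Hs are implicit

implicit_match = mol.HasSubstructMatch(patt)              # no H atoms for [#1] to match
explicit_match = Chem.AddHs(mol).HasSubstructMatch(patt)  # H atoms now in the graph
print(implicit_match, explicit_match)
```

The same molecule fails or passes the filter depending only on whether explicit Hs were added before matching.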
Re: [Rdkit-discuss] KNIME - Treatment of H in 2.0.0.1061 nodes
Simon,

On Fri, Sep 30, 2011 at 1:11 PM, Simon Saubern simon.saub...@csiro.au wrote:
> So the 2.0.0.1088 nodes now generate 636 matches and only 2 false positives:

Now that's progress! :-)

> WEHI-0054407  S=C1N(C(=C(C2=C1CN(C(C2)(C)C)C)C#N)N)C
>   [#6](-[#1])(-[#1])-[#7]([#6]:[#6])~[#6][#6]=,:[#6]-[#6]~[#6][#7]
>   dyes5A(27)
>
> WEHI-0063070  N1C(=NC=C(C1=O)C)NN=Cc2ccc(cc2)N(C)C
>   [#6](-[#1])(-[#1])-[#7](-[#6](-[#1])-[#1])-c:1:c(:c(:c(:c(:c:1-[#1])-[#1])-[#6](-[#1])=[#7]-[#7]-[$([#6](=[#8])-[#6](-[#1])(-[#1])-[#16]-[#6]:[#7]),$([#6](=[#8])-[#6](-[#1])(-[#1])-[!#1]:[!#1]:[#7]),$([#6](=[#8])-[#6]:[#6]-[#8]-[#1]),$([#6]:[#7]),$([#6](-[#1])(-[#1])-[#6](-[#1])-[#8]-[#1])])-[#1])-[#1]
>   hzone_anil_di_alk(35)

I don't think either of those is a false positive; it looks like there should be a match for each. I guess the differences you see between cheminformatics systems have to do with aromaticity definitions.

-greg
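The aromaticity point can be illustrated with the bond primitives the dyes5A pattern uses: in SMARTS, `:` matches only bonds the toolkit has flagged as aromatic, whereas `=,:` accepts either a double or an aromatic bond. A minimal sketch on benzene (chosen for simplicity, not one of Simon's compounds), assuming a current RDKit and using `Chem.Kekulize(..., clearAromaticFlags=True)` to stand in for a toolkit that did not perceive the ring as aromatic:

```python
from rdkit import Chem

aromatic_only = Chem.MolFromSmarts('[#6]:[#6]')
double_or_aromatic = Chem.MolFromSmarts('[#6]=,:[#6]')

mol = Chem.MolFromSmiles('c1ccccc1')  # RDKit perceives benzene as aromatic
before = (mol.HasSubstructMatch(aromatic_only), mol.HasSubstructMatch(double_or_aromatic))

# Simulate a toolkit that keeps the ring as alternating single/double bonds.
Chem.Kekulize(mol, clearAromaticFlags=True)
after = (mol.HasSubstructMatch(aromatic_only), mol.HasSubstructMatch(double_or_aromatic))

print(before, after)
```

A pattern that mixes strict `:` bonds with tolerant `=,:` bonds can therefore hit or miss the same structure depending on which rings a given toolkit calls aromatic.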