[Rdkit-discuss] More helpful error messages...
Using RDKit 2010.10

Some error messages need to be more helpful. For example, in a 10,000 molecule SMILES file:

    [23:07:01] Can't kekulize mol
    Traceback (most recent call last):
      File "./x.py", line 314, in <module>
        main()
      File "./x.py", line 303, in main
        mols = doSomething(...)
      File "./x.py", line 185, in doSomething
        mol_noH = Chem.RemoveHs(mol_h)
    ValueError: Sanitization error: Can't kekulize mol

This, or a "Cannot parse smiles" error, gives an indication of what is going on, but I need to know which mol they are failing at...

Something like

    [23:07:01] Can't kekulize mol (some id and/or textual smiles)
    [23:07:01] Cannot parse smiles string (Ccc1XXXc)

would be more helpful...

Jean-Paul Ebejer
Early Stage Researcher
InhibOx Ltd
Pembroke House
36-37 Pembroke Street
Oxford OX1 1BP UK
(+44 / 0) 1865 262 034
Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
[Rdkit-discuss] Is it possible?
Dearest Greg,

Are your slides about the RDKit DB cartridge talk in Cambridge publicly available?

Cheers
Jean-Paul Ebejer
[Rdkit-discuss] PDB processing
Dear folks,

is there a way to add occupancy and B-factors (e.g. 1.00 50.0) to a PDB file?

Thanks

Cheers,
Paul
[Rdkit-discuss] Typo in wiki [DatabaseCreation2]
Hi Greg,

there is a typo on the page describing the creation of a database using eMolecules: sed in the form 's/\\//' to escape backslashes will only replace the first occurrence of the match; the 'g' at the end is necessary to replace all occurrences on a line.

    grep '6172136' eMolecules-2011-01-02.smi | sed 's/\\//'
    C/C(=C\\c1c1)/C=C\1/N=C(OC1=O)c1ccc(cc1)[N+](=O)[O-] 6172136

    grep '6172136' eMolecules-2011-01-02.smi | sed 's/\\//g'
    C/C(=C\\c1c1)/C=C\\1/N=C(OC1=O)c1ccc(cc1)[N+](=O)[O-] 6172136

Cheers,
Adrian
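The difference between the two sed forms can be reproduced in a few lines of Python. This is only a sketch: it assumes the intent of the wiki command is to double every backslash, and the function name `escape_backslashes` and the sample string are made up for illustration.

```python
def escape_backslashes(line, first_only=False):
    """Double the backslashes in line.

    first_only=True mimics sed without the trailing 'g' (only the first
    match on the line is replaced); the default mimics sed with 'g'.
    """
    if first_only:
        return line.replace("\\", "\\\\", 1)
    return line.replace("\\", "\\\\")


# Hypothetical input containing two backslashes on one line.
sample = "a\\b\\c"
first = escape_backslashes(sample, first_only=True)
every = escape_backslashes(sample)
print(first)  # only the first backslash is doubled
print(every)  # both backslashes are doubled
```

Any SMILES line with more than one backslash will show the same discrepancy between the two variants.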
[Rdkit-discuss] random forest in RDKit - ctd.
Dear Greg,

the Wiki is a great place to start right from scratch with the RDKit ML capabilities!

However, I wonder how to build a 3-class model:

    for i,m in enumerate(ms):
        if m.GetProp('ACTIVITY_CLASS')=='active':
            act=1
        else:
            act=0
        pts.append([m.GetProp('CompoundName')]+list(descrs[i])+[act])

Naively, I just tried act 0,1 or 2 - but this did not work.

Cheers and thanks,
Paul

P.S.: Although I'm rather new to RDKit, I'm willing to continue the Wiki entry regarding the ML capabilities. But someone should cross-check that I don't add any errors.
Re: [Rdkit-discuss] Is it possible?
Hi JP,

They will be publicly available. I'll send a link when they are.

-greg

On Mon, May 9, 2011 at 12:20 PM, JP jeanpaul.ebe...@inhibox.com wrote:
> Dearest Greg,
> Are your slides about the RDkit DB cartridge talk in Cambridge publicly available?
Re: [Rdkit-discuss] random forest in RDKit - ctd.
Dear Paul,

On Mon, May 9, 2011 at 4:28 PM, paul.czodrow...@merck.de wrote:
> the Wiki is a great place to start right from scratch with the RDKit ML capabilities!

Glad to hear it.

> However, I wonder how to build a 3-class model:
>
>     for i,m in enumerate(ms):
>         if m.GetProp('ACTIVITY_CLASS')=='active':
>             act=1
>         else:
>             act=0
>         pts.append([m.GetProp('CompoundName')]+list(descrs[i])+[act])
>
> Naively, I just tried act 0,1 or 2 - but this did not work.

You do need to add 0,1,2, but you also need to change the last value in nPossible (the number of values each descriptor + the activity can take) to 3. Something like this (not tested):

    nPossible = [0]+[2]*ndescrs+[3]

Then it should work.

Note: in my experience building reliable models with three or more classes really requires a lot of data.

-greg
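Putting the two changes together, a minimal sketch of the data preparation might look like the following. It deliberately avoids RDKit so that it runs anywhere: the class labels in `CLASS_CODES`, the helper names and the toy data are all assumptions for illustration, and only the `nPossible` expression comes from the reply above.

```python
# Hypothetical mapping from the ACTIVITY_CLASS text to an integer code.
CLASS_CODES = {"inactive": 0, "intermediate": 1, "active": 2}


def build_points(names, labels, descrs):
    """Return rows of [name, descriptor values..., class code]."""
    pts = []
    for name, label, row in zip(names, labels, descrs):
        pts.append([name] + list(row) + [CLASS_CODES[label]])
    return pts


def build_n_possible(ndescrs, n_classes=3):
    """Same shape as the expression in the reply: [0]+[2]*ndescrs+[3]."""
    return [0] + [2] * ndescrs + [n_classes]


# Toy data: two descriptors per compound (made up for illustration).
names = ["cmpd1", "cmpd2", "cmpd3"]
labels = ["active", "inactive", "intermediate"]
descrs = [(1, 0), (0, 0), (1, 1)]

pts = build_points(names, labels, descrs)
nPossible = build_n_possible(ndescrs=2)
print(pts[0])
print(nPossible)
```

Each row in `pts` has the same length as `nPossible`, with the last entry of `nPossible` matching the number of activity classes.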
Re: [Rdkit-discuss] PDB processing
Hi Paul

On Mon, May 9, 2011 at 1:13 PM, paul.czodrow...@merck.de wrote:
> Dear folks,
> is there a way to add occupancy and B-factors (e.g. 1.00 50.0) to a PDB file?

At the moment the RDKit has no way of processing PDB files.

-greg
Re: [Rdkit-discuss] PDB processing
On May 9, 2011, at 11:29 AM, Greg Landrum wrote:
> Hi Paul
>
> On Mon, May 9, 2011 at 1:13 PM, paul.czodrow...@merck.de wrote:
>> Dear folks,
>> is there a way to add occupancy and B-factors (e.g. 1.00 50.0) to a PDB file?
>
> At the moment the RDKit has no way of processing PDB files.
>
> -greg

If you need to work with PDB files, you might look at Biopython, which supports that format. I don't know specifically about occupancy and B-factors, but it would be someplace to start.

Andy
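As a library-free starting point, the two values can also be written directly into the fixed columns of each coordinate record. The sketch below relies on the PDB format placing occupancy in columns 55-60 and the temperature factor (B-factor) in columns 61-66 of ATOM and HETATM records; the function name `set_occupancy_bfactor` and the sample record are hypothetical and not part of RDKit or Biopython.

```python
def set_occupancy_bfactor(pdb_line, occupancy=1.00, bfactor=50.0):
    """Return pdb_line with occupancy and B-factor written in place.

    Only ATOM/HETATM records are modified; other lines are returned as-is.
    """
    if not pdb_line.startswith(("ATOM", "HETATM")):
        return pdb_line
    newline = "\n" if pdb_line.endswith("\n") else ""
    # Pad short records so that columns 55-66 exist before slicing.
    padded = pdb_line.rstrip("\n").ljust(66)
    values = f"{occupancy:6.2f}{bfactor:6.2f}"  # columns 55-60 and 61-66
    return padded[:54] + values + padded[66:] + newline


# Hypothetical record that stops after the z coordinate (column 54).
record = "ATOM".ljust(30) + "%8.3f%8.3f%8.3f" % (1.0, 2.0, 3.0)
updated = set_occupancy_bfactor(record)
print(updated)
```

Applying the function to every line of a file leaves header and remark lines untouched; anything beyond column 66 (such as the element symbol) is preserved.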
Re: [Rdkit-discuss] PDB processing
On Mon, May 9, 2011 at 7:29 PM, Andrew Fant f...@moleculargeek.com wrote:
> If you need to work with PDB files, you might look at biopython, which has support for that format available. I don't know specifically about occupancy and B-factors, but it would be someplace to start.

Along those lines, another one worth looking at is Open Structure (http://www.openstructure.org/). Marco Biasini did a presentation on this impressive-looking system at the MIOSS meeting last week.

-greg
Re: [Rdkit-discuss] More helpful error messages...
On Mon, May 9, 2011 at 10:27 AM, JP jeanpaul.ebe...@inhibox.com wrote:
> Using RDKit 2010.10
>
> Some error messages need to be more helpful. For e.g. in a 10,000 molecule smiles file:
>
>     [23:07:01] Can't kekulize mol
>     Traceback (most recent call last):
>       File "./x.py", line 314, in <module>
>         main()
>       File "./x.py", line 303, in main
>         mols = doSomething(...)
>       File "./x.py", line 185, in doSomething
>         mol_noH = Chem.RemoveHs(mol_h)
>     ValueError: Sanitization error: Can't kekulize mol
>
> Or a Cannot parse smiles error give an indication of what is going on -- but I need to know which mol they are failing at...
>
> Something like
>
>     [23:07:01] Can't kekulize mol (some id and/or textual smiles)

This is quite difficult to do since the kekulization function, called by RemoveHs, just sees the molecule. This is one you could do yourself though by catching that ValueError and then displaying the input value. If you are using either a SDMolSupplier or a SmilesMolSupplier, you may find its GetItemText() method quite useful here. If you aren't using one of those, you can try doing Chem.MolToSmiles on the bad molecule with the canonical argument set to False:

    In [15]: m = Chem.MolFromSmiles('[H]c1c([H])c([H])c([H])c1[H]',sanitize=False)

    In [16]: mh = Chem.RemoveHs(m)
    [06:12:37] Can't kekulize mol
    ---
    ValueError                  Traceback (most recent call last)
    /home/glandrum/RDKit_trunk/build/<ipython console> in <module>()
    ValueError: Sanitization error: Can't kekulize mol

    In [17]: Chem.MolToSmiles(m,canonical=False)
    Out[17]: '[H]c1c([H])c([H])c([H])c1[H]'

The canonical argument was added to MolToSmiles() in the 2010.12 release, so you'll need to be using something at least as up-to-date as that.

>     [23:07:01] Cannot parse smiles string (Ccc1XXXc)
>
> Would be more helpful...

This much is certainly no problem to do so that one gets output like this:

    In [2]: Chem.MolFromSmiles('Ccc1XXXcCCC')
    [06:06:25] syntax error while parsing: Ccc1XXXcCCC
    [06:06:25] SMILES Parse Error

    In [3]: Chem.MolFromSmiles('C1C')
    [06:06:28] Smiles parser error: unclosed ring for input C1C

If this looks useful to people I can go ahead and make the change for the next release.

Best,
-greg
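The suggestion of catching the ValueError and displaying the input value can be sketched as a small wrapper. To keep the example runnable without RDKit installed, the processing step is passed in as a callable and `fake_remove_hs` is a made-up stand-in, not an RDKit function; the helper name `process_with_context` and the record identifiers are likewise assumptions.

```python
def process_with_context(records, process):
    """Apply process to each (identifier, item) pair.

    Returns (results, failures); failures holds (identifier, message)
    for every item on which process raised ValueError.
    """
    results, failures = [], []
    for ident, item in records:
        try:
            results.append((ident, process(item)))
        except ValueError as err:
            # Keep the identifier so the failing input can be reported.
            failures.append((ident, str(err)))
    return results, failures


def fake_remove_hs(smiles):
    """Stand-in for a call such as Chem.RemoveHs; not an RDKit function."""
    if "X" in smiles:
        raise ValueError("Sanitization error: Can't kekulize mol")
    return smiles.replace("[H]", "")


records = [("mol-1", "CC[H]"), ("mol-2", "Ccc1XXXc")]
results, failures = process_with_context(records, fake_remove_hs)
for ident, message in failures:
    print(f"{ident}: {message}")
```

In a real script the callable would presumably be something like Chem.RemoveHs, and the identifier could be the input line or the text returned by the supplier's GetItemText() method mentioned above.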