[Rdkit-discuss] More helpful error messages...

2011-05-09 Thread JP
Using RDKit 2010.10

Some error messages could be more helpful.  For example, when processing a
10,000-molecule SMILES file:

[23:07:01] Can't kekulize mol

Traceback (most recent call last):
  File "./x.py", line 314, in <module>
    main()
  File "./x.py", line 303, in main
    mols = doSomething(...)
  File "./x.py", line 185, in doSomething
    mol_noH = Chem.RemoveHs(mol_h)
ValueError: Sanitization error: Can't kekulize mol

This, or a "Cannot parse smiles" error, gives an indication of what is going
wrong, but I need to know which molecule is failing.

Something like

[23:07:01] Can't kekulize mol (some id and/or textual smiles)
[23:07:01] Cannot parse smiles string (Ccc1XXXc)

Would be more helpful...









Jean-Paul Ebejer
Early Stage Researcher

InhibOx Ltd
Pembroke House
36-37 Pembroke Street
Oxford
OX1 1BP
UK

(+44 / 0) 1865 262 034



This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
Any unauthorised dissemination or copying of this email or its attachments,
and any use or disclosure of any information contained in them, is strictly
prohibited and may be illegal.  If you have received this email in error
please notify the sender and delete all copies from your system.

We and our group companies accept no liability or responsibility for
personal emails or emails unconnected with our business.

Internet communications including emails and access and use of web sites
cannot be guaranteed to be secure or error free as information can be
intercepted, corrupted, lost or arrive late. Furthermore, while we have
taken steps to control the spread of viruses on our systems, we cannot
guarantee that this email and any files transmitted with it are virus free.
No liability is accepted for any errors, omissions, interceptions, corrupted
mail, lost communications or late delivery arising as a result of receiving
this message via the Internet or for any virus that may be contained in it.
--
WhatsUp Gold - Download Free Network Management Software
The most intuitive, comprehensive, and cost-effective network 
management toolset available today.  Delivers lowest initial 
acquisition cost and overall TCO of any competing solution.
http://p.sf.net/sfu/whatsupgold-sd
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] Is it possible?

2011-05-09 Thread JP
Dearest Greg,

Are the slides from your RDKit DB cartridge talk in Cambridge publicly
available?

Cheers



[Rdkit-discuss] PDB processing

2011-05-09 Thread Paul . Czodrowski

Dear folks,

Is there a way to add occupancy & B-factors (e.g. 1.00 50.0) to a PDB file?


Thanks & Cheers,
Paul

This message and any attachment are confidential and may be privileged or
otherwise protected from disclosure. If you are not the intended recipient,
you must not copy this message or attachment or disclose the contents to
any other person. If you have received this transmission in error, please
notify the sender immediately and delete the message and any attachment
from your system. Merck KGaA, Darmstadt, Germany and any of its
subsidiaries do not accept liability for any omissions or errors in this
message which may arise as a result of E-Mail-transmission or for damages
resulting from any unauthorized changes of the content of this message and
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
subsidiaries do not guarantee that this message is free of viruses and does
not accept liability for any damages caused by any virus transmitted
therewith.

Click http://disclaimer.merck.de to access the German, French, Spanish and
Portuguese versions of this disclaimer.




[Rdkit-discuss] Typo in wiki [DatabaseCreation2]

2011-05-09 Thread Adrian Schreyer
Hi Greg,

there is a typo on the page describing the creation of a database
using eMolecules:

The sed command in the form 's/\\/\\\\/' used to escape backslashes replaces
only the first occurrence of the match on each line; the 'g' flag at the end
is needed to replace all occurrences on a line.

grep '6172136' eMolecules-2011-01-02.smi | sed 's/\\/\\\\/'
C/C(=C\\c1c1)/C=C\1/N=C(OC1=O)c1ccc(cc1)[N+](=O)[O-] 6172136

grep '6172136' eMolecules-2011-01-02.smi | sed 's/\\/\\\\/g'
C/C(=C\\c1c1)/C=C\\1/N=C(OC1=O)c1ccc(cc1)[N+](=O)[O-] 6172136
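
The same first-match versus all-matches distinction can be sketched in Python.
This is only an illustration, not the procedure from the wiki page: the helper
name escape_backslashes and the sample string are made up for the example, and
str.replace with a count of 1 stands in for sed without the 'g' flag.

```python
def escape_backslashes(line, first_only=False):
    """Double each backslash in line (or only the first one if first_only)."""
    if first_only:
        # Like sed without the 'g' flag: stop after the first match.
        return line.replace("\\", "\\\\", 1)
    # Like sed with the 'g' flag: replace every match on the line.
    return line.replace("\\", "\\\\")


# Hypothetical SMILES-like string containing two backslashes.
sample = r"C/C(=C\C)/C=C\C"
print(escape_backslashes(sample, first_only=True))
print(escape_backslashes(sample))
```

Comparing the two printed lines shows that only the second call escapes the
trailing backslash as well.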

Cheers,

Adrian



[Rdkit-discuss] random forest in RDKit - ctd.

2011-05-09 Thread Paul . Czodrowski
Dear Greg,

the Wiki is a great place to start right from scratch with the RDKit ML
capabilities!

However, I wonder how to build a 3-class model:


for i, m in enumerate(ms):
    if m.GetProp('ACTIVITY_CLASS') == 'active':
        act = 1
    else:
        act = 0
    pts.append([m.GetProp('CompoundName')] + list(descrs[i]) + [act])


Naively, I just tried setting act to 0, 1 or 2, but this did not work.


Cheers & Thanks,
Paul


P.S.: Although I'm rather new to RDKit, I'm willing to continue the Wiki entry
regarding the ML capabilities. But someone should cross-check that I don't
add any errors.



Re: [Rdkit-discuss] Is it possible?

2011-05-09 Thread Greg Landrum
Hi JP,

They will be publicly available. I'll send a link when they are.

-greg


On Mon, May 9, 2011 at 12:20 PM, JP jeanpaul.ebe...@inhibox.com wrote:

> Dearest Greg,
> Are your slides about the RDkit DB cartridge talk in Cambridge publicly
> available?
> Cheers



Re: [Rdkit-discuss] random forest in RDKit - ctd.

2011-05-09 Thread Greg Landrum
Dear Paul,

On Mon, May 9, 2011 at 4:28 PM,  paul.czodrow...@merck.de wrote:

> the Wiki is a great place to start right from scratch with the RDKit ML
> capabilities!

Glad to hear it.


> However, I wonder how to build a 3-class model:
>
> for i,m in enumerate(ms):
>  if m.GetProp('ACTIVITY_CLASS')=='active':
>   act=1
>  else:
>   act=0
>  pts.append([m.GetProp('CompoundName')]+list(descrs[i])+[act])
>
> Naively, I just tried act 0,1 or 2 - but this did not work.

You do need to use 0, 1 and 2 for act, but you also need to change the last
value in nPossible (the number of values that each descriptor and the
activity can take) to 3. Something like this (not tested):

nPossible = [0]+[2]*ndescrs+[3]

Then it should work.

Note: in my experience building reliable models with three or more
classes really requires a lot of data.
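
Putting the two changes together, a minimal sketch might look like the
following. It has not been run against the RDKit ML code: the label names
'inactive' and 'intermediate', the helper names and the toy descriptor rows
are assumptions made for illustration; only 'active' and the nPossible
expression come from the messages above.

```python
# Hypothetical label names; only 'active' appears in the original snippet.
CLASS_CODES = {'inactive': 0, 'intermediate': 1, 'active': 2}


def build_points(names, labels, descrs):
    """Build rows of the form [name, descriptor values..., class code]."""
    pts = []
    for name, label, row in zip(names, labels, descrs):
        pts.append([name] + list(row) + [CLASS_CODES[label]])
    return pts


def make_n_possible(ndescrs, n_classes=3, n_descr_values=2):
    """One entry per column: 0 for the name, then descriptors, then activity."""
    return [0] + [n_descr_values] * ndescrs + [n_classes]


# Toy data: three compounds with three binary descriptors each.
names = ['cmpd1', 'cmpd2', 'cmpd3']
labels = ['active', 'inactive', 'intermediate']
descrs = [(0, 1, 1), (1, 0, 0), (1, 1, 0)]

pts = build_points(names, labels, descrs)
nPossible = make_n_possible(len(descrs[0]))
```

The last entry of nPossible has to match the number of distinct class codes,
which is why both are derived from the same three-class assumption here.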

-greg



Re: [Rdkit-discuss] PDB processing

2011-05-09 Thread Greg Landrum
Hi Paul

On Mon, May 9, 2011 at 1:13 PM,  paul.czodrow...@merck.de wrote:

> Dear folks,
>
> is there a way to add occupancy & B-factors (e.g. 1.00 50.0) to a PDB file?


At the moment the RDKit has no way of processing PDB files.

-greg



Re: [Rdkit-discuss] PDB processing

2011-05-09 Thread Andrew Fant
On May 9, 2011, at 11:29 AM, Greg Landrum wrote:

> Hi Paul
>
> On Mon, May 9, 2011 at 1:13 PM,  paul.czodrow...@merck.de wrote:
>
>> Dear folks,
>>
>> is there a way to add occupancy & B-factors (e.g. 1.00 50.0) to a PDB file?
>
> At the moment the RDKit has no way of processing PDB files.
>
> -greg

If you need to work with PDB files, you might look at Biopython, which has
support for that format.  I don't know specifically about occupancy
and B-factors, but it would be a place to start.
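
For simple cases the fixed-column layout of the PDB format can also be edited
directly. The sketch below is plain Python and uses neither RDKit nor
Biopython. It assumes standard ATOM/HETATM records, in which occupancy sits in
columns 55-60 and the B-factor in columns 61-66; the function name is made up
for the example.

```python
def set_occupancy_bfactor(pdb_line, occupancy=1.00, bfactor=50.0):
    """Return pdb_line with occupancy and B-factor written into columns 55-66.

    Lines that are not ATOM/HETATM records are returned unchanged.
    """
    if not pdb_line.startswith(("ATOM", "HETATM")):
        return pdb_line
    line = pdb_line.rstrip("\n")
    # Columns 1-54 hold the record name through the z coordinate;
    # pad short lines so the new fields land in the right columns.
    head = line[:54].ljust(54)
    # Anything after column 66 (e.g. element symbol, charge) is kept as-is.
    tail = line[66:]
    return f"{head}{occupancy:6.2f}{bfactor:6.2f}{tail}"


def add_to_file(in_path, out_path, occupancy=1.00, bfactor=50.0):
    """Rewrite a PDB file, setting both fields on every atom record."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for raw in src:
            dst.write(set_occupancy_bfactor(raw, occupancy, bfactor) + "\n")
```

A per-atom variant would need a lookup from atom serial number to values
instead of the two constant defaults used here.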

Andy




Re: [Rdkit-discuss] PDB processing

2011-05-09 Thread Greg Landrum
On Mon, May 9, 2011 at 7:29 PM, Andrew Fant f...@moleculargeek.com wrote:

> If you need to work with PDB files, you might look at biopython, which has
> support for that format available.  I don't know specifically about occupancy
> and B-factors, but it would be someplace to start.


Along those lines, another one worth looking at is OpenStructure
(http://www.openstructure.org/). Marco Biasini gave a presentation on
this impressive-looking system at the MIOSS meeting last week.

-greg

--
Achieve unprecedented app performance and reliability
What every C/C++ and Fortran developer should know.
Learn how Intel has extended the reach of its next-generation tools
to help boost performance applications - inlcuding clusters.
http://p.sf.net/sfu/intel-dev2devmay
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] More helpful error messages...

2011-05-09 Thread Greg Landrum
On Mon, May 9, 2011 at 10:27 AM, JP jeanpaul.ebe...@inhibox.com wrote:
> Using RDKit 2010.10
> Some error messages need to be more helpful.  For e.g. in a 10,000 molecule
> smiles file:
> [23:07:01] Can't kekulize mol
> Traceback (most recent call last):
>   File "./x.py", line 314, in <module>
>     main()
>   File "./x.py", line 303, in main
>     mols = doSomething(...)
>   File "./x.py", line 185, in doSomething
>     mol_noH = Chem.RemoveHs(mol_h)
> ValueError: Sanitization error: Can't kekulize mol
>
> Or a Cannot parse smiles error give an indication of what is going on --
> but I need to know which mol they are failing at...
> Something like
> [23:07:01] Can't kekulize mol (some id and/or textual smiles)

This is quite difficult to do, since the kekulization function called
by RemoveHs just sees the molecule. This is one you could do yourself,
though, by catching that ValueError and then displaying the input
value. If you are using either an SDMolSupplier or a SmilesMolSupplier,
you may find its GetItemText() method quite useful here. If you aren't
using one of those, you can try calling Chem.MolToSmiles on the bad
molecule with the canonical argument set to False:

In [15]: m = Chem.MolFromSmiles('[H]c1c([H])c([H])c([H])c1[H]',sanitize=False)

In [16]: mh = Chem.RemoveHs(m)
[06:12:37] Can't kekulize mol

---
ValueError                                Traceback (most recent call last)

/home/glandrum/RDKit_trunk/build/<ipython console> in <module>()

ValueError: Sanitization error: Can't kekulize mol


In [17]: Chem.MolToSmiles(m,canonical=False)
Out[17]: '[H]c1c([H])c([H])c([H])c1[H]'

The canonical argument was added to MolToSmiles() in the 2010.12
release, so you'll need to be using something at least as up-to-date
as that.
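
The catch-and-report approach can be sketched as a small generic helper. This
is an illustration only: apply_with_context and its parameters are
hypothetical names, and the stand-in fake_remove_hs replaces the real RDKit
call so that the example runs without RDKit installed.

```python
import sys


def apply_with_context(items, func, describe=str, log=sys.stderr):
    """Apply func to each item; on ValueError, report which item failed.

    Returns (results, failures), where failures is a list of
    (index, description, error message) tuples.
    """
    results, failures = [], []
    for idx, item in enumerate(items):
        try:
            results.append(func(item))
        except ValueError as err:
            desc = describe(item)
            failures.append((idx, desc, str(err)))
            print(f"record {idx}: {err} ({desc})", file=log)
    return results, failures


def fake_remove_hs(smiles):
    # Stand-in for Chem.RemoveHs: fails on one hypothetical input.
    if smiles == "bad":
        raise ValueError("Sanitization error: Can't kekulize mol")
    return smiles


results, failures = apply_with_context(["CCO", "bad", "c1ccccc1"], fake_remove_hs)
```

With RDKit, one would presumably pass Chem.RemoveHs as func and something like
lambda m: Chem.MolToSmiles(m, canonical=False) as describe, assuming a release
in which the canonical argument is available.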

> [23:07:01] Cannot parse smiles string (Ccc1XXXc)
> Would be more helpful...

This part is certainly no problem to do; one would then get output like this:

In [2]: Chem.MolFromSmiles('Ccc1XXXcCCC')
[06:06:25] syntax error while parsing: Ccc1XXXcCCC
[06:06:25] SMILES Parse Error

In [3]: Chem.MolFromSmiles('C1C')
[06:06:28] Smiles parser error: unclosed ring for input C1C

If this looks useful to people I can go ahead and make the change for
the next release.

Best,
-greg
