Re: [Rdkit-discuss] apply model - use Composite

2011-06-09 Thread Greg Landrum
Hi Paul,

On Thu, Jun 9, 2011 at 12:56 PM,   wrote:
>
>
> I'm trying to apply a Composite model on a test set.
> However, no output is generated. At least, no error/warning, but I cannot
> judge if the model gives any predictions.

Indeed it does.

> Now I would like to use the model to do predictions for the test set:
>
>  cmp.ClassifyExample(pts_test1[0])

You've got it at this point. You just need to capture the result of that call:
>>> pred,conf=cmp.ClassifyExample(pts_test1[0])
>>> print pred,conf
--> print(pred,conf)
(2, 1.0)
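
To apply the same call to every point in a test set, a loop along the following lines should work. This is only a sketch: `StubModel` is a hypothetical stand-in (not part of RDKit) so that the snippet runs on its own, and `classify_all` and `pts_demo` are made-up names. It assumes each point has the layout used in Paul's code, [name, descriptors..., activity]. With a real model you would pass `cmp` and `pts_test1` instead.

```python
class StubModel(object):
    """Hypothetical stand-in for a trained Composite; NOT an RDKit class."""

    def ClassifyExample(self, example):
        # Toy rule: predict class 2 if the first descriptor exceeds 0.5, else 0.
        pred = 2 if example[1] > 0.5 else 0
        return pred, 1.0


def classify_all(model, points):
    """Collect (name, prediction, confidence, actual) for every point."""
    rows = []
    for pt in points:
        pred, conf = model.ClassifyExample(pt)
        # pt[0] is the name and pt[-1] the experimental class (assumed layout)
        rows.append((pt[0], pred, conf, pt[-1]))
    return rows


# Made-up demo data: name, one descriptor, experimental class
pts_demo = [["mol1", 0.9, 2], ["mol2", 0.1, 1]]
rows = classify_all(StubModel(), pts_demo)
for name, pred, conf, actual in rows:
    print(name, pred, conf, actual)
```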

And, since this is a validation data set and you have experimental
values, you can use ScreenComposite.ShowVoteResults just like you did
when training the model:
>>> res = ScreenComposite.ShowVoteResults(range(len(pts_test1)), pts_test1,
...                                       cmp, 3, 0, errorEstimate=False)

*** Vote Results ***
misclassified: 43/257 (%16.73)  43/257 (%16.73)

average correct confidence:0.9047
average incorrect confidence:  0.7605

Results Table:

  30   6   0  |  73.17
  10  97  15  |  83.62
   0  12  87  |  84.47
 --- --- ---
   75.00   84.35   85.29


Note that I used "errorEstimate=False" here since this is a test set
and I can use the full model instead of doing out-of-bag validation.

The performance of this model is already pretty good, but I can use
the information about confidence again to set a threshold on the
predictions and improve accuracy at the expense of rejecting some data
points:

>>> res = ScreenComposite.ShowVoteResults(range(len(pts_test1)), pts_test1,
...                                       cmp, 3, 0.7, errorEstimate=False)

*** Vote Results ***
misclassified: 22/257 (%8.56)   22/203 (%10.84)
skipped: 54/257 (% 21.01)

average correct confidence:0.9613
average incorrect confidence:  0.9182

Results Table:

  19   0   0  |  76.00
   5  83  10  |  91.21
   0   7  79  |  87.78
 --- --- ---
   79.17   92.22   88.76
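
The headline numbers in the two outputs above can be re-derived from the result tables alone. The sketch below does that arithmetic; `summarize` is a made-up helper name, not an RDKit function, and it assumes only that the diagonal of each table holds the correctly classified points.

```python
def summarize(table, n_total):
    """Recompute the vote-result counts from a square results table."""
    accepted = sum(sum(row) for row in table)              # points that got a prediction
    correct = sum(table[i][i] for i in range(len(table)))  # diagonal entries
    wrong = accepted - correct
    skipped = n_total - accepted                           # rejected by the threshold
    return {
        "accepted": accepted,
        "wrong": wrong,
        "skipped": skipped,
        "pct_wrong_of_total": 100.0 * wrong / n_total,
        "pct_wrong_of_accepted": 100.0 * wrong / accepted,
        "pct_skipped": 100.0 * skipped / n_total,
    }


# Tables copied from the two outputs above (threshold 0 and threshold 0.7)
no_threshold = [[30, 6, 0], [10, 97, 15], [0, 12, 87]]
with_threshold = [[19, 0, 0], [5, 83, 10], [0, 7, 79]]

full = summarize(no_threshold, 257)
cut = summarize(with_threshold, 257)
print(full["wrong"], round(full["pct_wrong_of_total"], 2))
print(cut["wrong"], cut["accepted"], cut["skipped"])
```

With the threshold at 0.7 the 22 remaining errors are 8.56% of all 257 points but 10.84% of the 203 points that were actually predicted, which is why both percentages appear on the "misclassified" line.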

Paul, one thing you should look into: if I copy the code from the wiki
and paste it into a Python session, it doesn't work. There are some
imports missing.

-greg

--
EditLive Enterprise is the world's most technically advanced content
authoring tool. Experience the power of Track Changes, Inline Image
Editing and ensure content is compliant with Accessibility Checking.
http://p.sf.net/sfu/ephox-dev2dev
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] rdkit.Chem.Draw.spingCanvas.py (and py27 aggdraw / cairo help?)

2011-06-09 Thread James Davidson
Dear All,
 
I am in the process of upgrading to python 2.7 under Windows, and part
of this has included moving to the RDKit_2011_03_2 (py27) build.  I had
previously done most work with earlier versions of RDKit under python
2.6, but have found a problem with calling Draw.MolToImage() with the
latest RDKit binary for both py26 and py27:
 
Traceback (innermost last):
  File "C:\Python26\lib\site-packages\Pmw\Pmw_1_3\lib\PmwBase.py", line 1747, in __call__
    return apply(self.func, args)
  File "C:\Python26\lib\site-packages\pmg_tk\startup\VerMOL.py", line 1188, in <lambda>
    command=lambda s=self:s.draw_ligand(self.modelling_chainlist.listbox, self.ligcanvas, self.smiles, '3D',200,200, 'modelling_lig_image'))
  File "C:\Python26\lib\site-packages\pmg_tk\startup\VerMOL.py", line 2871, in draw_ligand
    im = Draw.MolToImage(mol, size=(x,y))
  File "C:\Python26\RDKit_2011_03_2\rdkit\Chem\Draw\__init__.py", line 71, in MolToImage
    drawer.AddMol(mol,**kwargs)
  File "C:\Python26\RDKit_2011_03_2\rdkit\Chem\Draw\MolDrawing.py", line 361, in AddMol
    color=color,width=width,color2=color2)
  File "C:\Python26\RDKit_2011_03_2\rdkit\Chem\Draw\MolDrawing.py", line 190, in _drawBond
    dash=self.dash)
  File "C:\Python26\RDKit_2011_03_2\rdkit\Chem\Draw\MolDrawing.py", line 169, in _drawWedgedBond
    self.canvas.addCanvasDashedWedge(poly[0],poly[1],poly[2],color=color)
  File "C:\Python26\RDKit_2011_03_2\rdkit\Chem\Draw\spingCanvas.py", line 104, in addCanvasDashedWedge
    pts1 = _getLinePoints(p1,p2,dash)
NameError: global name '_getLinePoints' is not defined
 
 
Not a big problem to sort - I think spingCanvas.addCanvasDashedWedge()
should read:
 
pts1 = self._getLinePoints(p1,p2,dash)
pts2 = self._getLinePoints(p1,p3,dash)
 
on lines 104, 105 instead of:
 
pts1 = _getLinePoints(p1,p2,dash)
pts2 = _getLinePoints(p1,p3,dash)
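
For anyone wondering why the bare name fails, here is a toy reproduction. It assumes, as suggested above, that `_getLinePoints` is defined as a method on the canvas class. `DemoCanvas` and its methods are hypothetical and only mirror the shape of the bug; they are not RDKit code.

```python
class DemoCanvas(object):
    """Hypothetical toy class that mirrors the shape of the bug."""

    def _getLinePoints(self, p1, p2, dash):
        # Placeholder body; the real method would compute the dash points.
        return [p1, p2]

    def broken(self, p1, p2, dash=(2, 2)):
        # A bare name is looked up in local, module and builtin scope,
        # never on the instance, so this raises NameError.
        return _getLinePoints(p1, p2, dash)

    def fixed(self, p1, p2, dash=(2, 2)):
        # Going through self finds the method on the class.
        return self._getLinePoints(p1, p2, dash)


canvas = DemoCanvas()
try:
    canvas.broken((0, 0), (1, 1))
    broken_raised = False
except NameError as err:
    broken_raised = True
    print("NameError:", err)

print(canvas.fixed((0, 0), (1, 1)))
```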
 
 
Anyway, this is only a problem if spingCanvas is being called - which I
think only happens as a last resort if aggdraw or cairo aren't found.
So on that note, the reason I was calling spingCanvas was that I don't
have a build of aggdraw for python 2.7, and I have found that when
cairo/pycairo are available to python 2.7 I get a pythonw.exe
Application Error at the point of calling Draw.MolToImage().  Under
python 2.6 I thought I would see what would happen if I removed aggdraw
to force cairo into play (different version of PIL, different version of
cairo - not an ideal comparison!):
 
  File "C:\Python26\RDKit_2011_03_2\rdkit\Chem\Draw\__init__.py", line 54, in MolToImage
    canvas = Canvas(img)
  File "C:\Python26\RDKit_2011_03_2\rdkit\Chem\Draw\cairoCanvas.py", line 38, in __init__
    imgd = image.tostring("raw","BGRA")
  File "C:\Python26\lib\site-packages\PIL\Image.py", line 516, in tostring
    e = _getencoder(self.mode, encoder_name, args)
  File "C:\Python26\lib\site-packages\PIL\Image.py", line 389, in _getencoder
    return apply(encoder, (mode,) + args + extra)
: unknown raw mode
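
A quick way to see which drawing backend would be picked up on a given machine is to try the imports in turn. The sketch below is an assumption-laden illustration: `first_importable` is a made-up helper, the order (aggdraw, then cairo, then sping) is taken from the description above rather than from RDKit's source, and the module names are the usual top-level import names of those packages. Note that it only catches ImportError; a hard crash inside a DLL, like the pythonw.exe error mentioned above, would not be caught.

```python
import importlib


def first_importable(module_names):
    """Return the first name that imports cleanly, or None if none do."""
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            continue  # not installed (or a dependency is missing); try the next one
        return name
    return None


# Assumed fallback order, based on the description in this message
backend = first_importable(["aggdraw", "cairo", "sping"])
print("first importable drawing backend:", backend)
```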
 
 
If it helps, I can follow up with more details on the exact versions of
DLLs, etc., but for now I wondered if:

(a) anybody has a version of aggdraw for Windows, built with Python 2.7?
(b) or any recommendations for reliable PIL / cairo / pycairo
combinations for Python 2.7 / Windows?
 
Kind regards
 
 
James


[Rdkit-discuss] apply model - use Composite

2011-06-09 Thread Paul . Czodrowski

Dear all,

I'm trying to apply a Composite model on a test set.
However, no output is generated. At least, no error/warning, but I cannot
judge if the model gives any predictions.

This is the model
"

  
nms=[x[0] for x in Descriptors._descList]
nms.remove('MolecularFormula')
calc = MoleculeDescriptors.MolecularDescriptorCalculator(nms)
descrs = [calc.CalcDescriptors(x) for x in ms]
ndescrs = len(calc.GetDescriptorNames())

pts=[]

for i,m in enumerate(ms):
    if m.GetProp('SOL_classification')=='(A) low':
        act=2
    elif m.GetProp('SOL_classification')=='(B) medium':
        act=1
    else:
        act = 0
    pts.append([m.GetProp('NAME')]+list(descrs[i])+[act])

cPickle.dump(pts,file('descrs.pkl','wb+'))

from rdkit.ML import ScreenComposite
pts = cPickle.load(file('descrs.pkl','rb'))
ndescrs = len(pts[0])-2
boundsPerVar = [0]+[1]*ndescrs+[0]
nPossible = [0]+[2]*ndescrs+[3]
attrs = range(1,ndescrs+1)
cmp = Composite()
cmp.Grow(pts,attrs=attrs,nPossibleVals=nPossible,nTries=10,buildDriver=CrossValidate.CrossValidationDriver,
         treeBuilder=QuantTreeBoot, needsQuantization=False,nQuantBounds=boundsPerVar, maxDepth=3)
res = ScreenComposite.ShowVoteResults(range(len(pts)), pts, cmp, 3, 0,errorEstimate=True)
"

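The layout of one data point and of the per-column lists in the model code above can be checked with plain Python, without RDKit. In this sketch the molecule name, descriptor values and the helper `label_to_class` are made up for illustration. The comments on what the 0/1/2/3 entries mean are my reading of the code, not something confirmed in this thread.

```python
def label_to_class(label):
    """Map the SOL_classification strings used above to integer classes."""
    if label == '(A) low':
        return 2
    elif label == '(B) medium':
        return 1
    return 0  # every other label, as in the else branch above


# Hypothetical molecule: a name, three made-up descriptor values, a label
name, descriptors, label = 'mol-1', [1.5, 0.2, 7.0], '(A) low'
point = [name] + descriptors + [label_to_class(label)]

# Same bookkeeping as in the model code: first column is the name,
# last column is the activity, everything in between is a descriptor.
ndescrs = len(point) - 2
boundsPerVar = [0] + [1] * ndescrs + [0]   # assumed: one quantization bound per descriptor
nPossible = [0] + [2] * ndescrs + [3]      # assumed: two bins per descriptor, three classes
attrs = list(range(1, ndescrs + 1))        # column indices of the descriptors

# Each list must have one entry per column of a point
assert len(boundsPerVar) == len(nPossible) == len(point)
print(point)
print(attrs)
```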
Now I would like to use the model to do predictions for the test set:
"

   
test1 = [x for x in Chem.SDMolSupplier('test1.sdf') if x is not None]
nms_test1=[x[0] for x in Descriptors._descList]
nms_test1.remove('MolecularFormula')
calc_test1 = MoleculeDescriptors.MolecularDescriptorCalculator(nms_test1)
descrs_test1 = [calc_test1.CalcDescriptors(x) for x in test1]
pts_test1 = []
for i,m in enumerate(test1):
    if m.GetProp('SOL_classification')=='(A) low':
        act=2
    elif m.GetProp('SOL_classification')=='(B) medium':
        act=1