Re: [Rdkit-discuss] ML question

2012-08-26 Thread Paul . Czodrowski
Dear Greg,


> > # actual prediction
> >
> > prediction_dictionary = {}
> > for x in cpds_w_descr:
> >     pred,conf=cmp.ClassifyExample(x[1:])
> >     NAME=x[0]
> >     prediction_dictionary[NAME]=pred,conf
> >     i+=1
> > for mol in cpds:
> >     mol_name = mol.GetProp('_Name')
> >     mol.SetProp('prediction',str(prediction_dictionary[mol_name][0]))
> >     mol.SetProp('prediction_confidence',str(prediction_dictionary[mol_name][1]))
> >     testset_pred.write(mol)
>
> This is just a guess, but it looks like you're passing ClassifyExample
> a shorter vector for each point than what you passed to Grow.
> Does it work if you do: pred,conf=cmp.ClassifyExample(x)?
>
> -greg


Wonderful, thanks for this hint!
I was too stuck in the code...


cheers,
paul


--
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] ML question

2012-08-26 Thread Greg Landrum
Hi Paul,

On Fri, Aug 24, 2012 at 1:57 PM,   wrote:
>
>
> Please find below a code snippet for a two-class model.
> The confusion matrix looks fine.
>
> But when re-applying the model (for test purposes), I end up with
> predictions that consistently give only ONE class.
>
>
>
> # descriptor calculation etc...
> cmp.Grow(cpds_w_descr, attrs=attrs, nPossibleVals=nPossible, nTries=10,
>          buildDriver=CrossValidate.CrossValidationDriver,
>          treeBuilder=QuantTreeBoot,
>          needsQuantization=False, nQuantBounds=boundsPerVar, maxDepth=3)
>



>
> # actual prediction
>
> prediction_dictionary = {}
> for x in cpds_w_descr:
>     pred,conf=cmp.ClassifyExample(x[1:])
>     NAME=x[0]
>     prediction_dictionary[NAME]=pred,conf
>     i+=1
> for mol in cpds:
>     mol_name = mol.GetProp('_Name')
>     mol.SetProp('prediction',str(prediction_dictionary[mol_name][0]))
>     mol.SetProp('prediction_confidence',str(prediction_dictionary[mol_name][1]))
>     testset_pred.write(mol)

This is just a guess, but it looks like you're passing ClassifyExample
a shorter vector for each point than what you passed to Grow.
Does it work if you do: pred,conf=cmp.ClassifyExample(x)?

-greg
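
A small illustration of why a shortened vector could produce a single predicted class. This is only a sketch under assumptions: ThresholdStump is a hypothetical toy classifier, not RDKit code, and it assumes the model looks up descriptors by their position in the full example (name first, label last). The data values are made up for the illustration.

```python
# Toy stand-in for a classifier that reads descriptors by column index.
# Hypothetical helper for illustration only; this is NOT the RDKit API.
class ThresholdStump:
    """Predict class 1 if one column of the example exceeds a threshold."""

    def __init__(self, attr_index, threshold):
        self.attr_index = attr_index
        self.threshold = threshold

    def classify(self, example):
        return 1 if example[self.attr_index] > self.threshold else 0


# Assumed row layout: [name, descriptor_a, descriptor_b, label]
examples = [
    ["mol1", 0.2, 5.0, 0],
    ["mol2", 0.9, 5.0, 1],
    ["mol3", 0.1, 5.0, 0],
    ["mol4", 0.8, 5.0, 1],
]

# Pretend training on the full rows found that column 1 (descriptor_a)
# separates the two classes at 0.5.
stump = ThresholdStump(attr_index=1, threshold=0.5)

# Same layout as during training: column 1 is descriptor_a.
full_row_preds = [stump.classify(x) for x in examples]

# Dropping the name shifts every column left by one, so column 1 is now
# descriptor_b, which has the same value for every molecule.
sliced_preds = [stump.classify(x[1:]) for x in examples]

print("full rows:", full_row_preds)
print("x[1:]    :", sliced_preds)
```

In this toy case the full rows reproduce the labels, while the sliced rows all fall on the same side of the threshold. If the real model indexes examples the same way, passing x instead of x[1:] keeps the columns aligned with what Grow saw.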
