I removed the label from the training set and then reran the process (fit, 
predict, etc.). The results now look reasonable. 
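A quick sanity check for this kind of leakage is to look for feature columns that are exact copies of the target before fitting. This is a minimal sketch, assuming the features are in a NumPy array `X` and the labels in `y`. The helper name `find_label_copies` is hypothetical and not part of scikit-learn.

```python
import numpy as np

def find_label_copies(X, y):
    """Return the indices of columns in X that are identical to the label vector y."""
    X = np.asarray(X)
    y = np.asarray(y)
    return [j for j in range(X.shape[1]) if np.array_equal(X[:, j], y)]

# Toy data: column 1 is accidentally the label itself.
X = np.array([[0.2, 1, 5.0],
              [0.7, 0, 3.1],
              [0.1, 1, 4.4],
              [0.9, 0, 2.8]])
y = np.array([1, 0, 1, 0])

leaky = find_label_copies(X, y)
X_clean = np.delete(X, leaky, axis=1)  # drop the leaked column(s) before fit/predict
print(leaky, X_clean.shape)
```

This only catches exact copies. An encoded or rescaled version of the label would need a looser check, such as looking at each feature's correlation with `y`.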

Thank you very much. 


----- Original Message -----
From: Andreas Mueller <[email protected]>
To: Jason Williams <[email protected]>; 
[email protected]
Cc: 
Sent: Thursday, 15 August 2013, 8:58
Subject: Re: [Scikit-learn-general] Classification accuracy too high

On 08/15/2013 01:08 PM, Jason Williams wrote:
> I followed the example at 
> http://blog.yhathq.com/posts/random-forests-in-python.html, which randomly 
> assigns True or False to each row of the dataset
> 
>      np.random.uniform(0, 1, len(df)) <= .75
> 
> and then partitions the dataset into a training set and a test set. I used the 
> same approach to create the model.
> 
In principle that sounds good.
For cross-validation, you could also just use the cross_val_score function from 
the sklearn.cross_validation module.
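A minimal sketch of that suggestion, assuming a `RandomForestClassifier` as in the blog post and synthetic data from `make_classification` standing in for the real dataset. In current scikit-learn releases `cross_val_score` lives in `sklearn.model_selection`; the `sklearn.cross_validation` module available in 2013 has since been removed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real dataset; the label is NOT among the features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validation: each fold is held out once and scored with accuracy.
scores = cross_val_score(clf, X, y, cv=5)
print("accuracy per fold:", scores)
print("mean accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```

A suspiciously perfect score on every fold suggests the problem lies in the data or features rather than in one unlucky random split.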

There are three interpretations that come to my mind:
1) Your dataset is just very easy, and the classifier learns it perfectly.
2) There is some trivial unwanted solution; for example, you included the output 
label as a feature in the inputs.
3) There are strong correlations between the data points. If your dataset 
contains many near-duplicates, then for each sample in the test set there is a 
very similar sample in the training set that was learned by heart.
In this case you shouldn't assign samples to the training and test sets randomly, 
but should respect the correlation structure of the data.
(One example would be video frames from several sequences. Then you shouldn't 
assign individual frames randomly to training and test, but whole sequences.)
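To illustrate the last point, here is a sketch of a split that keeps whole groups (e.g. whole video sequences) together. It assumes each sample carries a group id in an array `groups`; the function name `group_train_test_split` is hypothetical. Newer scikit-learn versions offer this directly through `GroupKFold` and `GroupShuffleSplit` in `sklearn.model_selection`.

```python
import numpy as np

def group_train_test_split(groups, test_fraction=0.25, seed=0):
    """Return boolean train/test masks such that no group appears in both sets."""
    groups = np.asarray(groups)
    unique_groups = np.unique(groups)
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_fraction * len(unique_groups))))
    # Choose whole groups (sequences) for the test set, not individual samples.
    test_groups = rng.choice(unique_groups, size=n_test, replace=False)
    test_mask = np.isin(groups, test_groups)
    return ~test_mask, test_mask

# 4 sequences with 3 frames each; frames within a sequence are near-duplicates.
groups = np.repeat([0, 1, 2, 3], 3)
train_mask, test_mask = group_train_test_split(groups)
print("train sequences:", np.unique(groups[train_mask]))
print("test sequences:", np.unique(groups[test_mask]))
```

Because the test set is drawn at the sequence level, no frame in the test set has a near-identical sibling in the training set.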

Hth,
Andy


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Reply via email to