On 11/23/2011 03:08 PM, Olivier Grisel wrote:
> 2011/11/23 Andreas Müller<[email protected]>:
>> Hi everybody.
>> Me again. I was getting some unexpected behaviour from the error metrics.
>> Consider the following:
>>
>> import numpy as np
>> from sklearn.datasets import load_digits
>> from sklearn.metrics import zero_one_score
>>
>> digits = load_digits()
>> zero_one_score(digits.target, np.vstack(digits.target))
>>
>>   >>>  0.10
>>
>> The shape of digits.target is (1797,), the shape
>> of the stacked version is (1797, 1).
>> That seems to cause broadcasting in "==".
> Good catch.
>
>> I thought utils.check_arrays was meant to
>> avoid such problems, but it does not change the shape
>> of these two arrays.
>>
>> What did I do wrong or what did I misunderstand here?
>>
>> Obviously I could reshape either array so that no broadcasting
>> happens. I feel the problem is somewhat subtle, though,
>> and it took me 3 hours to find.
>>
>> If you feel that is a problem, should it be addressed in "check_arrays"?
> IMHO, we should have a specific check for 1D, integer arrays used for
> targets in classification tasks and another specific check for
> regression tasks with explicit docstring telling what we check and
> explicit ValueError message stating what we were expecting and
> what we got instead.
>
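For reference, the effect can be reproduced with plain NumPy, without
scikit-learn. This is only a minimal sketch on a made-up four-element
label array (not the digits data); reshape(-1, 1) stands in for the
column shape that np.vstack produced above.

```python
import numpy as np

# Toy labels, invented for illustration only.
y_true = np.array([0, 1, 2, 1])      # shape (4,)
y_pred = y_true.reshape(-1, 1)       # shape (4, 1), a column vector

# "==" broadcasts (4,) against (4, 1) into a (4, 4) matrix that
# compares every label with every other label.
eq = (y_true == y_pred)
broadcast_score = eq.mean()          # fraction of matching pairs, not accuracy

# Flattening the column first gives the intended elementwise comparison.
elementwise_score = np.mean(y_true == y_pred.ravel())

print(eq.shape, broadcast_score, elementwise_score)
```

With balanced classes the broadcast mean ends up near 1 / n_classes,
which would explain a score of about 0.10 on the ten digit classes.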
That might be a good idea. Should the check for classification tasks
then be performed on each call to "fit" and in each classification metric?
I am not sure whether you mean checking that the dtype is int,
or rather checking that the array contains only integer values.
Are there other requirements? I am not familiar enough with the
implementation of the classification algorithms to say what kind
of assumptions they make.
Do labels have to be 0..n or [-1, 1]?
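
To make the question concrete, here is a rough sketch of what such a
check could look like. The function name and the require_int_dtype
flag are made up for illustration and are not part of scikit-learn;
the flag switches between the two readings (int dtype vs. integer
values). It makes no assumption about the label range.

```python
import numpy as np


def check_1d_integer_targets(y, name="y", require_int_dtype=False):
    """Hypothetical helper: validate classification targets.

    Returns y as an ndarray, or raises ValueError saying what was
    expected and what was received.
    """
    y = np.asarray(y)
    if y.ndim != 1:
        raise ValueError("%s: expected a 1D array, got shape %r"
                         % (name, y.shape))
    if np.issubdtype(y.dtype, np.integer):
        return y
    if require_int_dtype:
        raise ValueError("%s: expected an integer dtype, got %s"
                         % (name, y.dtype))
    # Looser reading: accept floats as long as every value is a finite integer.
    is_float = np.issubdtype(y.dtype, np.floating)
    if not (is_float and np.all(np.isfinite(y)) and np.all(y == np.floor(y))):
        raise ValueError("%s: expected integer-valued labels, got dtype %s"
                         % (name, y.dtype))
    return y
```

With something like this, the (1797, 1) array from the example would
raise a ValueError right away instead of silently broadcasting.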

Cheers,
Andy
