Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)

Nirmal Fernando Thu, 13 Aug 2015 21:23:23 -0700

This was mainly caused by a numerical feature being detected as a
categorical one.

We suggested increasing the categorical threshold as a work-around.
@thushan did it work?
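
One way to check whether the labels are now read correctly is to count how
often each label value occurs instead of scanning the raw printout. The
following is only a minimal sketch in plain Java: the class name
LabelHistogram, the helper countLabels, and the sample values are
hypothetical and are not part of WSO2 ML. In the builder, the list would
instead be filled from points.get(i).label() after labeledPoints.collect().

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LabelHistogram {

    // Counts how many times each label value occurs, ordered by label value.
    static Map<Double, Integer> countLabels(List<Double> labels) {
        Map<Double, Integer> counts = new TreeMap<>();
        for (double label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample values, used only to make the sketch runnable.
        List<Double> labels = Arrays.asList(1.0, 1.0, 12.0, 3.0, 3.0, 3.0);

        // Print one "label<TAB>count" line per distinct label.
        for (Map.Entry<Double, Integer> entry : countLabels(labels).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

If the data is read correctly, the counts should cover classes up to 36.0,
and no single label such as 3.0 should dominate.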

On Tue, Aug 11, 2015 at 5:50 PM, Thushan Ganegedara <[email protected]>
wrote:

> This issue occurs if I convert the response variable to a categorical
> variable. If I read it as a numerical variable, the values are read
> correctly.
>
> So I presume there is a fault in the categorical conversion of the variable.
>
> On Tue, Aug 11, 2015 at 7:11 PM, Thushan Ganegedara <[email protected]>
> wrote:
>
>> I still get the same result
>>
>> 1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0
>> 1.0     1.0     1.0     12.0    12.0    12.0    12.0    12.0    12.0
>> 12.0    12.0    12.0    12.0    13.0    13.0    13.0    13.0    13.0    13.0
>> 13.0    13.0    13.0    13.0    14.0    14.0    14.0    14.0    14.0
>> 14.0    14.0    14.0    15.0    15.0    15.0    15.0    15.0    15.0
>> 15.0    15.0    15.0    15.0    15.0    15.0    16.0    16.0    16.0    16.0
>> 16.0    16.0    16.0    16.0    17.0    17.0    17.0    17.0    17.0
>> 17.0    17.0    17.0    17.0    17.0    18.0    18.0    18.0    18.0
>> 18.0    18.0    18.0    18.0    18.0    18.0    18.0    19.0    19.0    19.0
>> 19.0    19.0    19.0    19.0    19.0    19.0    19.0    19.0    19.0
>> 19.0    19.0    2.0     2.0     2.0     2.0     2.0     2.0     2.0
>> 2.0     2.0     2.0     2.0     2.0     2.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     4.0     4.0     4.0     4.0     4.0     4.0
>> 4.0     4.0     4.0     4.0     4.0     4.0     5.0     5.0     5.0     5.0
>> 5.0     5.0     5.0     5.0     5.0     5.0     5.0     5.0     5.0
>> 6.0     6.0     6.0     6.0     6.0     6.0     6.0     6.0     6.0
>> 6.0     6.0     6.0     7.0     7.0     7.0     7.0     7.0     7.0     7.0
>> 7.0     7.0     7.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>> 3.0     3.0     3.0     3.0
>>
>> On Tue, Aug 11, 2015 at 7:05 PM, Nirmal Fernando <[email protected]> wrote:
>>
>>> Can you try the following code?
>>>
>>> // Collect the RDD once, then print every label.
>>> List<LabeledPoint> points = labeledPoints.collect();
>>> for (int i = 0; i < points.size(); i++) {
>>>     System.out.print(points.get(i).label() + "\t");
>>> }
>>>
>>> On Tue, Aug 11, 2015 at 2:30 PM, Thushan Ganegedara <[email protected]>
>>> wrote:
>>>
>>>> I used the following snippet
>>>>
>>>> for(int i=0;i<labeledPoints.collect().size();i++){
>>>>     System.out.print(labeledPoints.collect().get(i).label() + "\t");
>>>> }
>>>>
>>>> in the build() method (public MLModel build() throws
>>>> MLModelBuilderException) of DeeplearningModelBuilder.java
>>>>
>>>>
>>>> On Tue, Aug 11, 2015 at 6:17 PM, Nirmal Fernando <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi thushan,
>>>>>
>>>>> We need more info. What exactly did you print, and where?
>>>>>
>>>>> On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I found a potential cause of the poor accuracy for the leaf
>>>>>> dataset. It seems the data read into ML is wrong.
>>>>>>
>>>>>> I have attached the data file as a CSV (classes are in the last
>>>>>> column).
>>>>>>
>>>>>> However, when I print the labels (classes) of the data that was read,
>>>>>> the output looks like the listing below. Clearly there should not be
>>>>>> this many "3.0" labels, and there should be classes up to 36.0.
>>>>>>
>>>>>> Is this caused by a bug?
>>>>>>
>>>>>> 1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0
>>>>>> 1.0     1.0     1.0     1.0     12.0    12.0    12.0    12.0    12.0
>>>>>> 12.0    12.0    12.0    12.0    12.0    13.0    13.0    13.0    13.0
>>>>>> 13.0    13.0
>>>>>> 13.0    13.0    13.0    13.0    14.0    14.0    14.0    14.0
>>>>>> 14.0    14.0    14.0    14.0    15.0    15.0    15.0    15.0    15.0
>>>>>> 15.0    15.0    15.0    15.0    15.0    15.0    15.0    16.0    16.0
>>>>>> 16.0    16.0
>>>>>> 16.0    16.0    16.0    16.0    17.0    17.0    17.0    17.0
>>>>>> 17.0    17.0    17.0    17.0    17.0    17.0    18.0    18.0    18.0
>>>>>> 18.0    18.0    18.0    18.0    18.0    18.0    18.0    18.0    19.0
>>>>>> 19.0    19.0
>>>>>> 19.0    19.0    19.0    19.0    19.0    19.0    19.0    19.0
>>>>>> 19.0    19.0    19.0    2.0     2.0     2.0     2.0     2.0     2.0
>>>>>> 2.0     2.0     2.0     2.0     2.0     2.0     2.0     3.0     3.0
>>>>>> 3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     4.0     4.0     4.0     4.0     4.0
>>>>>> 4.0     4.0     4.0     4.0     4.0     4.0     4.0     5.0     5.0
>>>>>> 5.0     5.0
>>>>>> 5.0     5.0     5.0     5.0     5.0     5.0     5.0     5.0
>>>>>> 5.0     6.0     6.0     6.0     6.0     6.0     6.0     6.0     6.0
>>>>>> 6.0     6.0     6.0     6.0     7.0     7.0     7.0     7.0     7.0
>>>>>> 7.0     7.0
>>>>>> 7.0     7.0     7.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>> 3.0     3.0
>>>>>> 3.0     3.0     3.0     3.0
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>>
>>>>>> Thushan Ganegedara
>>>>>> School of IT
>>>>>> University of Sydney, Australia
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Thanks & regards,
>>>>> Nirmal
>>>>>
>>>>> Team Lead - WSO2 Machine Learner
>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>> Mobile: +94715779733
>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Thushan Ganegedara
>>>> School of IT
>>>> University of Sydney, Australia
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Thanks & regards,
>>> Nirmal
>>>
>>> Team Lead - WSO2 Machine Learner
>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>> Mobile: +94715779733
>>> Blog: http://nirmalfdo.blogspot.com/
>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>>
>> Thushan Ganegedara
>> School of IT
>> University of Sydney, Australia
>>
>
>
>
> --
> Regards,
>
> Thushan Ganegedara
> School of IT
> University of Sydney, Australia
>



-- 

Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/

_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev