Hi,

Yes, no matter which approach is used, there are always going to be outliers
that do not fit the defined rules. But for these corner cases, the user
always has the opportunity to change the variable to numerical.

One more approach is to introduce a measure of how often values repeat
within a column. If a column repeats the same values many times, IMO that
is a good indicator of a categorical variable.
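
A minimal sketch of what I mean, assuming the column is already available
as a list of strings. The class name, the method names and the threshold
passed to looksCategorical are made up for illustration and are not part of
WSO2 ML; the measure here is simply the row count divided by the number of
distinct values.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReplicationMeasure {

    /**
     * Average number of times each distinct value occurs in the column
     * (row count divided by distinct-value count). Returns 0.0 for an
     * empty column.
     */
    public static double replicationFactor(List<String> column) {
        if (column.isEmpty()) {
            return 0.0;
        }
        Set<String> distinct = new HashSet<>(column);
        return (double) column.size() / distinct.size();
    }

    /**
     * Hypothetical rule: treat the column as categorical when its values
     * repeat at least minReplication times on average.
     */
    public static boolean looksCategorical(List<String> column, double minReplication) {
        return replicationFactor(column) >= minReplication;
    }

    public static void main(String[] args) {
        // Class labels: 3 distinct values over 9 rows.
        List<String> classes = Arrays.asList("1", "1", "1", "2", "2", "2", "3", "3", "3");
        // Measurements: almost every value is unique.
        List<String> measurements = Arrays.asList("0.12", "0.57", "0.33", "0.91", "0.12", "0.48");

        System.out.println(replicationFactor(classes));              // 3.0
        System.out.println(looksCategorical(classes, 2.0));          // true
        System.out.println(looksCategorical(measurements, 2.0));     // false
    }
}
```

The threshold would still need tuning per dataset, and on a small sample
even a numerical column can show repeats, so the manual override remains
necessary.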

On Fri, Aug 14, 2015 at 2:41 PM, Nirmal Fernando <nir...@wso2.com> wrote:

>
>
> On Fri, Aug 14, 2015 at 10:01 AM, Thushan Ganegedara <thu...@gmail.com>
> wrote:
>
>> Hi,
>>
>>> This was mainly due to the detection of a numerical feature as a
>>> categorical one.
>>
>> Oh, it makes sense now. Why don't we take a sample of the data and, if
>> the sample contains only integers (or doubles without any decimals) or
>> strings, consider it a categorical variable?
>>
>
> I tried that approach too, but some features, such as normalized-losses
> in the automobile dataset, have integer values (0-164) yet are probably
> not categorical.
>
>>
>>> We suggested increasing the categorical threshold as a work-around.
>>> @thushan did it work?
>>
>> Yes, it worked after increasing the threshold to 40.
>>
>> On Fri, Aug 14, 2015 at 2:21 PM, Nirmal Fernando <nir...@wso2.com> wrote:
>>
>>> This was mainly due to the detection of a numerical feature as a
>>> categorical one.
>>>
>>> We suggested increasing the categorical threshold as a work-around.
>>> @thushan did it work?
>>>
>>> On Tue, Aug 11, 2015 at 5:50 PM, Thushan Ganegedara <thu...@gmail.com>
>>> wrote:
>>>
>>>> This issue occurs if I turn the response variable into a categorical
>>>> variable. If I read the variable as a numerical variable, the values
>>>> are read correctly.
>>>>
>>>> So I presume there is a fault in the categorical conversion of the
>>>> variable.
>>>>
>>>> On Tue, Aug 11, 2015 at 7:11 PM, Thushan Ganegedara <thu...@gmail.com>
>>>> wrote:
>>>>
>>>>> I still get the same result
>>>>>
>>>>> On Tue, Aug 11, 2015 at 7:05 PM, Nirmal Fernando <nir...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>> Can you try the following code?
>>>>>>
>>>>>> // Collect the RDD once, then print every label separated by tabs.
>>>>>> List<LabeledPoint> points = labeledPoints.collect();
>>>>>> for (LabeledPoint point : points) {
>>>>>>     System.out.print(point.label() + "\t");
>>>>>> }
>>>>>>
>>>>>> On Tue, Aug 11, 2015 at 2:30 PM, Thushan Ganegedara <thu...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I used the following snippet
>>>>>>>
>>>>>>> for (int i = 0; i < labeledPoints.collect().size(); i++) {
>>>>>>>     System.out.print(labeledPoints.collect().get(i).label() + "\t");
>>>>>>> }
>>>>>>>
>>>>>>> inside the method public MLModel build() throws
>>>>>>> MLModelBuilderException in DeeplearningModelBuilder.java
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Aug 11, 2015 at 6:17 PM, Nirmal Fernando <nir...@wso2.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi thushan,
>>>>>>>>
>>>>>>>> We need more info. What exactly did you print, and where?
>>>>>>>>
>>>>>>>> On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara <
>>>>>>>> thu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I found the potential cause of the poor accuracy for the leaf
>>>>>>>>> dataset. It seems the data read into ML is wrong.
>>>>>>>>>
>>>>>>>>> I have attached the data file as a CSV (classes are in the last
>>>>>>>>> column)
>>>>>>>>>
>>>>>>>>> However, when I print out the labels of the read data (classes),
>>>>>>>>> it looks something like below. Clearly there aren't this many "3.0"
>>>>>>>>> classes, and there should be classes up to 36.0.
>>>>>>>>>
>>>>>>>>> Is this caused by a bug?
>>>>>>>>>
>>>>>>>>> 1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0
>>>>>>>>> 1.0     1.0     1.0     1.0     12.0    12.0    12.0    12.0    12.0
>>>>>>>>> 12.0    12.0    12.0    12.0    12.0    13.0    13.0    13.0    13.0
>>>>>>>>> 13.0    13.0
>>>>>>>>> 13.0    13.0    13.0    13.0    14.0    14.0    14.0    14.0
>>>>>>>>> 14.0    14.0    14.0    14.0    15.0    15.0    15.0    15.0    15.0
>>>>>>>>> 15.0    15.0    15.0    15.0    15.0    15.0    15.0    16.0    16.0
>>>>>>>>> 16.0    16.0
>>>>>>>>> 16.0    16.0    16.0    16.0    17.0    17.0    17.0    17.0
>>>>>>>>> 17.0    17.0    17.0    17.0    17.0    17.0    18.0    18.0    18.0
>>>>>>>>> 18.0    18.0    18.0    18.0    18.0    18.0    18.0    18.0    19.0
>>>>>>>>> 19.0    19.0
>>>>>>>>> 19.0    19.0    19.0    19.0    19.0    19.0    19.0    19.0
>>>>>>>>> 19.0    19.0    19.0    2.0     2.0     2.0     2.0     2.0     2.0
>>>>>>>>> 2.0     2.0     2.0     2.0     2.0     2.0     2.0     3.0     3.0
>>>>>>>>> 3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     4.0     4.0     4.0     4.0     4.0
>>>>>>>>> 4.0     4.0     4.0     4.0     4.0     4.0     4.0     5.0     5.0
>>>>>>>>> 5.0     5.0
>>>>>>>>> 5.0     5.0     5.0     5.0     5.0     5.0     5.0     5.0
>>>>>>>>> 5.0     6.0     6.0     6.0     6.0     6.0     6.0     6.0     6.0
>>>>>>>>> 6.0     6.0     6.0     6.0     7.0     7.0     7.0     7.0     7.0
>>>>>>>>> 7.0     7.0
>>>>>>>>> 7.0     7.0     7.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>> 3.0     3.0
>>>>>>>>> 3.0     3.0     3.0     3.0
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Thushan Ganegedara
>>>>>>>>> School of IT
>>>>>>>>> University of Sydney, Australia
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Thanks & regards,
>>>>>>>> Nirmal
>>>>>>>>
>>>>>>>> Team Lead - WSO2 Machine Learner
>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>> Mobile: +94715779733
>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>>
>>>>>>>>
>>>>>>>>


-- 
Regards,

Thushan Ganegedara
School of IT
University of Sydney, Australia
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
