Thushan, please send your suggestions to the other thread :)

On Fri, Aug 14, 2015 at 10:22 AM, Thushan Ganegedara <thu...@gmail.com>
wrote:

> Moreover, I think a hybrid approach as follows might work well.
>
> 1. Select a sample
>
> 2. Filter columns by data type to find potential categorical
> variables (integer / string)
>
> 3. Filter further by checking whether the same values are repeated multiple
> times in the dataset.
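
The three steps above can be sketched roughly as follows. This is only an illustration in plain Java, not code from WSO2 ML: the class name CategoricalHeuristic, its method names, and the sampleSize, minRepetition and seed parameters are all assumptions made for the example, and repetition is measured here as the average number of occurrences per distinct value, which is just one possible measure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of WSO2 ML: a sketch of the three-step heuristic.
final class CategoricalHeuristic {

    // Step 1: draw a random sample of at most sampleSize values (seeded so runs repeat).
    static List<String> sample(List<String> column, int sampleSize, long seed) {
        List<String> copy = new ArrayList<>(column);
        Collections.shuffle(copy, new Random(seed));
        return new ArrayList<>(copy.subList(0, Math.min(sampleSize, copy.size())));
    }

    // Step 2: a value qualifies if it is a non-numeric string, an integer,
    // or a double with no fractional part (for example "3.0").
    static boolean isCandidateValue(String raw) {
        try {
            double d = Double.parseDouble(raw.trim());
            return !Double.isInfinite(d) && d == Math.rint(d);
        } catch (NumberFormatException e) {
            return true; // not a number, so treat it as a string category
        }
    }

    // Step 3: average number of occurrences per distinct value in the sample.
    static double repetition(List<String> values) {
        if (values.isEmpty()) {
            return 0.0;
        }
        return (double) values.size() / new HashSet<>(values).size();
    }

    // Combine the steps: every sampled value must qualify, and values must repeat enough.
    static boolean isLikelyCategorical(List<String> column, int sampleSize,
                                       double minRepetition, long seed) {
        List<String> sampled = sample(column, sampleSize, seed);
        if (sampled.isEmpty()) {
            return false;
        }
        for (String value : sampled) {
            if (!isCandidateValue(value)) {
                return false;
            }
        }
        return repetition(sampled) >= minRepetition;
    }

    public static void main(String[] args) {
        // Made-up column: 36 class labels, each appearing 10 times.
        List<String> labels = new ArrayList<>();
        for (int c = 1; c <= 36; c++) {
            for (int i = 0; i < 10; i++) {
                labels.add(String.valueOf(c));
            }
        }
        // Made-up column: integers 0..164, each appearing once.
        List<String> uniqueInts = new ArrayList<>();
        for (int v = 0; v <= 164; v++) {
            uniqueInts.add(String.valueOf(v));
        }
        System.out.println(isLikelyCategorical(labels, 1000, 3.0, 42L));
        System.out.println(isLikelyCategorical(uniqueInts, 1000, 3.0, 42L));
    }
}
```

The sample size of 1000 and the minimum repetition of 3.0 are arbitrary values picked for the demonstration. Note that this sketch compares raw strings, so "1" and "1.0" would count as two distinct values.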
>
> On Fri, Aug 14, 2015 at 2:48 PM, Thushan Ganegedara <thu...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Yes, no matter which approach is used, there are always going to be outliers
>> that do not fit the defined rules. But for these corner cases, the user
>> always has the opportunity to change the variable to numerical.
>>
>> One more approach is to introduce a measure of how often values repeat in a
>> column. If the column shows the same values repeated many times, IMO, that
>> is a good indicator of a categorical variable.
>>
>> On Fri, Aug 14, 2015 at 2:41 PM, Nirmal Fernando <nir...@wso2.com> wrote:
>>
>>>
>>>
>>> On Fri, Aug 14, 2015 at 10:01 AM, Thushan Ganegedara <thu...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> This was mainly due to the detection of a numerical feature as a
>>>> categorical one.
>>>> Oh, it makes sense now. Why don't we try taking a sample of the data and,
>>>> if the sample contains only integers (or doubles with no fractional part)
>>>> or strings, treat the column as a categorical variable?
>>>>
>>>
>>> I tried that approach too, but there are datasets, such as the automobile
>>> dataset's normalized-losses feature, that have integer values (0-164) but
>>> are probably not categorical.
>>>
>>>>
>>>> We suggested increasing the categorical threshold as a work-around.
>>>> @thushan did it work?
>>>> Yes, it worked. After increasing the threshold to 40.
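
The thread does not spell out how the categorical threshold is applied. One common reading, assumed here, is that a column is treated as categorical when its number of distinct values does not exceed the threshold. Under that assumption, a small sketch shows why raising the threshold to 40 would change the outcome for a column with 36 distinct labels. The class name CategoricalThreshold, the method isCategorical, the lower threshold of 20, and the made-up column are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Hypothetical illustration; the real WSO2 ML logic may differ.
final class CategoricalThreshold {

    // Assumption: categorical if the number of distinct values is at most the threshold.
    static boolean isCategorical(List<String> column, int threshold) {
        return new HashSet<>(column).size() <= threshold;
    }

    public static void main(String[] args) {
        // Made-up response column with 36 distinct class labels, 10 rows each.
        List<String> labels = new ArrayList<>();
        for (int c = 1; c <= 36; c++) {
            for (int i = 0; i < 10; i++) {
                labels.add(String.valueOf(c));
            }
        }
        // 20 is an arbitrary lower threshold chosen for comparison.
        System.out.println("threshold 20: " + isCategorical(labels, 20));
        // 40 is the value reported as working in the thread.
        System.out.println("threshold 40: " + isCategorical(labels, 40));
    }
}
```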
>>>>
>>>> On Fri, Aug 14, 2015 at 2:21 PM, Nirmal Fernando <nir...@wso2.com>
>>>> wrote:
>>>>
>>>>> This was mainly due to the detection of a numerical feature as a
>>>>> categorical one.
>>>>>
>>>>> We suggested increasing the categorical threshold as a work-around.
>>>>> @thushan did it work?
>>>>>
>>>>> On Tue, Aug 11, 2015 at 5:50 PM, Thushan Ganegedara <thu...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> This issue occurs if I turn the response variable into a categorical
>>>>>> variable. If I read the variable as a numerical variable, the values are
>>>>>> read correctly.
>>>>>>
>>>>>> So I presume there is a fault in the categorical conversion of the
>>>>>> variable.
>>>>>>
>>>>>> On Tue, Aug 11, 2015 at 7:11 PM, Thushan Ganegedara <thu...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I still get the same result
>>>>>>>
>>>>>>> On Tue, Aug 11, 2015 at 7:05 PM, Nirmal Fernando <nir...@wso2.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Can you try the following code?
>>>>>>>>
>>>>>>>> // Collect the RDD once, then print each label separated by tabs.
>>>>>>>> List<LabeledPoint> points = labeledPoints.collect();
>>>>>>>> for (int i = 0; i < points.size(); i++) {
>>>>>>>>     System.out.print(points.get(i).label() + "\t");
>>>>>>>> }
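
A long tab-separated dump of labels is hard to compare against the CSV by eye, so a per-label tally can help when checking what was read. The sketch below is a hypothetical helper in plain Java (the LabelTally class and its tally method are made up for this example). A List<Double> stands in for the collected labels so that it runs without Spark, and the sample values in main are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical debugging helper: counts how many rows carry each label.
final class LabelTally {

    // TreeMap keeps the labels in ascending numeric order.
    static Map<Double, Integer> tally(List<Double> labels) {
        Map<Double, Integer> counts = new TreeMap<>();
        for (Double label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented labels standing in for the values returned by label().
        List<Double> labels = Arrays.asList(1.0, 1.0, 12.0, 3.0, 3.0, 3.0);
        for (Map.Entry<Double, Integer> entry : tally(labels).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

With real data, the list would be filled from the collected points, and the resulting counts could be compared with the class counts in the source file.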
>>>>>>>>
>>>>>>>> On Tue, Aug 11, 2015 at 2:30 PM, Thushan Ganegedara <
>>>>>>>> thu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I used the following snippet
>>>>>>>>>
>>>>>>>>> for (int i = 0; i < labeledPoints.collect().size(); i++) {
>>>>>>>>>     System.out.print(labeledPoints.collect().get(i).label() + "\t");
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> inside the method public MLModel build() throws MLModelBuilderException
>>>>>>>>> in DeeplearningModelBuilder.java
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Aug 11, 2015 at 6:17 PM, Nirmal Fernando <nir...@wso2.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi thushan,
>>>>>>>>>>
>>>>>>>>>> We need more info. What exactly did you print, and where?
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara <
>>>>>>>>>> thu...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I found the potential cause of the poor accuracy for the leaf
>>>>>>>>>>> dataset. It seems the data read into ML is wrong.
>>>>>>>>>>>
>>>>>>>>>>> I have attached the data file as a CSV (classes are in the last
>>>>>>>>>>> column)
>>>>>>>>>>>
>>>>>>>>>>> However, when I print out the labels of the read data (classes),
>>>>>>>>>>> the output looks like the listing below. Clearly there aren't this
>>>>>>>>>>> many "3.0" classes, and there should be classes up to 36.0.
>>>>>>>>>>>
>>>>>>>>>>> Is this caused by a bug?
>>>>>>>>>>>
>>>>>>>>>>> 1.0     1.0     1.0     1.0     1.0     1.0     1.0     1.0
>>>>>>>>>>> 1.0     1.0     1.0     1.0     12.0    12.0    12.0    12.0    12.0
>>>>>>>>>>> 12.0    12.0    12.0    12.0    12.0    13.0    13.0    13.0    13.0
>>>>>>>>>>> 13.0    13.0
>>>>>>>>>>> 13.0    13.0    13.0    13.0    14.0    14.0    14.0    14.0
>>>>>>>>>>> 14.0    14.0    14.0    14.0    15.0    15.0    15.0    15.0    15.0
>>>>>>>>>>> 15.0    15.0    15.0    15.0    15.0    15.0    15.0    16.0    16.0
>>>>>>>>>>> 16.0    16.0
>>>>>>>>>>> 16.0    16.0    16.0    16.0    17.0    17.0    17.0    17.0
>>>>>>>>>>> 17.0    17.0    17.0    17.0    17.0    17.0    18.0    18.0    18.0
>>>>>>>>>>> 18.0    18.0    18.0    18.0    18.0    18.0    18.0    18.0    19.0
>>>>>>>>>>> 19.0    19.0
>>>>>>>>>>> 19.0    19.0    19.0    19.0    19.0    19.0    19.0    19.0
>>>>>>>>>>> 19.0    19.0    19.0    2.0     2.0     2.0     2.0     2.0     2.0
>>>>>>>>>>> 2.0     2.0     2.0     2.0     2.0     2.0     2.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     4.0     4.0     4.0     4.0     4.0
>>>>>>>>>>> 4.0     4.0     4.0     4.0     4.0     4.0     4.0     5.0     5.0
>>>>>>>>>>> 5.0     5.0
>>>>>>>>>>> 5.0     5.0     5.0     5.0     5.0     5.0     5.0     5.0
>>>>>>>>>>> 5.0     6.0     6.0     6.0     6.0     6.0     6.0     6.0     6.0
>>>>>>>>>>> 6.0     6.0     6.0     6.0     7.0     7.0     7.0     7.0     7.0
>>>>>>>>>>> 7.0     7.0
>>>>>>>>>>> 7.0     7.0     7.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0     3.0
>>>>>>>>>>> 3.0     3.0
>>>>>>>>>>> 3.0     3.0     3.0     3.0
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Thushan Ganegedara
>>>>>>>>>>> School of IT
>>>>>>>>>>> University of Sydney, Australia
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Thanks & regards,
>>>>>>>>>> Nirmal
>>>>>>>>>>
>>>>>>>>>> Team Lead - WSO2 Machine Learner
>>>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>>>> Mobile: +94715779733
>>>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>



-- 

Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev