Thushan, please send your suggestions to the other thread :)

On Fri, Aug 14, 2015 at 10:22 AM, Thushan Ganegedara <thu...@gmail.com> wrote:
> Moreover, I think a hybrid approach as follows might work well.
>
> 1. Select a sample
> 2. Filter columns by the data type and find potential categorical
>    variables (integer / string)
> 3. Filter further by checking whether the same values are repeated
>    multiple times in the dataset.
>
> On Fri, Aug 14, 2015 at 2:48 PM, Thushan Ganegedara <thu...@gmail.com> wrote:
>
>> Hi,
>>
>> Yes, no matter which approach is used, there are always going to be
>> outliers that do not fit the defined rules. But for these corner cases,
>> the user always has the opportunity to change the variable to numerical.
>>
>> One more approach is to introduce a measure of replication of values in
>> a column. If the column shows a repetition of the same values many times,
>> imo, it is a good indicator for detecting a categorical variable.
>>
>> On Fri, Aug 14, 2015 at 2:41 PM, Nirmal Fernando <nir...@wso2.com> wrote:
>>
>>> On Fri, Aug 14, 2015 at 10:01 AM, Thushan Ganegedara <thu...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> This was mainly due to the detection of a numerical feature as a
>>>> categorical one.
>>>> Oh, it makes sense now. Why don't we try taking a sample of data and,
>>>> if the sample contains only integers (or doubles without any decimals)
>>>> or strings, consider it a categorical variable?
>>>
>>> I tried that approach too, but there are some datasets, like the
>>> normalized-losses feature of the automobile dataset, which has integer
>>> values (0-164) but is probably not categorical.
>>>
>>>> We suggested increasing the categorical threshold as a work-around.
>>>> @thushan did it work?
>>>> Yes, it worked, after increasing the threshold to 40.
>>>>
>>>> On Fri, Aug 14, 2015 at 2:21 PM, Nirmal Fernando <nir...@wso2.com> wrote:
>>>>
>>>>> This was mainly due to the detection of a numerical feature as a
>>>>> categorical one.
>>>>>
>>>>> We suggested increasing the categorical threshold as a work-around.
>>>>> @thushan did it work?
>>>>>
>>>>> On Tue, Aug 11, 2015 at 5:50 PM, Thushan Ganegedara <thu...@gmail.com> wrote:
>>>>>
>>>>>> This issue occurs if I turn the response variable into a categorical
>>>>>> variable. If I read the variable as a numerical variable, the values
>>>>>> are read correctly.
>>>>>>
>>>>>> So I presume there is a fault in the categorical conversion of the
>>>>>> variable.
>>>>>>
>>>>>> On Tue, Aug 11, 2015 at 7:11 PM, Thushan Ganegedara <thu...@gmail.com> wrote:
>>>>>>
>>>>>>> I still get the same result
>>>>>>>
>>>>>>> 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
>>>>>>> 1.0 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0
>>>>>>> 12.0 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0
>>>>>>> 13.0 13.0
>>>>>>> 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0
>>>>>>> 14.0 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0
>>>>>>> 15.0 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0
>>>>>>> 16.0 16.0
>>>>>>> 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0
>>>>>>> 17.0 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0
>>>>>>> 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0
>>>>>>> 19.0 19.0
>>>>>>> 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
>>>>>>> 19.0 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0
>>>>>>> 2.0 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0
>>>>>>> 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0
>>>>>>> 4.0 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0
>>>>>>> 5.0 5.0
>>>>>>> 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
>>>>>>> 5.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
>>>>>>> 6.0 6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0
>>>>>>> 7.0 7.0
>>>>>>> 7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>> 3.0 3.0
>>>>>>> 3.0 3.0 3.0 3.0
>>>>>>>
>>>>>>> On Tue, Aug 11, 2015 at 7:05 PM, Nirmal Fernando <nir...@wso2.com> wrote:
>>>>>>>
>>>>>>>> Can you try the following code?
>>>>>>>>
>>>>>>>> List<LabeledPoint> points = labeledPoints.collect();
>>>>>>>> for(int i=0;i<points.size();i++){
>>>>>>>>     System.out.print(points.get(i).label() + "\t");
>>>>>>>> }
>>>>>>>>
>>>>>>>> On Tue, Aug 11, 2015 at 2:30 PM, Thushan Ganegedara <thu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I used the following snippet
>>>>>>>>>
>>>>>>>>> for(int i=0;i<labeledPoints.collect().size();i++){
>>>>>>>>>     System.out.print(labeledPoints.collect().get(i).label() + "\t");
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> in the method public MLModel build() throws MLModelBuilderException
>>>>>>>>> in DeeplearningModelBuilder.java
>>>>>>>>>
>>>>>>>>> On Tue, Aug 11, 2015 at 6:17 PM, Nirmal Fernando <nir...@wso2.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi thushan,
>>>>>>>>>>
>>>>>>>>>> We need more info. What exactly did you print, and where?
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara <thu...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I found the potential cause of the poor accuracy for the leaf
>>>>>>>>>>> dataset. It seems the data read into ML is wrong.
>>>>>>>>>>>
>>>>>>>>>>> I have attached the data file as a CSV (classes are in the last
>>>>>>>>>>> column)
>>>>>>>>>>>
>>>>>>>>>>> However, when I print out the labels of the read data (classes),
>>>>>>>>>>> it looks something like below.
>>>>>>>>>>> Clearly there aren't this many "3.0" classes
>>>>>>>>>>> and there should be classes up to 36.0.
>>>>>>>>>>>
>>>>>>>>>>> Is this caused by a bug?
>>>>>>>>>>>
>>>>>>>>>>> 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
>>>>>>>>>>> 1.0 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0
>>>>>>>>>>> 12.0 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0
>>>>>>>>>>> 13.0 13.0
>>>>>>>>>>> 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0
>>>>>>>>>>> 14.0 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0
>>>>>>>>>>> 15.0 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0
>>>>>>>>>>> 16.0 16.0
>>>>>>>>>>> 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0
>>>>>>>>>>> 17.0 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0
>>>>>>>>>>> 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0
>>>>>>>>>>> 19.0 19.0
>>>>>>>>>>> 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
>>>>>>>>>>> 19.0 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0
>>>>>>>>>>> 2.0 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0
>>>>>>>>>>> 4.0 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0
>>>>>>>>>>> 5.0 5.0
>>>>>>>>>>> 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
>>>>>>>>>>> 5.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
>>>>>>>>>>> 6.0 6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0
>>>>>>>>>>> 7.0 7.0
>>>>>>>>>>> 7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
>>>>>>>>>>> 3.0 3.0
>>>>>>>>>>> 3.0 3.0 3.0 3.0
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Thushan Ganegedara
>>>>>>>>>>> School of IT
>>>>>>>>>>> University of Sydney, Australia
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Thanks & regards,
>>>>>>>>>> Nirmal
>>>>>>>>>>
>>>>>>>>>> Team Lead - WSO2 Machine Learner
>>>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>>>> Mobile: +94715779733
>>>>>>>>>> Blog: http://nirmalfdo.blogspot.com/

--
Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
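
For reference, here is a rough Java sketch of the hybrid check proposed in this thread: take a sample, filter by data type, then look at how often values repeat. It is only an illustration under stated assumptions. The class `CategoricalDetector`, its method names, and the `minAvgRepeats` cutoff are hypothetical and are not part of WSO2 Machine Learner, and the sample values in `main` are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not an existing WSO2 ML class. */
class CategoricalDetector {

    /** Step 2: a value qualifies if it is a non-numeric string or a whole number ("3" or "3.0"). */
    static boolean isCategoricalType(String value) {
        try {
            double d = Double.parseDouble(value.trim());
            // Doubles without a fractional part are treated like integers.
            return d == Math.rint(d);
        } catch (NumberFormatException e) {
            return true; // not a number, so treat it as a string label
        }
    }

    /**
     * Steps 2 and 3 applied to an already selected sample (step 1).
     * minAvgRepeats is an assumed cutoff: the average number of times
     * each distinct value must occur in the sample.
     */
    static boolean isLikelyCategorical(List<String> sample, double minAvgRepeats) {
        if (sample.isEmpty()) {
            return false;
        }
        Set<String> distinct = new HashSet<>();
        for (String value : sample) {
            if (!isCategoricalType(value)) {
                return false; // a real decimal value: keep the column numerical
            }
            distinct.add(value.trim());
        }
        double avgRepeats = (double) sample.size() / distinct.size();
        return avgRepeats >= minAvgRepeats;
    }

    public static void main(String[] args) {
        // Made-up samples, for illustration only.
        List<String> classes = Arrays.asList("1", "1", "1", "2", "2", "2", "3", "3", "3");
        List<String> spread = Arrays.asList("65", "87", "93", "101", "115", "122", "148", "158", "164");
        List<String> decimals = Arrays.asList("0.5", "0.5", "0.5", "1.25");

        System.out.println(isLikelyCategorical(classes, 2.0));  // repeated whole numbers
        System.out.println(isLikelyCategorical(spread, 2.0));   // whole numbers, all distinct
        System.out.println(isLikelyCategorical(decimals, 2.0)); // contains fractional values
    }
}
```

With a cutoff of 2.0, a column whose sampled values are all distinct integers stays numerical, which is the kind of case raised above for normalized-losses. The cutoff would still need tuning per dataset, and the user can always override the detected type.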