Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)
Hi, Yes, no matter which approach is used, there will always be outliers that do not fit the defined rules. But for these corner cases, the user always has the opportunity to change the variable to numerical. One more approach is to introduce a measure of how often values are repeated in a column. If a column repeats the same values many times, IMO that is a good indicator of a categorical variable.
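A minimal sketch of the repetition measure suggested above, in Java to match the snippets elsewhere in this thread. The class name `RepetitionMeasure`, the ratio definition (1 minus distinct values over total values) and the `minRatio` cut-off are illustrative assumptions, not WSO2 ML code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of WSO2 ML) measuring how often a column repeats values.
final class RepetitionMeasure {

    private RepetitionMeasure() {
    }

    /**
     * Returns 1 - (distinct values / total values): 0.0 when every cell is unique,
     * approaching 1.0 when the column keeps reusing a few values.
     */
    static double repetitionRatio(List<String> column) {
        if (column.isEmpty()) {
            return 0.0;
        }
        Set<String> distinct = new HashSet<>(column);
        return 1.0 - (double) distinct.size() / column.size();
    }

    /** Flags a column as likely categorical when its repetition ratio reaches minRatio. */
    static boolean looksCategorical(List<String> column, double minRatio) {
        return repetitionRatio(column) >= minRatio;
    }

    public static void main(String[] args) {
        List<String> repeated = Arrays.asList("x", "x", "x", "y");
        List<String> ids = Arrays.asList("1", "2", "3", "4");
        System.out.println(looksCategorical(repeated, 0.5)); // true: ratio is 0.5
        System.out.println(looksCategorical(ids, 0.5));      // false: ratio is 0.0
    }
}
```

With `minRatio = 0.5`, a column such as `x, x, x, y` counts as categorical because half of its cells repeat an earlier value, while a column of all-distinct values scores 0.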
Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)
Hi,
On Fri, Aug 14, 2015 at 2:21 PM, Nirmal Fernando nir...@wso2.com wrote:
> This was mainly due to the detection of a numerical feature as a categorical one.
Oh, it makes sense now. Why don't we take a sample of the data and, if the sample contains only integers (or doubles without any decimals) or strings, treat the column as a categorical variable?
> We suggested increasing the categorical threshold as a work-around. @thushan did it work?
Yes, it worked after I increased the threshold to 40.
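The sampling rule proposed above could look roughly like the sketch below. It reads "only integers (or doubles without any decimals) or strings" as: every sampled value is either a whole number or non-numeric. That interpretation and the names `TypeBasedCheck`, `isWholeNumber` and `isNonNumeric` are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the sampling rule from the thread; not WSO2 ML code.
final class TypeBasedCheck {

    private TypeBasedCheck() {
    }

    /** True if the cell parses as a finite number with no fractional part, e.g. "7" or "7.0". */
    static boolean isWholeNumber(String cell) {
        try {
            double d = Double.parseDouble(cell.trim());
            return !Double.isInfinite(d) && d == Math.rint(d);
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** True if the cell cannot be parsed as a number at all, e.g. "red". */
    static boolean isNonNumeric(String cell) {
        try {
            Double.parseDouble(cell.trim());
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    /** Categorical if the sample is non-empty and every value is a whole number or non-numeric. */
    static boolean looksCategorical(List<String> sample) {
        return !sample.isEmpty()
                && sample.stream().allMatch(c -> isWholeNumber(c) || isNonNumeric(c));
    }

    public static void main(String[] args) {
        System.out.println(looksCategorical(Arrays.asList("1", "2.0", "36")));     // true
        System.out.println(looksCategorical(Arrays.asList("0.72694", "0.74173"))); // false
        System.out.println(looksCategorical(Arrays.asList("red", "blue", "red"))); // true
    }
}
```

Note that an integer-valued column such as one ranging over 0-164 also passes this check, which is the limitation Nirmal raises about the normalized-losses feature of the automobile dataset.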
Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)
On Fri, Aug 14, 2015 at 10:01 AM, Thushan Ganegedara thu...@gmail.com wrote:
> Oh, it makes sense now. Why don't we take a sample of the data and, if the sample contains only integers (or doubles without any decimals) or strings, treat the column as a categorical variable?
I tried that approach too, but some datasets break it, such as the normalized-losses feature of the automobile dataset, which has integer values (0-164) but is probably not categorical.
Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)
This was mainly due to the detection of a numerical feature as a categorical one. We suggested increasing the categorical threshold as a work-around. @thushan did it work?
On Tue, Aug 11, 2015 at 5:50 PM, Thushan Ganegedara thu...@gmail.com wrote:
> This issue occurs if I change the response variable to a categorical variable. If I keep it as a numerical variable, the values are read correctly. So I presume there is a fault in the categorical conversion of the variable.
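One way to picture the categorical-threshold work-around is sketched below, assuming the threshold caps the number of distinct values a column may have and still be treated as categorical. The thread does not say how WSO2 ML actually applies its threshold, so this definition, the class name `ThresholdCheck` and the example threshold of 20 are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Illustrative only: models the "categorical threshold" as a maximum number of
// distinct values. How WSO2 ML actually applies its threshold is an assumption here.
final class ThresholdCheck {

    private ThresholdCheck() {
    }

    /** Categorical when the column has at most categoricalThreshold distinct values. */
    static boolean isCategorical(List<String> column, int categoricalThreshold) {
        return new HashSet<>(column).size() <= categoricalThreshold;
    }

    public static void main(String[] args) {
        // 36 distinct class labels, "1" .. "36", each appearing twice.
        List<String> labels = new ArrayList<>();
        for (int round = 0; round < 2; round++) {
            for (int c = 1; c <= 36; c++) {
                labels.add(Integer.toString(c));
            }
        }
        System.out.println(isCategorical(labels, 20)); // false: 36 > 20
        System.out.println(isCategorical(labels, 40)); // true: 36 <= 40
    }
}
```

Under that assumption, a label column with classes up to 36 would be treated as numerical with a threshold of 20 but as categorical with a threshold of 40.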
Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)
Moreover, I think a hybrid approach like the following might work well:
1. Select a sample.
2. Filter columns by data type to find potential categorical variables (integer / string).
3. Filter further by checking whether the same values are repeated multiple times in the dataset.
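The three-step hybrid heuristic above can be sketched as follows, assuming a column arrives as a list of raw string cells. The class name, the fixed random seed and the `minRepetition` parameter are illustrative assumptions; step 3 is applied to the full column because the proposal says "in the dataset".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the three-step hybrid heuristic proposed in the thread.
final class HybridCategoricalDetector {

    private HybridCategoricalDetector() {
    }

    /** Step 1: a random sample of up to sampleSize cells, drawn without replacement. */
    static List<String> sample(List<String> column, int sampleSize, long seed) {
        List<String> copy = new ArrayList<>(column);
        Collections.shuffle(copy, new Random(seed));
        return copy.subList(0, Math.min(sampleSize, copy.size()));
    }

    /** Step 2: a candidate only if no sampled value is a number with a fractional part. */
    static boolean isTypeCandidate(List<String> sample) {
        for (String cell : sample) {
            try {
                double d = Double.parseDouble(cell.trim());
                if (d != Math.rint(d)) {
                    return false; // e.g. 0.72694 rules the column out
                }
            } catch (NumberFormatException e) {
                // Non-numeric strings are acceptable categorical values.
            }
        }
        return true;
    }

    /** Step 3: keep the candidate only if values repeat often enough across the column. */
    static boolean repeatsEnough(List<String> column, double minRepetition) {
        if (column.isEmpty()) {
            return false;
        }
        double ratio = 1.0 - (double) new HashSet<>(column).size() / column.size();
        return ratio >= minRepetition;
    }

    static boolean isCategorical(List<String> column, int sampleSize, double minRepetition, long seed) {
        List<String> s = sample(column, sampleSize, seed);
        return isTypeCandidate(s) && repeatsEnough(column, minRepetition);
    }

    public static void main(String[] args) {
        List<String> labels = new ArrayList<>();
        List<String> distinctInts = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            labels.add(Integer.toString(i % 10 + 1)); // ten classes, each repeated
            distinctInts.add(Integer.toString(i));    // every value unique
        }
        System.out.println(isCategorical(labels, 50, 0.5, 42L));       // true
        System.out.println(isCategorical(distinctInts, 50, 0.5, 42L)); // false
    }
}
```

With `minRepetition = 0.5`, a 100-row column cycling through ten integer labels is kept as categorical, while 100 distinct integers pass the type filter but are rejected by the repetition filter, which is the case the hybrid approach is meant to handle.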
Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)
Thushan, please send your suggestions to the other thread :)
[Dev] [ML] Issue while loading the leaf dataset (misreading classes)
Hi, I found the potential cause of the poor accuracy for the leaf dataset. It seems the data read into ML is wrong. I have attached the data file as a CSV (classes are in the last column) However, when I print out the labels of the read data (classes), it looks something like below. Clearly there aren't this many 3.0 classes and there should be classes up to 36.0. Is this caused by a bug? 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 12.012.012.012.012.012.0 12.012.012.012.013.013.013.013.013.013.0 13.013.013.013.014.014.014.014.014.0 14.014.014.015.015.015.015.015.015.0 15.015.015.015.015.015.016.016.016.016.0 16.016.016.016.017.017.017.017.017.0 17.017.017.017.017.018.018.018.018.0 18.018.018.018.018.018.018.019.019.019.0 19.019.019.019.019.019.019.019.019.0 19.019.02.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 -- Regards, Thushan Ganegedara School of IT University of Sydney, Australia 0.72694,1.4742,0.32396,0.98535,1,0.83592,0.0046566,0.0039465,0.04779,0.12795,0.016108,0.0052323,0.00027477,1.1756,1 
0.74173,1.5257,0.36116,0.98152,0.99825,0.79867,0.0052423,0.0050016,0.02416,0.090476,0.0081195,0.002708,7.48E-05,0.69659,1
0.76722,1.5725,0.38998,0.97755,1,0.80812,0.0074573,0.010121,0.011897,0.057445,0.0032891,0.00092068,3.79E-05,0.44348,1
0.73797,1.4597,0.35376,0.97566,1,0.81697,0.0068768,0.0086068,0.01595,0.065491,0.0042707,0.0011544,6.63E-05,0.58785,1
0.82301,1.7707,0.44462,0.97698,1,0.75493,0.007428,0.010042,0.0079379,0.045339,0.0020514,0.00055986,2.35E-05,0.34214,1
0.72997,1.4892,0.34284,0.98755,1,0.84482,0.0049451,0.0044506,0.010487,0.058528,0.0034138,0.0011248,2.48E-05,0.34068,1
0.82063,1.7529,0.44458,0.97964,0.99649,0.7677,0.0059279,0.0063954,0.018375,0.080587,0.0064523,0.0022713,4.15E-05,0.53904,1
0.77982,1.6215,0.39222,0.98512,0.99825,0.80816,0.0050987,0.0047314,0.024875,0.089686,0.0079794,0.0024664,0.00014676,0.66975,1
0.83089,1.8199,0.45693,0.9824,1,0.77106,0.0060055,0.006564,0.0072447,0.040616,0.0016469,0.00038812,3.29E-05,0.33696,1
0.90631,2.3906,0.58336,0.97683,0.99825,0.66419,0.0084019,0.012848,0.0070096,0.042347,0.0017901,0.00045889,2.83E-05,0.28082,1
0.7459,1.4927,0.34116,0.98296,1,0.83088,0.0055665,0.0056395,0.0057679,0.036511,0.0013313,0.00030872,3.18E-05,0.25026,1
0.79606,1.6934,0.43387,0.98181,1,0.76985,0.0077992,0.011071,0.013677,0.057832,0.004,0.00081648,0.00013855,0.49751,1
0.93361,2.7582,0.64257,0.98346,1,0.59851,0.0055336,0.0055731,0.029712,0.089889,0.0080153,0.0020648,0.00023883,0.91499,2
0.91186,2.4994,0.60323,0.983,1,0.64916,0.0061494,0.0068823,0.018887,0.072486,0.0052267,0.0014887,8.33E-05,0.67811,2
0.89063,2.2927,0.56667,0.98732,1,0.66427,0.0028365,0.0014643,0.029272,0.091328,0.0082717,0.0022383,0.00020166,0.87177,2
0.86755,2.009,0.51464,0.98691,1,0.70277,0.0054439,0.0053937,0.030348,0.092063,0.0084044,0.0022541,0.00019854,0.94545,2
Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)
Hi Thushan, We need more info. What exactly did you print, and where?
--
Thanks & regards,
Nirmal
Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
___
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)
Can you try the following code?
// Collect the RDD to the driver once, then print every label, tab-separated.
List<LabeledPoint> points = labeledPoints.collect();
for (int i = 0; i < points.size(); i++) {
    System.out.print(points.get(i).label() + "\t");
}
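Instead of scanning one label per row, a per-class tally makes a skewed distribution (such as an unexpectedly large number of 3.0 labels) easy to spot. The sketch below assumes the labels have already been collected into a `List<Double>` on the driver; `LabelTally` is a hypothetical helper, and Spark's RDD API also offers a `countByValue` action that could produce similar counts on the cluster.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical debugging helper: counts how many rows carry each label.
final class LabelTally {

    private LabelTally() {
    }

    /** Returns label -> number of occurrences, ordered numerically by label. */
    static Map<Double, Integer> countByLabel(List<Double> labels) {
        Map<Double, Integer> counts = new TreeMap<>();
        for (Double label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Double> labels = Arrays.asList(1.0, 1.0, 12.0, 3.0, 3.0, 3.0);
        System.out.println(countByLabel(labels)); // {1.0=2, 3.0=3, 12.0=1}
    }
}
```

Comparing such a tally with the class counts in the raw CSV would show directly which classes are being collapsed during loading.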
Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)
I used the following snippet:

for (int i = 0; i < labeledPoints.collect().size(); i++) { System.out.print(labeledPoints.collect().get(i).label() + "\t"); }

in the public MLModel build() throws MLModelBuilderException method in DeeplearningModelBuilder.java.

On Tue, Aug 11, 2015 at 6:17 PM, Nirmal Fernando nir...@wso2.com wrote:
Hi Thushan, we need more info. What exactly did you print, and where?

On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara thu...@gmail.com wrote:
Hi, I found the potential cause of the poor accuracy for the leaf dataset: it seems the data read into ML is wrong. I have attached the data file as a CSV (classes are in the last column). However, when I print out the labels of the read data (classes), they look something like the output below. Clearly there aren't this many examples of class 3.0, and there should be classes up to 36.0. Is this caused by a bug?

1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
12.0 12.0 12.0 12.0 12.0 12.0 12.0 12.0 12.0 12.0
13.0 13.0 13.0 13.0 13.0 13.0 13.0 13.0 13.0 13.0
14.0 14.0 14.0 14.0 14.0 14.0 14.0 14.0
15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0
16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.0
17.0 17.0 17.0 17.0 17.0 17.0 17.0 17.0 17.0 17.0
18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0
19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 ... (every remaining label in the output is 3.0)

--
Regards,
Thushan Ganegedara
School of IT
University of Sydney, Australia

--
Thanks & regards,
Nirmal
Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/

___
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
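To confirm what the attached CSV actually contains, one can read the class column directly and check the number of distinct classes and the largest class value (36.0 according to the report). The sketch below is a hypothetical helper, not part of WSO2 ML; it assumes comma-separated rows with no header row and the class in the last column, as described above. In practice the lines could come from java.nio.file.Files.readAllLines on the CSV file.

```java
import java.util.List;
import java.util.TreeSet;

public class CsvClassColumn {

    /** Parses the last comma-separated field of each non-empty line as a class value. */
    public static TreeSet<Double> distinctClasses(List<String> csvLines) {
        TreeSet<Double> classes = new TreeSet<>();
        for (String line : csvLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines, e.g. a trailing newline at the end of the file
            }
            String[] fields = line.split(",");
            classes.add(Double.parseDouble(fields[fields.length - 1].trim()));
        }
        return classes;
    }

    public static void main(String[] args) {
        // Hypothetical rows in the shape described in the thread: features first, class last
        List<String> lines = List.of("0.72,0.83,1", "0.64,0.91,12", "0.55,0.70,36", "");
        TreeSet<Double> classes = distinctClasses(lines);
        System.out.println("distinct classes: " + classes.size() + ", max class: " + classes.last());
    }
}
```

If the file itself reports the expected classes while the loaded labels do not, the problem lies in the loading or conversion step rather than in the data.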
Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)
I still get the same result:

1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
12.0 12.0 12.0 12.0 12.0 12.0 12.0 12.0 12.0 12.0
13.0 13.0 13.0 13.0 13.0 13.0 13.0 13.0 13.0 13.0
14.0 14.0 14.0 14.0 14.0 14.0 14.0 14.0
15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0 15.0
16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.0
17.0 17.0 17.0 17.0 17.0 17.0 17.0 17.0 17.0 17.0
18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0
19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 ... (every remaining label in the output is 3.0)

On Tue, Aug 11, 2015 at 7:05 PM, Nirmal Fernando nir...@wso2.com wrote:
Can you use the following code and try?

List<LabeledPoint> points = labeledPoints.collect(); for (int i = 0; i < points.size(); i++) { System.out.print(points.get(i).label() + "\t"); }
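Nirmal's version collects the points once and prints every label, which still leaves the reader to eyeball hundreds of values. A per-label count makes a skew like the 3.0 run easier to quantify. The sketch below is a minimal, hypothetical helper (not part of WSO2 ML or Spark); it assumes the labels have first been copied into a List<Double>, for example by calling label() on each element of the collected points list.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class LabelHistogram {

    /** Counts how often each label occurs; the TreeMap keeps labels in ascending numeric order. */
    public static SortedMap<Double, Integer> countLabels(List<Double> labels) {
        SortedMap<Double, Integer> counts = new TreeMap<>();
        for (Double label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the labels taken from labeledPoints.collect()
        List<Double> labels = List.of(1.0, 1.0, 12.0, 2.0, 3.0, 3.0, 3.0, 36.0);
        SortedMap<Double, Integer> counts = countLabels(labels);
        counts.forEach((label, n) -> System.out.println(label + "\t" + n));
        System.out.println("distinct labels: " + counts.size() + ", max label: " + counts.lastKey());
    }
}
```

Comparing the distinct-label count and the maximum label with what the CSV contains shows at a glance whether classes have been merged.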
Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)
This issue occurs only if I set the response variable as a categorical variable. If I keep it as a numerical variable, the values are read correctly, so I presume there is a fault in the categorical conversion of the variable.
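One way to test the suspicion that the categorical conversion is at fault is to compare the raw class strings from the CSV with the labels produced after conversion: a correct encoding maps each raw value to exactly one label and gives different raw values different labels. The sketch below is a hypothetical check (the class and method names are illustrative, not WSO2 ML APIs); it assumes the two input lists are aligned row by row.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CategoricalCheck {

    /** True if the raw-to-encoded mapping is consistent and one-to-one. */
    public static boolean isOneToOne(List<String> raw, List<Double> encoded) {
        if (raw.size() != encoded.size()) {
            throw new IllegalArgumentException("raw and encoded lists must have the same length");
        }
        Map<String, Double> rawToLabel = new HashMap<>();
        Map<Double, String> labelToRaw = new HashMap<>();
        for (int i = 0; i < raw.size(); i++) {
            String r = raw.get(i).trim();
            Double e = encoded.get(i);
            Double previousLabel = rawToLabel.putIfAbsent(r, e);
            String previousRaw = labelToRaw.putIfAbsent(e, r);
            // Fails if a raw value received two labels, or two raw values share one label
            if ((previousLabel != null && !previousLabel.equals(e))
                    || (previousRaw != null && !previousRaw.equals(r))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> raw = List.of("1", "12", "36", "36", "3");
        // Hypothetical outputs: one faithful encoding, one that collapses class 36 into 3.0
        List<Double> faithful = List.of(1.0, 12.0, 36.0, 36.0, 3.0);
        List<Double> collapsed = List.of(1.0, 12.0, 3.0, 3.0, 3.0);
        System.out.println("faithful one-to-one: " + isOneToOne(raw, faithful));
        System.out.println("collapsed one-to-one: " + isOneToOne(raw, collapsed));
    }
}
```

A false result on the real data would support the view that the categorical conversion, not the CSV itself, is merging classes.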