Re: [Dev] [ML] Progress with Deeplearning Component
Great! Please create a Jira.

On Tue, Aug 11, 2015 at 12:43 PM, Thushan Ganegedara thu...@gmail.com wrote:

Hi CD,

No worries.

On Tue, Aug 11, 2015 at 5:11 PM, CD Athuraliya chathur...@wso2.com wrote:

Hi Nirmal,

We will be able to fix this issue. Thanks Thushan for pointing this out! :)

On Tue, Aug 11, 2015 at 12:32 PM, Nirmal Fernando nir...@wso2.com wrote:

@CD, is this something we could fix? Can we list features in the order of the indices?

On Tue, Aug 11, 2015 at 12:25 PM, Thushan Ganegedara thu...@gmail.com wrote:

Hi,

I noticed that, in certain cases, the features don't follow the correct ordering. Any idea why this is happening? For example, in this image, V10 appears after V1.

On Tue, Aug 11, 2015 at 12:10 PM, Thushan Ganegedara thu...@gmail.com wrote:

Hi all,

After a daunting struggle, I was able to corner the issue behind the poor accuracy on the leaf dataset. The dataset has classes from 1 to 36. However, there are no classes from 16 to 22, i.e. the classes go 1, 2, ..., 14, 15, 23, 24, ..., 35, 36.

Converting these class labels to enums in H2O (combined with the fact that there is very little data for each class) confuses H2O and causes it to *assign different enum values to the same classes in different datasets*, which manifests itself as poor accuracy.

I suspect that there is also a mismatch between the labels provided by the JavaRDD and the enums produced by H2O. I'm looking into this issue right now.

Thank you

On Mon, Aug 10, 2015 at 11:16 AM, Thushan Ganegedara thu...@gmail.com wrote:

Hi all,

I've been testing the new Deeplearning component with a few different datasets (mainly the leaf dataset), and the leaf dataset does not seem to be working as expected, for an unknown reason.

However, I tested the Deeplearning component extensively with the leaf dataset and identified several potential problems that might be causing the poor accuracy.

1. It needs a higher number of epochs (compared to other datasets) to produce a reasonable accuracy.
2. Too many neurons cause overfitting, and thereby poor accuracy.
3. Some classes have quite closely related features (the latter classes especially are often misclassified).

I was able to get an accuracy of 86% with Logistic Regression L-BFGS, which is quite reasonable. But I'm having trouble reaching that accuracy with Deeplearning (which should be quite easy). The highest accuracy I have reached so far is 71.xx%.

So I'm still looking for any definite issues causing the poor accuracy.

Thank you.

--
*CD Athuraliya*
Software Engineer
WSO2, Inc.
lean . enterprise . middleware
Mobile: +94 716288847
LinkedIn http://lk.linkedin.com/in/cdathuraliya | Twitter https://twitter.com/cdathuraliya | Blog http://cdathuraliya.tumblr.com/

--
Regards,

Thushan Ganegedara
School of IT
University of Sydney, Australia

--
Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/

___
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
Re: [Dev] [ML] Progress with Deeplearning Component
Hi,

I noticed that, in certain cases, the features don't follow the correct ordering. Any idea why this is happening? For example, in this image, V10 appears after V1.

--
Regards,

Thushan Ganegedara
School of IT
University of Sydney, Australia
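
One common way to end up with V10 directly after V1 is sorting feature names as plain strings, because "V10" compares lower than "V2" character by character. Whether that is the cause here is an assumption; the thread does not confirm it. The following is a minimal Python sketch of the effect and of sorting by the numeric index instead. The `feature_index` helper and the V-prefixed names are illustrative only and are not taken from the WSO2 ML code base.

```python
import re


def feature_index(name):
    """Return the trailing integer of a feature name such as 'V10' (hypothetical helper)."""
    match = re.search(r"(\d+)$", name)
    if match is None:
        raise ValueError(f"no numeric index in feature name: {name!r}")
    return int(match.group(1))


# Illustrative feature names V1 .. V12, already in index order.
features = [f"V{i}" for i in range(1, 13)]

# Plain string sort compares character by character, so 'V10' < 'V2'.
lexicographic = sorted(features)

# Sorting by the numeric suffix recovers index order.
by_index = sorted(features, key=feature_index)

print(lexicographic)
print(by_index)
```

With the string sort, V10, V11 and V12 land between V1 and V2; the keyed sort lists the features in the order of their indices.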
Re: [Dev] [ML] Progress with Deeplearning Component
Hi Nirmal,

We will be able to fix this issue. Thanks Thushan for pointing this out! :)

--
*CD Athuraliya*
Software Engineer
WSO2, Inc.
lean . enterprise . middleware
Mobile: +94 716288847
LinkedIn http://lk.linkedin.com/in/cdathuraliya | Twitter https://twitter.com/cdathuraliya | Blog http://cdathuraliya.tumblr.com/
Re: [Dev] [ML] Progress with Deeplearning Component
@CD, is this something we could fix? Can we list features in the order of the indices?

--
Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
Re: [Dev] [ML] Progress with Deeplearning Component
Hi CD,

No worries.

--
Regards,

Thushan Ganegedara
School of IT
University of Sydney, Australia
Re: [Dev] [ML] Progress with Deeplearning Component
Hi all,

After a daunting struggle, I was able to corner the issue behind the poor accuracy on the leaf dataset. The dataset has classes from 1 to 36. However, there are no classes from 16 to 22, i.e. the classes go 1, 2, ..., 14, 15, 23, 24, ..., 35, 36.

Converting these class labels to enums in H2O (combined with the fact that there is very little data for each class) confuses H2O and causes it to *assign different enum values to the same classes in different datasets*, which manifests itself as poor accuracy.

I suspect that there is also a mismatch between the labels provided by the JavaRDD and the enums produced by H2O. I'm looking into this issue right now.

Thank you

--
Regards,

Thushan Ganegedara
School of IT
University of Sydney, Australia
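
The failure mode described in this message can be illustrated without H2O or Spark. The sketch below is plain Python, not the H2O or Spark API, and `build_label_index` and `encode` are hypothetical helpers. It shows how deriving the label-to-index mapping separately for each dataset gives different indices to the same class when some classes are absent from one split, and how building the mapping once and reusing it avoids the mismatch.

```python
def build_label_index(labels):
    """Map each distinct class label to a dense index, in sorted label order."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


def encode(labels, label_index):
    """Encode labels with a fixed mapping; an unseen label raises KeyError."""
    return [label_index[label] for label in labels]


# Toy labels with a gap (no classes 16 to 22), mirroring the description above.
train_labels = [1, 2, 15, 23, 24, 36]
test_labels = [2, 23, 36]  # small split in which some classes are absent

# Problem: each dataset builds its own mapping, so the indices disagree.
train_index = build_label_index(train_labels)
test_index = build_label_index(test_labels)
print(train_index[23], test_index[23])  # same class, two different indices

# Fix: build the mapping once and reuse it for every dataset.
shared_index = build_label_index(train_labels)
train_encoded = encode(train_labels, shared_index)
test_encoded = encode(test_labels, shared_index)

# The inverse mapping turns predicted indices back into the original labels.
index_to_label = {idx: label for label, idx in shared_index.items()}
print([index_to_label[i] for i in test_encoded])
```

If the per-dataset mappings are used, class 23 is index 3 in training but index 1 in testing, so correct predictions would be scored as wrong. With the shared mapping, the same class always gets the same index.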
[Dev] [ML] Progress with Deeplearning Component
Hi all,

I've been testing the new Deeplearning component with a few different datasets (mainly the leaf dataset), and the leaf dataset does not seem to be working as expected, for an unknown reason.

However, I tested the Deeplearning component extensively with the leaf dataset and identified several potential problems that might be causing the poor accuracy.

1. It needs a higher number of epochs (compared to other datasets) to produce a reasonable accuracy.
2. Too many neurons cause overfitting, and thereby poor accuracy.
3. Some classes have quite closely related features (the latter classes especially are often misclassified).

I was able to get an accuracy of 86% with Logistic Regression L-BFGS, which is quite reasonable. But I'm having trouble reaching that accuracy with Deeplearning (which should be quite easy). The highest accuracy I have reached so far is 71.xx%.

So I'm still looking for any definite issues causing the poor accuracy.

Thank you.

--
Regards,

Thushan Ganegedara
School of IT
University of Sydney, Australia
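
Point 2 in the list above can be made a little more concrete with a rough parameter count. The sketch below counts weights and biases of a fully connected network and compares the total with the number of training rows. All numbers (feature count, class count, layer widths, row count) are made up for illustration and are not the configuration used in these experiments, and a parameter count is only a rough heuristic for overfitting risk, not a diagnosis.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical problem size: 14 input features, 29 classes, 250 training rows.
n_features, n_classes, n_train = 14, 29, 250

# Two hypothetical architectures: one narrow hidden layer vs. two wide ones.
small_net = [n_features, 20, n_classes]
large_net = [n_features, 500, 500, n_classes]

for name, layers in [("small", small_net), ("large", large_net)]:
    params = count_parameters(layers)
    print(f"{name}: {params} parameters, {params / n_train:.1f} per training row")
```

A network with hundreds of times more parameters than training rows can memorise a small dataset easily, which is one reason a smaller network, regularisation, or early stopping may be worth trying before concluding that the component itself is at fault.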