Re: evaluating classification accuracy
I am using 1.0.1 and running locally (I am not providing any master URL), but zip() still does not produce the correct count, as I mentioned above, so I am not sure whether the issue has been fixed in 1.0.1. However, instead of using zip, I am now using the code that Sean mentioned, and I am getting the correct count. So the issue is resolved for me. Thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/evaluating-classification-accuracy-tp10822p10980.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
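[Sean's exact code is not quoted in this thread, so the following is only a sketch of the usual single-pass alternative to zip: computing the prediction and the label together in one map over the test set, so there are no two separately-derived RDDs to align. The names `model` and `test` are assumed from the original post, with `test` an RDD[LabeledPoint] and `model` any MLlib classifier exposing predict(Vector): Double.]

// Sketch: build (prediction, label) pairs in a single map over `test`,
// avoiding zip between two RDDs derived separately from it.
val predictionAndLabel = test.map { point =>
  (model.predict(point.features), point.label)
}

// Accuracy = fraction of pairs where the prediction matches the label.
val accuracy =
  predictionAndLabel.filter { case (p, l) => p == l }.count().toDouble / test.count()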
evaluating classification accuracy
Hi,

In order to evaluate the ML classification accuracy, I am zipping the predictions and the test labels as follows, and then comparing the pairs in predictionAndLabel:

val prediction = model.predict(test.map(_.features))
val predictionAndLabel = prediction.zip(test.map(_.label))

However, I am finding that predictionAndLabel.count() has fewer elements than test.count(). For example, my test set has 43 elements, but predictionAndLabel has only 38 pairs. I have tried other samples and always get fewer elements after zipping. Does zipping the two RDDs drop elements, or is this because of the distributed nature of the algorithm (I am running in local mode on a single machine)?

In order to get the correct accuracy, I need the above comparison to be done by a single node over the entire test set (my data is quite small). How can I ensure that?

thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/evaluating-classification-accuracy-tp10822.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
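[Since the data here is small, one way to guarantee the comparison happens on a single node is to collect both sides to the driver and compare plain local arrays. This is only a sketch, assuming `model` and `test` as above, and it is safe only because the test set fits in driver memory:]

// Collect predictions and labels to the driver and compare locally.
// Appropriate only for small test sets (tens of elements here).
val predictions = model.predict(test.map(_.features)).collect()
val labels = test.map(_.label).collect()

// Local Array#zip pairs element-by-element; no distributed alignment involved.
val accuracy =
  predictions.zip(labels).count { case (p, l) => p == l }.toDouble / labels.length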
Re: evaluating classification accuracy
Are you using 1.0.0? There was a bug, which was fixed in 1.0.1 and master. If you don't want to switch to 1.0.1 or master, try caching and counting test first.

-Xiangrui

On Mon, Jul 28, 2014 at 6:07 PM, SK skrishna...@gmail.com wrote:
> [original message quoted above; snipped]
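[The cache-and-count workaround Xiangrui suggests can be sketched as follows, again assuming `model` and `test` from the original post. Materializing `test` in the cache before deriving the two RDDs means both sides of the zip iterate over the same already-computed partitions, rather than recomputing (and possibly re-sampling) the input on each side:]

// Workaround for the zip bug in Spark 1.0.0: pin `test` in the cache
// and force its evaluation before deriving the RDDs to be zipped.
test.cache()
test.count() // materializes the cached copy that both branches below will read

val prediction = model.predict(test.map(_.features))
val predictionAndLabel = prediction.zip(test.map(_.label))

// Counts should now agree.
assert(predictionAndLabel.count() == test.count())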