Hi,

To evaluate the classification accuracy of my ML model, I am zipping the predictions and test labels as follows and then comparing the pairs in predictionAndLabel:

    val prediction = model.predict(test.map(_.features))
    val predictionAndLabel = prediction.zip(test.map(_.label))

However, I am finding that predictionAndLabel.count() has fewer elements than test.count(). For example, my test set has 43 elements, but predictionAndLabel has only 38 pairs. I have tried other samples and always end up with fewer elements after zipping. Does zipping the two RDDs drop elements, or is this because of the distributed nature of the algorithm? (I am running in local mode on a single machine.)

To get the correct accuracy, I need the above comparison to be done by a single node over the entire test set (my data is quite small). How can I ensure that?

thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/evaluating-classification-accuracy-tp10822.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
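[Editor's note: a minimal sketch of a zip-free alternative. It assumes an MLlib model whose predict also accepts a single feature Vector (true of e.g. LogisticRegressionModel and NaiveBayesModel), so each (prediction, label) pair is built in one pass over the test RDD and no alignment between two separate RDDs is needed.]

```scala
import org.apache.spark.mllib.regression.LabeledPoint

// Build each (prediction, label) pair from the same LabeledPoint, so the
// pairing cannot go out of sync the way zip can when the two RDDs do not
// have identical partitioning and per-partition element counts.
val predictionAndLabel = test.map { point: LabeledPoint =>
  (model.predict(point.features), point.label)
}

// Accuracy over the full test set.
val accuracy =
  predictionAndLabel.filter { case (pred, label) => pred == label }
    .count().toDouble / test.count()
```

Because each pair is derived from a single element, predictionAndLabel.count() will always equal test.count(), regardless of how the data is partitioned.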