Hi,

To evaluate classification accuracy, I am zipping the predictions and the
test labels as follows and then comparing the pairs in
predictionAndLabel:

val prediction = model.predict(test.map(_.features))
val predictionAndLabel = prediction.zip(test.map(_.label))


However, I am finding that predictionAndLabel.count() is smaller than
test.count(). For example, my test set has 43 elements, but
predictionAndLabel has only 38 pairs. I have tried other samples and always
get fewer elements after zipping.
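For comparison, I know that with plain Scala collections zip silently
truncates to the shorter of the two sequences (a toy illustration, not my
actual data, but it shows how pairs can disappear without an error):

```scala
// Plain Scala collections: zip stops at the shorter sequence, so any
// missing predictions silently shrink the zipped result.
val labels      = Vector(0.0, 1.0, 1.0, 0.0, 1.0) // 5 test labels
val predictions = Vector(0.0, 1.0, 0.0)           // suppose only 3 predictions came back
val pairs = predictions.zip(labels)
// pairs has 3 elements, not 5 -- the extra labels are dropped without warning
```

I am not sure whether RDD.zip behaves the same way here, or whether the
mismatch comes from the RDDs being recomputed between the two map calls.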

Does zipping the two RDDs drop elements, or is this caused by the
distributed nature of the computation (I am running in local mode on a
single machine)? To compute the correct accuracy, I need the above
comparison to be done over the entire test set on a single node (my data is
quite small). How can I ensure that?
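One workaround I am considering is to compute the prediction and the label
together in a single map over test, so no zip is involved and the counts
cannot diverge; with MLlib I believe this would be
test.map(lp => (model.predict(lp.features), lp.label)). Sketched below with
plain collections and a made-up predict stub (the stub and Point class are
placeholders, not my real model):

```scala
// Stand-in for model.predict; in Spark this would be the trained model.
def predict(features: Vector[Double]): Double =
  if (features.sum > 1.0) 1.0 else 0.0

// Stand-in for LabeledPoint.
case class Point(label: Double, features: Vector[Double])

val test = Vector(
  Point(1.0, Vector(0.9, 0.8)),
  Point(0.0, Vector(0.1, 0.2)),
  Point(1.0, Vector(0.7, 0.6))
)

// Predict and pair with the label in one pass: exactly one
// (prediction, label) pair per test point, by construction.
val predictionAndLabel = test.map(p => (predict(p.features), p.label))
val accuracy =
  predictionAndLabel.count { case (pred, lab) => pred == lab }.toDouble / test.size
```

Since my data is small, I could presumably also collect() the pairs to the
driver and compare them there, but I would like to confirm that is the
right approach.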

thanks
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/evaluating-classification-accuracy-tp10822.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
