Hi,

I'd recommend starting by checking out the existing helper functionality for these tasks. There are helper methods for K-fold cross-validation in MLUtils: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
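To make the idea concrete, here is a minimal sketch of K-fold splitting in plain Scala. This is only an illustration of the concept, not the actual MLUtils.kFold API (which operates on RDDs and takes a random seed); the names `KFoldSketch` and `kFold` here are made up for the example.

```scala
// Sketch of the k-fold idea: partition the data into k disjoint
// validation folds, pairing each fold with the remaining training data.
object KFoldSketch {
  def kFold[T](data: Seq[T], k: Int): Seq[(Seq[T], Seq[T])] = {
    val indexed = data.zipWithIndex
    (0 until k).map { fold =>
      // Elements whose index falls in this fold become the validation set;
      // everything else is the training set.
      val (validation, training) = indexed.partition { case (_, i) => i % k == fold }
      (training.map(_._1), validation.map(_._1))
    }
  }
}
```

Each of the k (training, validation) pairs can then be used to fit and score a model, with the k scores averaged to estimate generalization performance.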
The experimental spark.ml API in the Spark 1.2 release (in branch-1.2 and master) has a CrossValidator class which does this more automatically: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

There are also a few evaluation metrics already implemented: https://github.com/apache/spark/tree/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation

There could definitely be more metrics and/or better APIs to make it easier to evaluate models on RDDs. If you spot such cases, I'd recommend opening JIRAs for the new features or improvements to get some feedback before sending PRs: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

Hope this helps, and looking forward to the contributions!

Joseph

On Thu, Dec 11, 2014 at 4:41 AM, kidynamit <paul.mwanj...@gmail.com> wrote:
> Hi,
>
> I would like to contribute to Spark's machine learning library by adding
> evaluation metrics that would be used to gauge the accuracy of a model
> given a certain feature set. In particular, I would like to contribute
> k-fold validation and the f-beta metric, among others, on top of the
> current MLlib framework.
>
> Please advise on the steps I could take to contribute in this manner.
>
> Regards,
> kidynamit
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Evaluation-Metrics-for-Spark-s-MLlib-tp9727.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
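[For reference, the f-beta metric proposed above is the weighted harmonic mean of precision and recall, where beta controls how much recall is weighted relative to precision (beta = 1 gives the familiar F1 score). A minimal plain-Scala sketch of the formula; the object and method names are illustrative, not an existing Spark API:]

```scala
// F-beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision
// and R is recall. Defined as 0 when both precision and recall are 0.
object FBetaSketch {
  def fBeta(precision: Double, recall: Double, beta: Double): Double = {
    val b2 = beta * beta
    if (precision == 0.0 && recall == 0.0) 0.0
    else (1 + b2) * precision * recall / (b2 * precision + recall)
  }
}
```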