Hi,

I'd recommend starting by checking out the existing helper functionality for these tasks. There are helper methods for K-fold cross-validation in MLUtils: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
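To make the idea concrete, here is a minimal sketch of K-fold splitting in plain Scala. This is only an illustration of the concept, not the actual MLUtils.kFold API (which operates on RDDs and takes a random seed); the names `KFoldSketch` and `kFold` here are made up for the example.

```scala
// Sketch of the k-fold idea: partition the data into k disjoint
// validation folds, pairing each fold with the remaining training data.
object KFoldSketch {
  def kFold[T](data: Seq[T], k: Int): Seq[(Seq[T], Seq[T])] = {
    val indexed = data.zipWithIndex
    (0 until k).map { fold =>
      // Elements whose index falls in this fold become the validation set;
      // everything else is the training set.
      val (validation, training) = indexed.partition { case (_, i) => i % k == fold }
      (training.map(_._1), validation.map(_._1))
    }
  }
}
```

Each of the k (training, validation) pairs can then be used to fit and score a model, with the k scores averaged to estimate generalization performance.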
The experimental spark.ml API in the Spark 1.2 release (in branch-1.2 and master) has a CrossValidator class which does this more automatically: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala

There are also a few evaluation metrics already implemented: https://github.com/apache/spark/tree/master/mllib/src/main/scala/org/apache/spark/mllib/evaluation

There could definitely be more metrics and/or better APIs to make it easier to evaluate models on RDDs. If you spot such cases, I'd recommend opening JIRAs for the new features or improvements to get some feedback before sending PRs: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

Hope this helps, and looking forward to the contributions!

Joseph

On Thu, Dec 11, 2014 at 4:41 AM, kidynamit <paul.mwanj...@gmail.com> wrote:
> Hi,
>
> I would like to contribute to Spark's machine learning library by adding
> evaluation metrics that would be used to gauge the accuracy of a model
> given a certain feature set. In particular, I would like to contribute
> k-fold validation and the f-beta metric, among others, on top of the
> current MLlib framework.
>
> Please advise on the steps I could take to contribute in this manner.
>
> Regards,
> kidynamit
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/Evaluation-Metrics-for-Spark-s-MLlib-tp9727.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
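[For reference, the f-beta metric proposed above is the weighted harmonic mean of precision and recall, where beta controls how much recall is weighted relative to precision (beta = 1 gives the familiar F1 score). A minimal plain-Scala sketch of the formula; the object and method names are illustrative, not an existing Spark API:]

```scala
// F-beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision
// and R is recall. Defined as 0 when both precision and recall are 0.
object FBetaSketch {
  def fBeta(precision: Double, recall: Double, beta: Double): Double = {
    val b2 = beta * beta
    if (precision == 0.0 && recall == 0.0) 0.0
    else (1 + b2) * precision * recall / (b2 * precision + recall)
  }
}
```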