Hello community,
Joseph and I would like to introduce a new Spark package that should
be useful for Python users who depend on scikit-learn.

Among other things, it lets you:
 - train and evaluate multiple scikit-learn models in parallel
 - convert Spark DataFrames seamlessly into NumPy arrays (see the
sketch after this list)
 - (experimental) distribute SciPy sparse matrices as a dataset of
sparse vectors
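
As a quick illustration of the DataFrame-to-NumPy conversion, here is
a minimal sketch that uses only plain PySpark (df.toPandas().values);
the package's own conversion utilities are described on the package
page linked below. The sqlContext and the toy data here are just
assumptions for the example:

    # Assumed setup: `sqlContext` is an active SQLContext and the data is
    # small enough to collect onto the driver.
    df = sqlContext.createDataFrame([(1.0, 2.0), (3.0, 4.0)], ["x", "y"])

    # toPandas() brings the DataFrame to the driver as a pandas DataFrame;
    # .values exposes it as a NumPy array that scikit-learn can consume.
    X = df.toPandas().values
    print(X.shape, X.dtype)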

Spark-sklearn focuses on problems where the data is small enough to
fit on a single machine but the work can be run in parallel. Note that
this package distributes tasks such as grid-search cross-validation
across a cluster; it does not distribute the individual learning
algorithms themselves (unlike Spark MLlib).
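
For example, distributing a scikit-learn grid search is meant to be a
drop-in change: the package exposes a GridSearchCV class that takes a
SparkContext and otherwise follows scikit-learn's interface. A minimal
sketch (the parameter values are only illustrative, and `sc` is
assumed to be an existing SparkContext):

    from sklearn import datasets, svm
    from spark_sklearn import GridSearchCV

    digits = datasets.load_digits()
    param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.001, 0.01, 0.1]}

    # Each (parameter setting, fold) pair is fitted as a Spark task;
    # the rest of the workflow mirrors scikit-learn's GridSearchCV.
    clf = GridSearchCV(sc, svm.SVC(), param_grid=param_grid)
    clf.fit(digits.data, digits.target)
    print(clf.best_params_)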

If you want to use it, see the instructions on the package page:
https://github.com/databricks/spark-sklearn

This blog post contains more details:
https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html

Let us know if you have any questions. Also, documentation and code
contributions are very welcome (Apache 2.0 license).

Cheers

Tim
