Joseph K. Bradley created SPARK-7412:
----------------------------------------

             Summary: Designing distributed prediction model abstractions for spark.ml
                 Key: SPARK-7412
                 URL: https://issues.apache.org/jira/browse/SPARK-7412
             Project: Spark
          Issue Type: Brainstorming
          Components: ML
            Reporter: Joseph K. Bradley


The Pipelines API (spark.ml package) now includes abstractions for single-label 
prediction: Predictor, Classifier, Regressor.  These assume models are local 
(held in memory on a single machine), so that single-Row prediction methods can 
be used as UDFs.  We need to think about how to support distributed models in 
these abstractions.
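To illustrate the "local model" assumption, here is a minimal sketch in plain Python (not the Spark API). `LocalLinearModel` and `transform` are hypothetical names, and partitions are simulated as lists of dict rows. Because `predict` is a pure function of one row's features and the whole model fits in memory, it can be applied to each row independently, which is what wrapping it as a UDF relies on.

```python
from dataclasses import dataclass
from typing import Dict, List

Row = Dict[str, object]

@dataclass
class LocalLinearModel:
    """Hypothetical local model: all coefficients fit in one process's memory."""
    weights: List[float]
    intercept: float

    def predict(self, features: List[float]) -> float:
        # Single-row prediction: depends only on this row's feature vector,
        # so it can be wrapped as a per-row UDF.
        return self.intercept + sum(w * x for w, x in zip(self.weights, features))

def transform(model: LocalLinearModel, partitions: List[List[Row]]) -> List[List[Row]]:
    """Hypothetical stand-in for applying predict() as a UDF: every row in every
    partition is scored independently with a full copy of the model."""
    return [
        [dict(row, prediction=model.predict(row["features"])) for row in part]
        for part in partitions
    ]

model = LocalLinearModel(weights=[2.0, -1.0], intercept=0.5)
data = [[{"features": [1.0, 1.0]}], [{"features": [3.0, 0.0]}, {"features": [0.0, 2.0]}]]
print([r["prediction"] for part in transform(model, data) for r in part])
# prints [1.5, 6.5, -1.5]
```

This pattern breaks down once the coefficients no longer fit on one machine, since every worker would need the full model.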

Should the abstractions be modified somehow?  Or should there be parallel (or 
inheriting) abstractions, or a mix-in?
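One way to picture the mix-in option is sketched below in plain Python, under stated assumptions. `DistributedPredictionMixin`, `ShardedLinearModel` and `predict_batch` are hypothetical names, not spark.ml classes. The idea is that a model whose coefficients are split across shards cannot score a single row in isolation. It instead exposes a batch method that mimics an explode, join and aggregate over the whole dataset.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (row id, sparse features as {feature index: value})
SparseRow = Tuple[int, Dict[int, float]]

class DistributedPredictionMixin:
    """Hypothetical mix-in: models that cannot predict one row on its own
    implement a batch-level method instead of a single-row predict()."""
    def predict_batch(self, rows: List[SparseRow]) -> Dict[int, float]:
        raise NotImplementedError

class ShardedLinearModel(DistributedPredictionMixin):
    """Hypothetical linear model whose weights are split into shards by feature
    index, standing in for coefficients stored across a cluster."""
    def __init__(self, shards: List[Dict[int, float]], intercept: float):
        self.shards = shards
        self.intercept = intercept

    def predict_batch(self, rows: List[SparseRow]) -> Dict[int, float]:
        # Step 1 ("explode"): group (row id, value) pairs by feature index.
        by_feature = defaultdict(list)
        for row_id, features in rows:
            for j, x in features.items():
                by_feature[j].append((row_id, x))
        # Step 2 ("join"): match each shard's weights to feature entries and
        # accumulate partial dot products keyed by row id.
        partial = defaultdict(float)
        for shard in self.shards:
            for j, w in shard.items():
                for row_id, x in by_feature.get(j, []):
                    partial[row_id] += w * x
        # Step 3 ("aggregate"): add the intercept to each row's summed products.
        return {row_id: self.intercept + partial[row_id] for row_id, _ in rows}

model = ShardedLinearModel(shards=[{0: 2.0}, {1: -1.0, 2: 4.0}], intercept=0.5)
print(model.predict_batch([(0, {0: 1.0, 1: 1.0}), (1, {2: 0.5})]))
# prints {0: 1.5, 1: 2.5}
```

With a mix-in, code that applies a model could check `isinstance(model, DistributedPredictionMixin)` and choose the batch path, leaving the existing single-row Predictor contract untouched for local models.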

Motivation: We may start supporting distributed models, since linear models, 
random forests, and other models can grow large enough to merit distributed 
storage and computation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
