[ 
https://issues.apache.org/jira/browse/SPARK-16365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398178#comment-15398178
 ] 

Joseph K. Bradley commented on SPARK-16365:
-------------------------------------------

This JIRA is covering multiple potential projects within mllib-local.  I'd find 
it useful to separate these.  Here are the ones I see so far:

h3. Local model implementations

IMO this is a no-brainer: we should definitely work towards moving model 
implementations into mllib-local.  This is _separate_ from model serving, etc., 
but it is an important dependency of non-Spark model serving efforts.  
Essentially, what such efforts are required to do now is:
* duplicate transformation and prediction functionality outside of Spark
* build model serving infra based on the duplicate code path

What we should have is:
* local model implementation in mllib-local
* MLlib depends on that implementation
* external model serving infra depends on mllib-local

The key benefit is utilizing exactly the same code paths for prediction within 
MLlib and in production systems, making it easy to keep them in sync if MLlib 
changes its behavior.
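
As a rough illustration of the target layout above, here is a minimal sketch, written in Python purely for brevity. All class names are hypothetical and are not existing mllib-local or MLlib APIs. The assumption is a simple linear model whose prediction logic lives in one Spark-free class, with both the cluster-side wrapper and an external serving layer delegating to it.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class LocalLinearModel:
    """Hypothetical stand-in for a model living in mllib-local (no Spark dependency)."""
    coefficients: Sequence[float]
    intercept: float

    def predict(self, features: Sequence[float]) -> float:
        # The single prediction code path shared by every consumer.
        if len(features) != len(self.coefficients):
            raise ValueError("feature vector has the wrong length")
        return self.intercept + sum(
            w * x for w, x in zip(self.coefficients, features)
        )


class ClusterModel:
    """Hypothetical stand-in for the MLlib-side model: delegates to the local one."""

    def __init__(self, local: LocalLinearModel) -> None:
        self.local = local

    def transform(self, rows: Sequence[Sequence[float]]) -> List[float]:
        # A plain list stands in for a distributed dataset in this sketch.
        return [self.local.predict(row) for row in rows]


class ServingEndpoint:
    """Hypothetical external serving layer: depends only on the local model."""

    def __init__(self, local: LocalLinearModel) -> None:
        self.local = local

    def handle(self, features: Sequence[float]) -> float:
        return self.local.predict(features)


model = LocalLinearModel(coefficients=[0.5, -1.0], intercept=2.0)
batch = ClusterModel(model).transform([[4.0, 1.0], [0.0, 0.0]])
single = ServingEndpoint(model).handle([4.0, 1.0])
```

Because both wrappers call the same `predict`, a change in prediction behavior reaches batch scoring and serving at once, with no duplicate code path to keep in sync.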

CCing [~chromaticbum] with whom I had a great discussion on this.

h3. Local model serialization format

This is easily conflated with model implementations, but I believe it should be 
a separate discussion.  The community did a great job on ML persistence for the 
DataFrame-based API, so I do believe it is possible to do a good job here, 
though we must achieve consensus first.  This would of course need to happen 
after the local model implementations are built.

h3. Local linear algebra

This is already being discussed elsewhere, e.g. [SPARK-6442], so let's not 
discuss it here.

h3. Local model training

I completely agree with what was said above about (a) not duplicating 
functionality in other single-machine libraries and (b) doing all training with 
Spark proper.

> Ideas for moving "mllib-local" forward
> --------------------------------------
>
>                 Key: SPARK-16365
>                 URL: https://issues.apache.org/jira/browse/SPARK-16365
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: Nick Pentreath
>
> Since SPARK-13944 is all done, we should all think about what the "next 
> steps" might be for {{mllib-local}}. E.g., it could be "improve Spark's 
> linear algebra", or "investigate how we will implement local models/pipelines 
> in Spark", etc.
> This ticket is for comments, ideas, brainstormings and PoCs. The separation 
> of linalg into a standalone project turned out to be significantly more 
> complex than originally expected. So I vote we devote sufficient discussion 
> and time to planning out the next move :)



