[ 
https://issues.apache.org/jira/browse/SPARK-16365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923406#comment-15923406
 ] 

Asher Krim commented on SPARK-16365:
------------------------------------

Thanks for pointing me to this Jira [~josephkb], I somehow missed it!

I recently posted about this exact issue on the dev@ list. I wrote up a 
[document|https://docs.google.com/document/d/1Ha4DRMio5A7LjPqiHUnwVzbaxbev6ys04myyz6nDgI4/edit?usp=sharing]
 with some details about my views. The tl;dr is that this is, in my mind, the 
single most important feature currently missing from the Pipeline API. Training 
using Spark is all well and good, but if I can't deploy the models without 
relying on Spark, then using Spark for production systems becomes much less 
attractive. The work currently required to make Pipeline models run in 
production without Spark is just not worth it, since it requires both 
re-implementing the algorithms and testing them rigorously (to avoid small 
skews caused by differences between the implementations). Maintaining this 
can easily become nightmarish.

PMML and other export schemes are not a solution to this problem - they are 
nearly orthogonal, since they only describe WHAT to do, not exactly HOW to do 
it. 

I'm mostly just rephrasing [~josephkb], who captured this perfectly under 
"Local model implementations" above.

I am very eager to see this implemented in Spark, and am happy to start 
contributing code. [~hollinwilkins] has already done a lot of work on this for 
MLeap, so I'm hoping he would be on board as well. Other than a few small 
upfront design decisions, I think the implementation is mostly "Embarrassingly 
Parallel™".

> Ideas for moving "mllib-local" forward
> --------------------------------------
>
>                 Key: SPARK-16365
>                 URL: https://issues.apache.org/jira/browse/SPARK-16365
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: Nick Pentreath
>
> Since SPARK-13944 is all done, we should all think about what the "next 
> steps" might be for {{mllib-local}}. E.g., it could be "improve Spark's 
> linear algebra", or "investigate how we will implement local models/pipelines 
> in Spark", etc.
> This ticket is for comments, ideas, brainstormings and PoCs. The separation 
> of linalg into a standalone project turned out to be significantly more 
> complex than originally expected. So I vote we devote sufficient discussion 
> and time to planning out the next move :)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
