[ https://issues.apache.org/jira/browse/SPARK-16365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398178#comment-15398178 ]
Joseph K. Bradley commented on SPARK-16365:
-------------------------------------------

This JIRA is covering multiple potential projects within mllib-local. I'd find it useful to separate these. Here are the ones I see so far:

h3. Local model implementations

IMO this is a no brainer: we should definitely work towards moving model implementations into mllib-local. This is _separate_ from model serving, etc. but is an important dependency of non-Spark model serving efforts.

Essentially, what is required now is:
* duplicate transformation and prediction functionality outside of Spark
* build model serving infra based on the duplicate code path

What we should have is:
* local model implementation in mllib-local
* MLlib depends on that implementation
* external model serving infra depends on mllib-local

The key benefit is utilizing exactly the same code paths for prediction within MLlib and in production systems, making it easy to keep them in sync if MLlib changes its behavior.

CCing [~chromaticbum] with whom I had a great discussion on this.

h3. Local model serialization format

This is easily conflated with model implementations, but I believe it should be a separate discussion. The community did a great job on ML persistence for the DataFrame-based API, so I do believe it is possible to do a good job here, though we must achieve consensus first. This would of course need to happen after the local model implementations were built.

h3. Local linear algebra

This is already being discussed elsewhere, e.g. [SPARK-6442], so let's not discuss it here.

h3. Local model training

I completely agree with what was said above about (a) not duplicating functionality in other single-machine libraries and (b) doing all training with Spark proper.
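As a rough illustration of the layering proposed under "Local model implementations", here is a minimal sketch. It is not an actual mllib-local or MLlib API: every class and function name below is hypothetical, Python is used only for brevity, and plain lists stand in for distributed partitions. The point is only that the distributed wrapper and the serving path both call the same local `predict`.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Sequence


# "mllib-local" layer (hypothetical): a pure local model with no cluster dependency.
@dataclass(frozen=True)
class LocalLinearModel:
    coefficients: Sequence[float]
    intercept: float

    def predict(self, features: Sequence[float]) -> float:
        """Score a single feature vector with a linear model."""
        if len(features) != len(self.coefficients):
            raise ValueError("feature vector has the wrong length")
        return self.intercept + sum(
            c * x for c, x in zip(self.coefficients, features)
        )


# "MLlib" layer (hypothetical): a distributed wrapper that delegates to the local model.
class DistributedLinearModel:
    def __init__(self, local: LocalLinearModel) -> None:
        self.local = local

    def transform(
        self, partitions: Iterable[Iterable[Sequence[float]]]
    ) -> Iterator[List[float]]:
        # Each partition is scored with the same predict() used for serving.
        for partition in partitions:
            yield [self.local.predict(row) for row in partition]


# External serving layer (hypothetical): depends only on the local model.
def serve_request(model: LocalLinearModel, features: Sequence[float]) -> dict:
    return {"prediction": model.predict(features)}
```

Because both callers go through `LocalLinearModel.predict`, a behavior change made there reaches batch scoring and serving together, which is the sync benefit described above.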
> Ideas for moving "mllib-local" forward
> --------------------------------------
>
>                 Key: SPARK-16365
>                 URL: https://issues.apache.org/jira/browse/SPARK-16365
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: Nick Pentreath
>
> Since SPARK-13944 is all done, we should all think about what the "next
> steps" might be for {{mllib-local}}. E.g., it could be "improve Spark's
> linear algebra", or "investigate how we will implement local models/pipelines
> in Spark", etc.
> This ticket is for comments, ideas, brainstormings and PoCs. The separation
> of linalg into a standalone project turned out to be significantly more
> complex than originally expected. So I vote we devote sufficient discussion
> and time to planning out the next move :)