[ https://issues.apache.org/jira/browse/SPARK-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244088#comment-15244088 ]
Sean Owen commented on SPARK-13944:
-----------------------------------

Here you're referring to *MLlib's local linear algebra library*, and right now there isn't really one. It's just a wrapper on Breeze; as you say, it exists to shield that implementation detail, but its role has never been more than that. I thought that was by design. If you want a bunch of functionality, call toBreeze (er, toArray and make a Breeze vector) and go for it. I think the question Nick and I still have is: has this stance really changed, and why?

Yes, you're supposed to use whatever linear algebra library you like in your app. What Spark needs is separable from what an app wants to use -- again, by design, no? I don't see a lot of value in making a much more extensive wrapper layer just for the benefit of applications, which have no need of a wrapper layer to begin with (whereas Spark does).

What's the problem with PMML? I can list a few, but none of them compare with the downside of implementing a new serialization that nothing else can read. We already have a custom serialization option in MLlib anyway. JPMML is Apache-licensed; what's hard about importing PMML? Parquet, as a columnar format, isn't sensible for models; JSON is, but again you'd be making up your own format on top of JSON, like PMML builds on XML.

OpenScoring is probably the right option for serving, but it's not Apache-licensed. However, that is not a problem for apps. If the goal is letting apps score models, isn't that already solved? The argument seems to be: because we're not using a standard format, we have to implement custom scoring. Isn't that a bug rather than a feature?
> Separate out local linear algebra as a standalone module without Spark dependency
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-13944
>                 URL: https://issues.apache.org/jira/browse/SPARK-13944
>             Project: Spark
>          Issue Type: New Feature
>          Components: Build, ML
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Assignee: DB Tsai
>            Priority: Blocker
>
> Separate out linear algebra as a standalone module without Spark dependency
> to simplify production deployment. We can call the new module
> spark-mllib-local, which might contain local models in the future.
> The major issue is to remove dependencies on user-defined types.
> The package name will be changed from mllib to ml. For example, Vector will
> be changed from `org.apache.spark.mllib.linalg.Vector` to
> `org.apache.spark.ml.linalg.Vector`. The return vector type in the new ML
> pipeline will be the one in the ML package; however, the existing mllib code
> will not be touched. As a result, this will potentially break the API. Also,
> when a vector is loaded from an mllib vector by Spark SQL, it will be
> automatically converted into the one in the ml package.