[ 
https://issues.apache.org/jira/browse/SPARK-13944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244088#comment-15244088
 ] 

Sean Owen commented on SPARK-13944:
-----------------------------------

Here you're referring to *MLlib's local linear algebra library*, and right now 
there isn't really one. It's just a wrapper on Breeze; as you say, it exists to 
shield that implementation detail, but its role has never been more than that. 
I thought that was by design. If you want a bunch of functionality, call 
toBreeze (er, toArray and make a Breeze vector) and go for it. I think the 
question Nick and I still have is: has this stance really changed, and why? 
Yes, you're supposed to use whatever linear algebra library you like in your app.
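
A minimal sketch of that toArray route, assuming spark-mllib and Breeze are 
both on the application's classpath. The names ToBreezeSketch and 
asBreezeDense are made up for illustration and are not part of MLlib's API:

```scala
import breeze.linalg.{DenseVector => BDV, norm}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object ToBreezeSketch {
  // Hypothetical helper: wrap the values of an MLlib vector in a Breeze
  // dense vector. Note that toArray on a sparse MLlib vector materializes
  // a full dense array, so this is only a sensible route for small or
  // dense vectors.
  def asBreezeDense(v: Vector): BDV[Double] = new BDV[Double](v.toArray)

  def main(args: Array[String]): Unit = {
    val v = Vectors.dense(3.0, 4.0)
    val b = asBreezeDense(v)

    // From here on the app uses Breeze directly, with no Spark wrapper.
    println(norm(b))   // Euclidean norm: 5.0
    println(b dot b)   // dot product: 25.0
  }
}
```

Once the values are in a Breeze vector, the app can call any Breeze operation 
on them without Spark providing a wrapper for it.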

What Spark needs is separable from what an app wants to use -- again, by 
design, no? I don't see a lot of value in making a much more extensive wrapper 
layer just for the benefit of applications, which have no need of a wrapper 
layer to begin with (whereas Spark does).

What's the problem with PMML? I can list a few, but none of them compares with 
the downside of implementing a new serialization that nothing else can read. We 
already have a custom serialization option in MLlib anyway. JPMML is 
Apache-licensed; what's hard about importing PMML? Parquet, as a columnar 
format, isn't sensible for models; JSON is, but again you'd be making up your 
own format on top of JSON, just as PMML builds on XML.

OpenScoring is probably the right option for serving, but it's not 
Apache-licensed. However, that is not a problem for apps. If the goal is 
letting apps score models, isn't that already solved? The argument seems to be: 
because we're not using a standard format, we have to implement custom scoring. 
Isn't that a bug rather than a feature?

> Separate out local linear algebra as a standalone module without Spark 
> dependency
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-13944
>                 URL: https://issues.apache.org/jira/browse/SPARK-13944
>             Project: Spark
>          Issue Type: New Feature
>          Components: Build, ML
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Assignee: DB Tsai
>            Priority: Blocker
>
> Separate out linear algebra as a standalone module without Spark dependency 
> to simplify production deployment. We can call the new module 
> spark-mllib-local, which might contain local models in the future.
> The major issue is to remove dependencies on user-defined types.
> The package name will be changed from mllib to ml. For example, Vector will 
> be changed from `org.apache.spark.mllib.linalg.Vector` to 
> `org.apache.spark.ml.linalg.Vector`. The return vector type in the new ML 
> pipeline will be the one in ML package; however, the existing mllib code will 
> not be touched. As a result, this will potentially break the API. Also, when 
> the vector is loaded from mllib vector by Spark SQL, the vector will 
> be automatically converted into the one in ml package.


