Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-06 Thread DB Tsai
+1 for renaming the jar file. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Tue, Apr 5, 2016 at 8:02 PM, Chris Fregly wrote: > perhaps renaming to Spark ML would actually clear up code and

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Nick Pentreath
+1 for this proposal - as you mention I think it's the de facto current situation anyway. Note that from a developer view it's just the user-facing API that will be only "ml" - the majority of the actual algorithms still operate on RDDs under the hood currently. On Wed, 6 Apr 2016 at 05:03, Chris

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Chris Fregly
perhaps renaming to Spark ML would actually clear up code and documentation confusion? +1 for rename > On Apr 5, 2016, at 7:00 PM, Reynold Xin wrote: > > +1 > > This is a no brainer IMO. > > >> On Tue, Apr 5, 2016 at 7:32 PM, Joseph Bradley

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Reynold Xin
+1 This is a no brainer IMO. On Tue, Apr 5, 2016 at 7:32 PM, Joseph Bradley wrote: > +1 By the way, the JIRA for tracking (Scala) API parity is: > https://issues.apache.org/jira/browse/SPARK-4591 > > On Tue, Apr 5, 2016 at 4:58 PM, Matei Zaharia

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Holden Karau
I'm very much in favor of this, the less porting work there is the better :) On Tue, Apr 5, 2016 at 5:32 PM, Joseph Bradley wrote: > +1 By the way, the JIRA for tracking (Scala) API parity is: > https://issues.apache.org/jira/browse/SPARK-4591 > > On Tue, Apr 5, 2016 at

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Joseph Bradley
+1 By the way, the JIRA for tracking (Scala) API parity is: https://issues.apache.org/jira/browse/SPARK-4591 On Tue, Apr 5, 2016 at 4:58 PM, Matei Zaharia wrote: > This sounds good to me as well. The one thing we should pay attention to > is how we update the docs so

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Matei Zaharia
This sounds good to me as well. The one thing we should pay attention to is how we update the docs so that people know to start with the spark.ml classes. Right now the docs list spark.mllib first and also seem more comprehensive in that area than in spark.ml, so maybe people naturally move

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Xiangrui Meng
Yes, DB (cc'ed) is working on porting the local linear algebra library over (SPARK-13944). There are also frequent pattern mining algorithms we need to port over in order to reach feature parity. -Xiangrui On Tue, Apr 5, 2016 at 12:08 PM Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote:

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Shivaram Venkataraman
Overall this sounds good to me. One question I have is that in addition to the ML algorithms we have a number of linear algebra (various distributed matrices) and statistical methods in the spark.mllib package. Is the plan to port or move these to the spark.ml namespace in the 2.x series? Thanks

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Sean Owen
FWIW, all of that sounds like a good plan to me. Developing one API is certainly better than two. On Tue, Apr 5, 2016 at 7:01 PM, Xiangrui Meng wrote: > Hi all, > > More than a year ago, in Spark 1.2 we introduced the ML pipeline API built > on top of Spark SQL’s DataFrames.

Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-05 Thread Xiangrui Meng
Hi all, More than a year ago, in Spark 1.2 we introduced the ML pipeline API built on top of Spark SQL’s DataFrames. Since then the new DataFrame-based API has been developed under the spark.ml package, while the old RDD-based API has been developed in parallel under the spark.mllib package.