Overall, this is a good list of items to work on, particularly because it contains several user-facing items. However, to echo what Luciano said, I'm also concerned about the timeline. At this stage, I agree that we need to release more often, using a more user-oriented "product" focus to guide our timelines. That is, we should plan releases around items that serve the "product": allowing users to work on a wide range of ML problems simply and easily on top of Spark.
With that in mind, I agree that a focus on a subset of (1) and (2) would be good for an immediate release, with Spark 2.0 support as the top priority. How about we aim for a February 1st release date for the initial items?

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

> On Jan 3, 2017, at 4:17 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
>
> Hi Matthias,
>
> Thanks for the detailed roadmap.
>
> +1 for all the items, with a few modifications.
>
> 1) APIs and Language:
> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
> >> Ensure Python and Scala MLContext have the same API capability.
>
> * Remove old MLContext
> * Consolidate MLContext and JMLC
> * Full support for Scala/Python DSLs
> >> +1 for Python DSL except for push-down of loop structures and functions.
>
> * Remove old file-based transform
> * Scala/Python wrappers for all existing algorithms
> * Data converters (additional formats: e.g., libsvm; performance)
>
> 2) Updated Dependencies:
> * Spark 2.0 support
> * Matrix block library (isolated jar)
>
> 3) Compiler/Runtime Features:
> * GPU support (full compiler and runtime support)
> >> Can we break this down into phases
> >> (https://issues.apache.org/jira/browse/SYSTEMML-445)? We can discuss
> >> the timeline of the phases in the JIRA.
>
> * Compressed linear algebra v2
> * Code generation (automatic operator fusion)
> * Extended parfor (full Spark exploitation, micro-batch support)
> * Scale-up architecture (large dense blocks, NUMA)?
>
> 4) Tools
> * Extended stats (task locality, shuffle, etc)
> * Cloud resource advisor (extended resource optimizer)?
>
> 5) Algorithms
> * Graduate "staging" algorithms (robustness/performance)
> * Perftest: include all algorithms in automated performance tests
> >> via spark-submit + via Scala/Python wrappers
>
> * Simplify usage of decision trees, random forest, mlogreg, msvm
> (preprocessing, label representation, etc)
> >> + command-line variable naming, e.g., maxi, maxiter, etc.
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> From: Matthias Boehm <mboe...@googlemail.com>
> To: dev@systemml.incubator.apache.org
> Date: 01/03/2017 02:44 PM
> Subject: Re: [DISCUSS] Roadmap SystemML 1.0
>
> Yes indeed, most of (3) and (4) can be done incrementally. For (5), some
> of the changes might also modify the signature of algorithms (i.e.,
> parameters and required input data), but it would help, for example, with
> decision trees, as users would no longer need to dummy-code their inputs.
>
> Generally, I'm fine with making (3), (4), and part of (5) optional and
> letting the "must-have" features from (1) and (2) determine the timeline.
>
> Regards,
> Matthias
>
> On 1/3/2017 11:27 PM, Luciano Resende wrote:
> > On Tue, Jan 3, 2017 at 11:50 AM, Matthias Boehm <mboe...@googlemail.com>
> > wrote:
> >
> >> I'd like to initiate the discussion of a concrete roadmap for our next
> >> release. Based on previous discussions, I think it's fair to say that
> >> we agree on calling it SystemML 1.0. We should plan this release
> >> carefully, as it's an opportunity to change APIs and remove some older
> >> deprecated features. I'd like to encourage not just developers but also
> >> the broader community to participate in this discussion.
> >>
> >> Personally, I think a target date of Q2/2017 is realistic.
> >> Let's start by collecting the major features and changes that
> >> potentially affect users. Here is an initial list, but please feel free
> >> to add items and to up- or down-vote the individual items.
> >>
> >> 1) APIs and Language:
> >> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
> >> * Remove old MLContext
> >> * Consolidate MLContext and JMLC
> >> * Full support for Scala/Python DSLs
> >> * Remove old file-based transform
> >> * Scala/Python wrappers for all existing algorithms
> >> * Data converters (additional formats: e.g., libsvm; performance)
> >>
> >> 2) Updated Dependencies:
> >> * Spark 2.0 support
> >> * Matrix block library (isolated jar)
> >>
> >> 3) Compiler/Runtime Features:
> >> * GPU support (full compiler and runtime support)
> >> * Compressed linear algebra v2
> >> * Code generation (automatic operator fusion)
> >> * Extended parfor (full Spark exploitation, micro-batch support)
> >> * Scale-up architecture (large dense blocks, NUMA)?
> >>
> >> 4) Tools
> >> * Extended stats (task locality, shuffle, etc)
> >> * Cloud resource advisor (extended resource optimizer)?
> >>
> >> 5) Algorithms
> >> * Graduate "staging" algorithms (robustness/performance)
> >> * Perftest: include all algorithms in automated performance tests
> >> * Simplify usage of decision trees, random forest, mlogreg, msvm
> >> (preprocessing, label representation, etc)
> >>
> >> Items marked with a "?" can potentially be moved to subsequent releases.
> >>
> >> Regards,
> >> Matthias
> >>
> >
> > My understanding is that most of the items in 1 and 2 are going to break
> > backward compatibility, while the others can be done incrementally. Is
> > this assumption correct? If so, can we finish 1 and 2, do a 1.0 release,
> > and then continue with 3, 4, 5, etc.? I don't think we should wait until
> > Q2/2017 to do a 1.0 release.
> > I believe in "release early, release often", particularly to attract
> > new users who can help verify and contribute to specific releases.
> >
> > Thoughts?