Re: [DISCUSS] Roadmap SystemML 1.0

dusenberrymw Wed, 04 Jan 2017 18:02:13 -0800

Overall, this is a good list of items that should be worked on, particularly 
because it contains several user-facing items.  However, to echo what Luciano 
said, I'm also concerned about the timeline.  At this stage, I agree that we 
need to release more often, and with a more user-oriented "product" focus as a 
guide for timelines.  I.e. we should orient our release timelines around items 
that focus on the "product" of allowing the user to work on a wide range of ML 
problems in a simple and easy manner on top of Spark.


With that in mind, I agree that a focus on a subset of (1) and (2) would be 
good for an immediate release, with a particular focus on Spark 2.0 support as 
a priority.

How about we aim for a February 1st release date for the initial items?

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry



> On Jan 3, 2017, at 4:17 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
> 
> Hi Matthias,
> 
> Thanks for the detailed roadmap. 
> 
> +1 for all the items, with a few modifications.
> 
> 1) APIs and Language:
> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
> >> Ensure Python and Scala MLContext have same API capability.
> 
> * Remove old MLContext
> * Consolidate MLContext and JMLC
> * Full support for Scala/Python DSLs
> >> +1 for Python DSL except for push-down of loop structures and functions. 
> 
> * Remove old file-based transform
> * Scala/Python wrappers for all existing algorithms
> * Data converters (additional formats: e.g., libsvm; performance)
> 
> 2) Updated Dependencies:
> * Spark 2.0 support
> * Matrix block library (isolated jar)
> 
> 3) Compiler/Runtime Features:
> * GPU support (full compiler and runtime support)
> >> Can we break this down into phases: 
> >> https://issues.apache.org/jira/browse/SYSTEMML-445 ? We can discuss the 
> >> timeline of the phases in the JIRA.
> 
> * Compressed linear algebra v2
> * Code generation (automatic operator fusion)
> * Extended parfor (full spark exploitation, micro-batch support)
> * Scale-up architecture (large dense blocks, numa)?
> 
> 4) Tools
> * Extended stats (task locality, shuffle, etc)
> * Cloud resource advisor (extended resource optimizer)?
> 
> 5) Algorithms
> * Graduate "staging" algorithms (robustness/performance)
> * Perftest: include all algorithms into automated performance tests
> >> via spark-submit + via Scala/Python wrappers
> 
> * Simplify usage of decision trees, random forest, mlogreg, msvm
> (preprocessing, label representation, etc)
> >> + command-line variable naming. For example: maxi, maxiter, etc.
> 
> Thanks,
> 
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> 
> 
> From: Matthias Boehm <mboe...@googlemail.com>
> To: dev@systemml.incubator.apache.org
> Date: 01/03/2017 02:44 PM
> Subject: Re: [DISCUSS] Roadmap SystemML 1.0
> 
> 
> 
> 
> Yes indeed, most of (3) and (4) can be done incrementally. For (5), some 
> of the changes might also modify the signature of algorithms (i.e., 
> parameters and required input data) but it would help, for example with 
> decision trees, as users no longer need to dummy code their inputs.
> 
> Generally, I'm fine with making (3), (4), and part of (5) optional and 
> letting the "must-have" features from (1) and (2) determine the timeline.
> 
> Regards,
> Matthias
> 
> On 1/3/2017 11:27 PM, Luciano Resende wrote:
> > On Tue, Jan 3, 2017 at 11:50 AM, Matthias Boehm <mboe...@googlemail.com>
> > wrote:
> >
> >> I'd like to initiate the discussion of a concrete roadmap for our next
> >> release. According to previous discussions, I'd think it's fair to say
> >> that we agree on calling it SystemML 1.0. We should carefully plan this
> >> release as it's an opportunity to change APIs and remove some older
> >> deprecated features. I'd like to encourage not just developers but also the
> >> broader community to participate in this discussion.
> >>
> >> Personally, I think a target date of Q2/2017 is realistic. Let's start
> >> with collecting the major features and changes that potentially affect
> >> users. Here is an initial list, but please feel free to add and up- or
> >> down-vote the individual items.
> >>
> >> 1) APIs and Language:
> >> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
> >> * Remove old MLContext
> >> * Consolidate MLContext and JMLC
> >> * Full support for Scala/Python DSLs
> >> * Remove old file-based transform
> >> * Scala/Python wrappers for all existing algorithms
> >> * Data converters (additional formats: e.g., libsvm; performance)
> >>
> >> 2) Updated Dependencies:
> >> * Spark 2.0 support
> >> * Matrix block library (isolated jar)
> >>
> >> 3) Compiler/Runtime Features:
> >> * GPU support (full compiler and runtime support)
> >> * Compressed linear algebra v2
> >> * Code generation (automatic operator fusion)
> >> * Extended parfor (full spark exploitation, micro-batch support)
> >> * Scale-up architecture (large dense blocks, numa)?
> >>
> >> 4) Tools
> >> * Extended stats (task locality, shuffle, etc)
> >> * Cloud resource advisor (extended resource optimizer)?
> >>
> >> 5) Algorithms
> >> * Graduate "staging" algorithms (robustness/performance)
> >> * Perftest: include all algorithms into automated performance tests
> >> * Simplify usage of decision trees, random forest, mlogreg, msvm
> >> (preprocessing, label representation, etc)
> >>
> >> Items marked with a ? can potentially be moved out to subsequent releases.
> >>
> >>
> >> Regards,
> >> Matthias
> >>
> >
> > My understanding is that most of the items in 1 and 2 are going to break
> > backward compatibility, while the others can be done incrementally. Is this
> > assumption correct? If so, can we finish 1 and 2 and do a 1.0 release, and
> > then continue with 3, 4, 5, etc.? I don't think we should wait for
> > 2017/Q2 to do a 1.0 release. I believe in release early, release often,
> > particularly to attract new users, who can help verify and contribute
> > to specific releases.
> >
> > Thoughts?
> >
> 
> 
> 
>
