Re: [DISCUSS] Roadmap SystemML 1.0

Matthias Boehm Sun, 19 Feb 2017 17:31:28 -0800

In order to make this roadmap more concrete, I created the following epics
for the target release 1.0 with about 50 subtasks, and linked related
existing issues. Given the discussion on a short release cycle, the bare
minimum would be SYSTEMML-1299 (which includes all changes that affect the
external behavior), and a subset of SYSTEMML-1308 (especially features that
address proper cleanups and robustness against OOMs).


SYSTEMML-1299 Language feature updates
SYSTEMML-1321 Compiler feature extensions
SYSTEMML-1308 Runtime feature extensions
SYSTEMML-1284 Code generation for operator fusion
SYSTEMML-1328 Perftest extensions

I have not touched GPUs, Deep Learning, DSLs, or algorithms yet, so please
have a look and update or create the corresponding issues if necessary.


Regards,
Matthias


On Mon, Jan 16, 2017 at 8:14 PM, <[email protected]> wrote:

> Yeah using the target release would be good. Actually, with that in mind,
> I believe that we have been marking closed issues since the 0.11 release as
> targeting an upcoming "1.0" release, but it would probably be more correct
> to update those to "0.12" since we decided to release 0.12. In addition, we
> should set the target of the Spark 2.x support issue to "0.13".
>
> As for the roadmap, it would be good to update the website with a
> high-level overview, with links to associated JIRA issues.
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Jan 16, 2017, at 7:35 PM, Luciano Resende <[email protected]>
> wrote:
> >
> > Instead of an epic, we could use the target release? Also, we have a roadmap
> > page on the site, and we should either keep it up to date or get rid of it
> > and use the roadmap on JIRA.
> >
> >> On Mon, Jan 16, 2017 at 6:20 PM <[email protected]> wrote:
> >>
> >> Now that we've had some discussion here, it would be good to transfer
> this
> >> discussion into a JIRA epic, containing sub tasks. That way, we can
> >> properly track our progress on these items and facilitate contributions
> >> from the community.  Note that some of the sub tasks may already exist
> as
> >> individual issues.
> >>
> >>
> >>
> >> Would anyone in the community like to volunteer for creating these
> issues?
> >>
> >>
> >>
> >> - Mike
> >>
> >>
> >>
> >> --
> >>
> >>
> >>
> >> Mike Dusenberry
> >>
> >> GitHub: github.com/dusenberrymw
> >>
> >> LinkedIn: linkedin.com/in/mikedusenberry
> >>
> >>
> >>
> >> Sent from my iPhone.
> >>
> >>
> >>
> >>
> >>
> >>>> On Jan 4, 2017, at 6:00 PM, [email protected] wrote:
> >>>
> >>>
> >>
> >>> Overall, this is a good list of items that should be worked on,
> >> particularly because it contains several user-facing items.  However, to
> >> echo what Luciano said, I'm also concerned about the timeline.  At this
> >> stage, I agree that we need to release more often, and with a more
> >> user-oriented "product" focus as a guide for timelines.  I.e. we should
> >> orient our release timelines around items that focus on the "product" of
> >> allowing the user to work on a wide range of ML problems in a simple and
> >> easy manner on top of Spark.
> >>
> >>>
> >>
> >>> With that in mind, I agree that a focus on a subset of (1) and (2)
> would
> >> be good for an immediate release, with a particular focus on Spark 2.0
> >> support as a priority.
> >>
> >>>
> >>
> >>> How about we aim for a February 1st release date for the initial items?
> >>
> >>>
> >>
> >>> -Mike
> >>
> >>>
> >>
> >>> --
> >>
> >>>
> >>
> >>> Mike Dusenberry
> >>
> >>> GitHub: github.com/dusenberrymw
> >>
> >>> LinkedIn: linkedin.com/in/mikedusenberry
> >>
> >>>
> >>
> >>> Sent from my iPhone.
> >>
> >>>
> >>
> >>>
> >>
> >>>> On Jan 3, 2017, at 4:17 PM, Niketan Pansare <[email protected]>
> wrote:
> >>
> >>>>
> >>
> >>>> Hi Matthias,
> >>
> >>>>
> >>
> >>>> Thanks for the detailed roadmap.
> >>
> >>>>
> >>
> >>>> +1 for all the items with a few modifications.
> >>
> >>>>
> >>
> >>>> 1) APIs and Language:
> >>
> >>>> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
> >>
> >>>>>> Ensure Python and Scala MLContext have same API capability.
> >>
> >>>>
> >>
> >>>> * Remove old MLContext
> >>
> >>>> * Consolidate MLContext and JMLC
> >>
> >>>> * Full support for Scala/Python DSLs
> >>
> >>>>>> +1 for Python DSL except for push-down of loop structures and
> >> functions.
> >>
> >>>>
> >>
> >>>> * Remove old file-based transform
> >>
> >>>> * Scala/Python wrappers for all existing algorithms
> >>
> >>>> * Data converters (additional formats: e.g., libsvm; performance)
> >>
> >>>>
> >>
> >>>> 2) Updated Dependencies:
> >>
> >>>> * Spark 2.0 support
> >>
> >>>> * Matrix block library (isolated jar)
> >>
> >>>>
> >>
> >>>> 3) Compiler/Runtime Features:
> >>
> >>>> * GPU support (full compiler and runtime support)
> >>
> >>>>>> Can we break this down into phases:
> >> https://issues.apache.org/jira/browse/SYSTEMML-445 ? We can discuss the
> >> timeline of the phases in the JIRA.
> >>
> >>>>
> >>
> >>>> * Compressed linear algebra v2
> >>
> >>>> * Code generation (automatic operator fusion)
> >>
> >>>> * Extended parfor (full spark exploitation, micro-batch support)
> >>
> >>>> * Scale-up architecture (large dense blocks, numa)?
> >>
> >>>>
> >>
> >>>> 4) Tools
> >>
> >>>> * Extended stats (task locality, shuffle, etc)
> >>
> >>>> * Cloud resource advisor (extended resource optimizer)?
> >>
> >>>>
> >>
> >>>> 5) Algorithms
> >>
> >>>> * Graduate "staging" algorithms (robustness/performance)
> >>
> >>>> * Perftest: include all algorithms into automated performance tests
> >>
> >>>>>> via spark-submit + via Scala/Python wrappers
> >>
> >>>>
> >>
> >>>> * Simplify usage decision trees, random forest, mlogreg, msvm
> >>
> >>>> (preprocessing, label representation, etc)
> >>
> >>>>>> + command-line variable naming. For example: maxi, maxiter, etc.
> >>
> >>>>
> >>
> >>>> Thanks,
> >>
> >>>>
> >>
> >>>> Niketan Pansare
> >>
> >>>> IBM Almaden Research Center
> >>
> >>>> E-mail: npansar At us.ibm.com
> >>
> >>>> http://researcher.watson.ibm.com/researcher/view.php?
> person=us-npansar
> >>
> >>>>
> >>
> >>>>
> >>
> >>>> From: Matthias Boehm <[email protected]>
> >>
> >>>> To: [email protected]
> >>
> >>>> Date: 01/03/2017 02:44 PM
> >>
> >>>> Subject: Re: [DISCUSS] Roadmap SystemML 1.0
> >>
> >>>>
> >>
> >>>>
> >>
> >>>>
> >>
> >>>>
> >>
> >>>> Yes indeed, most of (3) and (4) can be done incrementally. For (5),
> some
> >>
> >>>> of the changes might also modify the signature of algorithms (i.e.,
> >>
> >>>> parameters and required input data) but it would help, for example
> with
> >>
> >>>> decision trees, as users no longer need to dummy code their inputs.
> >>
> >>>>
> >>
> >>>> Generally, I'm fine with making (3), (4), and part of (5) optional and
> >>
> >>>> let the "must-have" features from (1) and (2) determine the timeline.
> >>
> >>>>
> >>
> >>>> Regards,
> >>
> >>>> Matthias
> >>
> >>>>
> >>
> >>>> On 1/3/2017 11:27 PM, Luciano Resende wrote:
> >>
> >>>>> On Tue, Jan 3, 2017 at 11:50 AM, Matthias Boehm <
> >> [email protected]>
> >>
> >>>>> wrote:
> >>
> >>>>>
> >>
> >>>>>> I'd like to initiate the discussion of a concrete roadmap for our
> >> next
> >>
> >>>>>> release. According to previous discussions, I'd think it's fair to
> >> say
> >>
> >>>>>> that we agree on calling it SystemML 1.0. We should carefully plan
> >> this
> >>
> >>>>>> release as it's an opportunity to change APIs and remove some older
> >>
> >>>>>> deprecated features. I'd like to encourage not just developers but
> >> also the
> >>
> >>>>>> broader community to participate in this discussion.
> >>
> >>>>>>
> >>
> >>>>>> Personally, I think a target date of Q2/2017 is realistic. Let's
> >> start
> >>
> >>>>>> with collecting the major features and changes that potentially
> >> affect
> >>
> >>>>>> users. Here is an initial list, but please feel free to add and up-
> >> or
> >>
> >>>>>> down-vote the individual items.
> >>
> >>>>>>
> >>
> >>>>>> 1) APIs and Language:
> >>
> >>>>>> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
> >>
> >>>>>> * Remove old MLContext
> >>
> >>>>>> * Consolidate MLContext and JMLC
> >>
> >>>>>> * Full support for Scala/Python DSLs
> >>
> >>>>>> * Remove old file-based transform
> >>
> >>>>>> * Scala/Python wrappers for all existing algorithms
> >>
> >>>>>> * Data converters (additional formats: e.g., libsvm; performance)
> >>
> >>>>>>
> >>
> >>>>>> 2) Updated Dependencies:
> >>
> >>>>>> * Spark 2.0 support
> >>
> >>>>>> * Matrix block library (isolated jar)
> >>
> >>>>>>
> >>
> >>>>>> 3) Compiler/Runtime Features:
> >>
> >>>>>> * GPU support (full compiler and runtime support)
> >>
> >>>>>> * Compressed linear algebra v2
> >>
> >>>>>> * Code generation (automatic operator fusion)
> >>
> >>>>>> * Extended parfor (full spark exploitation, micro-batch support)
> >>
> >>>>>> * Scale-up architecture (large dense blocks, numa)?
> >>
> >>>>>>
> >>
> >>>>>> 4) Tools
> >>
> >>>>>> * Extended stats (task locality, shuffle, etc)
> >>
> >>>>>> * Cloud resource advisor (extended resource optimizer)?
> >>
> >>>>>>
> >>
> >>>>>> 5) Algorithms
> >>
> >>>>>> * Graduate "staging" algorithms (robustness/performance)
> >>
> >>>>>> * Perftest: include all algorithms into automated performance tests
> >>
> >>>>>> * Simplify usage decision trees, random forest, mlogreg, msvm
> >>
> >>>>>> (preprocessing, label representation, etc)
> >>
> >>>>>>
> >>
> >>>>>> Items marked with a ? can potentially be moved out to subsequent
> >> releases.
> >>
> >>>>>>
> >>
> >>>>>>
> >>
> >>>>>> Regards,
> >>
> >>>>>> Matthias
> >>
> >>>>>>
> >>
> >>>>>
> >>
> >>>>> My understanding is that most of the items in 1 and 2 are going to
> >> break
> >>
> >>>>> backward compatibility, while the others can be done incrementally.
> >> Is this
> >>
> >>>>> assumption correct? If so, can we finish 1 and 2 and do a 1.0 release,
> >>
> >>>>> and then continue with 3, 4, 5, etc.? I don't think we should wait for
> >>
> >>>>> 2017/Q2 to do a 1.0 release. I believe in release early, release
> >> often,
> >>
> >>>>> particularly to attract new users, who can help verify and contribute
> >>
> >>>>> to specific releases.
> >>
> >>>>>
> >>
> >>>>> Thoughts ?
> >>
> >>>>>
> >>
> >>>>
> >>
> >>>>
> >>
> >>>>
> >>
> >>>>
> >>
> >> --
> > Sent from my Mobile device
>
