Re: Time To Release 0.13

2017-01-16 Thread Arvind Surve
We are planning to run 80GB testing for the 0.13 release (to support Spark 2.0). 
It will add a couple of days to the overall performance testing, but it's worth 
a try before releasing SystemML on Spark 2.0. If we find issues that are not 
showstoppers, we can still release for Spark 2.0, but we should get the 80GB 
testing done.

 --Arvind Surve
 Spark Technology Center
 http://www.spark.tc/

  From: "dusenberr...@gmail.com" 
 To: dev@systemml.incubator.apache.org 
 Sent: Monday, January 16, 2017 6:38 PM
 Subject: Time To Release 0.13
   
Hi all,

Now that Spark 2.x support has been merged [1], I think we should go ahead and 
start a release process for 0.13.  That way, we will have an official release 
that supports 2.x, in addition to 0.12 that supports 1.6.

I'd like to propose that as long as our tests pass, and a performance suite on 
*8GB* is reasonably acceptable, we go ahead and release. We can save more 
detailed performance testing and any possible improvements for 0.13.x releases, 
or more importantly for the upcoming 1.0 release.  I think it would be a good 
idea to have one official release on Spark 2.x that the community can stress 
test before our 1.0 release.

Thoughts?

[1]: https://github.com/apache/incubator-systemml/commit/a4c7be78390d01a3194e726d7a184c182bd8b558

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


   

Re: [DISCUSS] Roadmap SystemML 1.0

2017-01-16 Thread dusenberrymw
Yeah using the target release would be good. Actually, with that in mind, I 
believe that we have been marking closed issues since the 0.11 release as 
targeting an upcoming "1.0" release, but it would probably be more correct to 
update those to "0.12" since we decided to release 0.12. In addition, we should 
set the target of the Spark 2.x support issue to "0.13".

As for the roadmap, it would be good to update the website with a high-level 
overview, with links to associated JIRA issues.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 16, 2017, at 7:35 PM, Luciano Resende  wrote:
> 
> Instead of an Epic, we could use the target release? Also, we have a roadmap
> page on the site and we should keep that up to date, or get rid of that and
> use roadmap on jira.
> 
>> On Mon, Jan 16, 2017 at 6:20 PM  wrote:
>> 
>> Now that we've had some discussion here, it would be good to transfer this
>> discussion into a JIRA epic, containing sub tasks. That way, we can
>> properly track our progress on these items and facilitate contributions
>> from the community.  Note that some of the sub tasks may already exist as
>> individual issues.
>> 
>> Would anyone in the community like to volunteer for creating these issues?
>> 
>> - Mike
>> 
>> --
>> 
>> Mike Dusenberry
>> GitHub: github.com/dusenberrymw
>> LinkedIn: linkedin.com/in/mikedusenberry
>> 
>> Sent from my iPhone.
>> 
>>> On Jan 4, 2017, at 6:00 PM, dusenberr...@gmail.com wrote:
>>> 
>>> Overall, this is a good list of items that should be worked on,
>>> particularly because it contains several user-facing items.  However, to
>>> echo what Luciano said, I'm also concerned about the timeline.  At this
>>> stage, I agree that we need to release more often, and with a more
>>> user-oriented "product" focus as a guide for timelines.  I.e. we should
>>> orient our release timelines around items that focus on the "product" of
>>> allowing the user to work on a wide range of ML problems in a simple and
>>> easy manner on top of Spark.
>>> 
>>> With that in mind, I agree that a focus on a subset of (1) and (2) would
>>> be good for an immediate release, with a particular focus on Spark 2.0
>>> support as a priority.
>>> 
>>> How about we aim for a February 1st release date for the initial items?
>>> 
>>> -Mike
>>> 
>>> --
>>> 
>>> Mike Dusenberry
>>> GitHub: github.com/dusenberrymw
>>> LinkedIn: linkedin.com/in/mikedusenberry
>>> 
>>> Sent from my iPhone.
>>> 
>>>> On Jan 3, 2017, at 4:17 PM, Niketan Pansare  wrote:
>>>> 
>>>> Hi Matthias,
>>>> 
>>>> Thanks for the detailed roadmap.
>>>> 
>>>> +1 for all the items with few modifications.
>>>> 
>>>> 1) APIs and Language:
>>>> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
>>>> >> Ensure Python and Scala MLContext have same API capability.
>>>> 
>>>> * Remove old MLContext
>>>> * Consolidate MLContext and JMLC
>>>> * Full support for Scala/Python DSLs
>>>> >> +1 for Python DSL except for push-down of loop structures and
>>>> >> functions.
>>>> 
>>>> * Remove old file-based transform
>>>> * Scala/Python wrappers for all existing algorithms
>>>> * Data converters (additional formats: e.g., libsvm; performance)
>>>> 
>>>> 2) Updated Dependencies:
>>>> * Spark 2.0 support
>>>> * Matrix block library (isolated jar)
>>>> 
>>>> 3) Compiler/Runtime Features:
>>>> * GPU support (full compiler and runtime support)
>>>> >> Can we break this down into phases:
>>>> >> https://issues.apache.org/jira/browse/SYSTEMML-445 ? We can discuss the
>>>> >> timeline of the phases in the JIRA.
>>>> 
>>>> * Compressed linear algebra v2
>>>> * Code generation (automatic operator fusion)
>>>> * Extended parfor (full spark exploitation, micro-batch support)
>>>> * Scale-up architecture (large dense blocks, numa)?
>>>> 
>>>> 4) Tools
>>>> * Extended stats (task locality, shuffle, etc)
>>>> * Cloud resource advisor (extended resource optimizer)?
>>>> 
>>>> 5) Algorithms
>>>> * Graduate "staging" algorithms (robustness/performance)
>>>> * Perftest: include all algorithms into automated performance tests
>>>> >> via spark-submit + via Scala/Python wrappers
>>>> 
>>>> * Simplify usage decision trees, random forest, mlogreg, msvm
>>>> (preprocessing, label representation, etc)
>>>> >> + command-line variable naming. For example: maxi, maxiter, etc.
>>>> 
>>>> Thanks,
>>>> 
>>>> Niketan Pansare
>>>> IBM Almaden Research Center
>>>> E-mail: npansar At us.ibm.com
>>>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>>>> 
>>>> Matthias Boehm 

SystemML optimizer design

2017-01-16 Thread Dylan Hutchison
Hi there,

I learned about SystemML and its optimizer from the recent SPOOF paper.  The
gist I absorbed is that SystemML translates linear algebra expressions given
by its DML to relational algebra, then applies standard relational algebra
optimizations, and then re-recognizes the result in linear algebra kernels,
with an attempt to fuse them.

I think I found the SystemML rewrite rules here.
A couple questions:

   1. It appears that SystemML rewrites HOP expressions destructively,
   i.e., by throwing away the old expression.  In this case, how does SystemML
   determine the order of rewrites to apply?  Where does cost-based
   optimization come into play?

   2. Is there a way to "debug/visualize" the optimization process?  That
   is, when I start with a DML program, can I view (a) the DML program parsed
   into HOPs; (b) what rules fire and where in the plan, as well as the plan
   after each rule fires; and (c) the lowering and fusing of operators to LOPs?

   I know this is a lot to ask for; I'm curious how far SystemML has gone
   in this direction.

   3. Is there any relationship between the SystemML optimizer and Apache
   Calcite?  If not, I'd love to understand the design decisions that
   differentiate the two.
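
To make question 1 concrete, here is a minimal sketch of what "destructive"
rewriting could look like. It is only an illustration under assumed names:
`Hop`, `rewrite_double_transpose`, `apply_rules`, and `show` are hypothetical
and are not SystemML classes or APIs. Each rule replaces a matched subtree in
place, so the pre-rewrite expression is not retained as an alternative plan,
and the rule list order fixes the order in which rewrites are tried.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Hop:
    """Hypothetical operator node; not SystemML's actual Hop class."""
    op: str
    inputs: List["Hop"] = field(default_factory=list)


def rewrite_double_transpose(node: Hop) -> Hop:
    # t(t(X)) -> X: the matched subtree is dropped, not kept as an alternative.
    if node.op == "t" and node.inputs and node.inputs[0].op == "t":
        return node.inputs[0].inputs[0]
    return node


def apply_rules(node: Hop, rules: List[Callable[[Hop], Hop]]) -> Hop:
    # Rewrite children first, then try each rule on this node in list order.
    node.inputs = [apply_rules(child, rules) for child in node.inputs]
    for rule in rules:
        node = rule(node)
    return node


def show(node: Hop) -> str:
    # Render the plan as a nested expression string for inspection.
    if not node.inputs:
        return node.op
    return f"{node.op}({', '.join(show(i) for i in node.inputs)})"


# Plan for t(t(X)) %*% Y, built by hand for the illustration.
plan = Hop("matmult", [Hop("t", [Hop("t", [Hop("X")])]), Hop("Y")])
optimized = apply_rules(plan, [rewrite_double_transpose])
print(show(optimized))
```

Printing the plan before and after each rule in `apply_rules` would give the
kind of rule-by-rule trace asked about in question 2(b), again only for this
toy structure.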

Thanks, Dylan Hutchison


Re: Broken Website Menu On iOS

2017-01-16 Thread Jeremy Anderson
Will do.

...

Jeremy Anderson

Github: https://github.com/objectadjective
Twitter: https://twitter.com/ObjectAdjective
LinkedIn: http://www.linkedin.com/in/objectadjective

On 16 January 2017 at 18:53,  wrote:

> Awesome!  Thanks, Jeremy (& Dexter)!  I just discovered it, so there's not
> an issue created yet -- can you create one?
>
> Thanks!
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Jan 16, 2017, at 6:40 PM, Jeremy Anderson wrote:
> >
> > Dexter and I will pick this up. Is there an issue for this already?
> >
> > ...
> >
> > Jeremy Anderson
> >
> > Github: https://github.com/objectadjective
> > Twitter: https://twitter.com/ObjectAdjective
> > LinkedIn: http://www.linkedin.com/in/objectadjective
> >
> >> On 16 January 2017 at 18:27,  wrote:
> >>
> >> Hi all,
> >>
> >> It appears that the main website drop-down menus (Community, Apache) are
> >> broken on iOS browsers (iPhone).  By "broken", I mean that it is not
> >> possible to click on the down-arrow to expand those drop-down menus.
> >>
> >> 1. Can someone check if this is also the case on Android browsers?  In
> >> Chrome with mobile rendering?
> >> 2. Would someone like to volunteer to fix this?
> >>
> >> -Mike
> >>
> >> --
> >>
> >> Mike Dusenberry
> >> GitHub: github.com/dusenberrymw
> >> LinkedIn: linkedin.com/in/mikedusenberry
> >>
> >> Sent from my iPhone.
> >>
> >>
>


Re: Broken Website Menu On iOS

2017-01-16 Thread dusenberrymw
Awesome!  Thanks, Jeremy (& Dexter)!  I just discovered it, so there's not an 
issue created yet -- can you create one?

Thanks!

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 16, 2017, at 6:40 PM, Jeremy Anderson wrote:
> 
> Dexter and I will pick this up. Is there an issue for this already?
> 
> ...
> 
> Jeremy Anderson
> 
> Github: https://github.com/objectadjective
> Twitter: https://twitter.com/ObjectAdjective
> LinkedIn: http://www.linkedin.com/in/objectadjective
> 
>> On 16 January 2017 at 18:27,  wrote:
>> 
>> Hi all,
>> 
>> It appears that the main website drop-down menus (Community, Apache) are
>> broken on iOS browsers (iPhone).  By "broken", I mean that it is not
>> possible to click on the down-arrow to expand those drop-down menus.
>> 
>> 1. Can someone check if this is also the case on Android browsers?  In
>> Chrome with mobile rendering?
>> 2. Would someone like to volunteer to fix this?
>> 
>> -Mike
>> 
>> --
>> 
>> Mike Dusenberry
>> GitHub: github.com/dusenberrymw
>> LinkedIn: linkedin.com/in/mikedusenberry
>> 
>> Sent from my iPhone.
>> 
>> 


Broken Website Menu On iOS

2017-01-16 Thread dusenberrymw
Hi all,

It appears that the main website drop-down menus (Community, Apache) are broken 
on iOS browsers (iPhone).  By "broken", I mean that it is not possible to click 
on the down-arrow to expand those drop-down menus.

1. Can someone check if this is also the case on Android browsers?  In Chrome 
with mobile rendering?
2. Would someone like to volunteer to fix this?

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.



Re: [DISCUSS] Roadmap SystemML 1.0

2017-01-16 Thread dusenberrymw
Now that we've had some discussion here, it would be good to transfer this 
discussion into a JIRA epic, containing sub tasks. That way, we can properly 
track our progress on these items and facilitate contributions from the 
community.  Note that some of the sub tasks may already exist as individual 
issues.

Would anyone in the community like to volunteer for creating these issues?

- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 4, 2017, at 6:00 PM, dusenberr...@gmail.com wrote:
> 
> Overall, this is a good list of items that should be worked on, particularly 
> because it contains several user-facing items.  However, to echo what Luciano 
> said, I'm also concerned about the timeline.  At this stage, I agree that we 
> need to release more often, and with a more user-oriented "product" focus as 
> a guide for timelines.  I.e. we should orient our release timelines around 
> items that focus on the "product" of allowing the user to work on a wide 
> range of ML problems in a simple and easy manner on top of Spark.
> 
> With that in mind, I agree that a focus on a subset of (1) and (2) would be 
> good for an immediate release, with a particular focus on Spark 2.0 support 
> as a priority.
> 
> How about we aim for a February 1st release date for the initial items?
> 
> -Mike
> 
> --
> 
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
> 
> Sent from my iPhone.
> 
> 
>> On Jan 3, 2017, at 4:17 PM, Niketan Pansare  wrote:
>> 
>> Hi Matthias,
>> 
>> Thanks for the detailed roadmap. 
>> 
>> +1 for all the items with few modifications.
>> 
>> 1) APIs and Language:
>> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
>> >> Ensure Python and Scala MLContext have same API capability.
>> 
>> * Remove old MLContext
>> * Consolidate MLContext and JMLC
>> * Full support for Scala/Python DSLs
>> >> +1 for Python DSL except for push-down of loop structures and functions. 
>> 
>> * Remove old file-based transform
>> * Scala/Python wrappers for all existing algorithms
>> * Data converters (additional formats: e.g., libsvm; performance)
>> 
>> 2) Updated Dependencies:
>> * Spark 2.0 support
>> * Matrix block library (isolated jar)
>> 
>> 3) Compiler/Runtime Features:
>> * GPU support (full compiler and runtime support)
>> >> Can we break this down into phases: 
>> >> https://issues.apache.org/jira/browse/SYSTEMML-445 ? We can discuss the 
>> >> timeline of the phases in the JIRA.
>> 
>> * Compressed linear algebra v2
>> * Code generation (automatic operator fusion)
>> * Extended parfor (full spark exploitation, micro-batch support)
>> * Scale-up architecture (large dense blocks, numa)?
>> 
>> 4) Tools
>> * Extended stats (task locality, shuffle, etc)
>> * Cloud resource advisor (extended resource optimizer)?
>> 
>> 5) Algorithms
>> * Graduate "staging" algorithms (robustness/performance)
>> * Perftest: include all algorithms into automated performance tests
>> >> via spark-submit + via Scala/Python wrappers
>> 
>> * Simplify usage decision trees, random forest, mlogreg, msvm 
>> (preprocessing, label representation, etc)
>> >> + command-line variable naming. For example: maxi, maxiter, etc.
>> 
>> Thanks,
>> 
>> Niketan Pansare
>> IBM Almaden Research Center
>> E-mail: npansar At us.ibm.com
>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>> 
>> 
>> From: Matthias Boehm 
>> To: dev@systemml.incubator.apache.org
>> Date: 01/03/2017 02:44 PM
>> Subject: Re: [DISCUSS] Roadmap SystemML 1.0
>> 
>> 
>> 
>> 
>> Yes indeed, most of (3) and (4) can be done incrementally. For (5), some 
>> of the changes might also modify the signature of algorithms (i.e., 
>> parameters and required input data) but it would help, for example with 
>> decision trees, as users no longer need to dummy code their inputs.
>> 
>> Generally, I'm fine with making (3), (4), and part of (5) optional and 
>> let the "must-have" features from (1) and (2) determine the timeline.
>> 
>> Regards,
>> Matthias
>> 
>> On 1/3/2017 11:27 PM, Luciano Resende wrote:
>> > On Tue, Jan 3, 2017 at 11:50 AM, Matthias Boehm wrote:
>> >
>> >> I'd like to initiate the discussion of a concrete roadmap for our next
>> >> release. According to previous discussions, I'd think it's fair to say
>> >> that we agree on calling it SystemML 1.0. We should carefully plan this
>> >> release as it's an opportunity to change APIs and remove some older
>> >> deprecated features. I'd like to encourage not just developers but also
>> >> the broader community to participate in this discussion.
>> >>
>> >> Personally, I think a target date of Q2/2017 is realistic. Let's start
>> >> with