Re: Simplification of MLContext and related APIs

2016-09-12 Thread Mike Dusenberry
I also agree that internal data structures shouldn't be exposed to users.
However, I think we definitely need to keep the `Matrix` and `Frame` types
in the API, in agreement with Arvind.  The main purpose of SystemML for a
user is to allow machine learning algorithms involving matrices to be
run on a given system (laptop, Spark cluster, etc.).  Anything that involves
the compilation chain directly is noise for our ML users.  Thus it's quite
useful for SystemML to expose a `Matrix` type with a limited API, as is
currently done in MLContext.  This allows a user to interact with SystemML
via these `Matrix` objects, which abstractly represent the core data
structure of a SystemML script.  Furthermore, these `Matrix` objects can be
used as input to a subsequent script, or can be converted to a
DataFrame once the user is ready to continue working with Spark.  As
Arvind mentioned, this simply allows the DML `Matrix` type to be exposed
at the API level as well.  Additionally, we plan to unify this
`Matrix` type with the lazy matrix types we are creating in the Python and
Scala DSLs, thus making `Matrix` the equivalent of a matrix in
DML.  A similar argument applies to `Frame`.

I think that limiting the exposure of internal structures to users could be
useful, but removing `Matrix` & `Frame` and instead having a user deal
directly with compilation chains would be a step backwards.

- Mike

--

Michael W. Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry
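
A minimal sketch of the facade Mike describes, written as a toy Python stand-in rather than the real MLContext API: `Matrix`, `_MatrixObject`, `run_script`, `scale`, and `row_sums` are all hypothetical names. The public `Matrix` wraps an internal object, exposes only a small API (`shape`, `to_rows`), and lets one script's output be passed directly as another script's input without the caller ever touching the internal structure.

```python
from typing import Callable, Dict, List, Tuple


class _MatrixObject:
    """Hypothetical internal structure (a stand-in for buffer-pool-managed data).

    Not part of the public API: callers never construct or touch it directly.
    """

    def __init__(self, rows: List[List[float]]):
        self.rows = [list(r) for r in rows]
        self.pinned = False  # internal state users should not have to reason about


class Matrix:
    """Public handle with a deliberately small API; wraps the internal object."""

    def __init__(self, rows: List[List[float]]):
        self._obj = _MatrixObject(rows)

    @property
    def shape(self) -> Tuple[int, int]:
        rows = self._obj.rows
        return (len(rows), len(rows[0]) if rows else 0)

    def to_rows(self) -> List[List[float]]:
        # Stand-in for a user-facing conversion such as "to DataFrame"; returns a copy.
        return [list(r) for r in self._obj.rows]


# A toy "script": a function from named input Matrices to named output Matrices.
Script = Callable[[Dict[str, Matrix]], Dict[str, Matrix]]


def run_script(script: Script, inputs: Dict[str, Matrix]) -> Dict[str, Matrix]:
    return script(inputs)


def scale(inputs: Dict[str, Matrix]) -> Dict[str, Matrix]:
    x = inputs["X"].to_rows()
    return {"Y": Matrix([[2.0 * v for v in row] for row in x])}


def row_sums(inputs: Dict[str, Matrix]) -> Dict[str, Matrix]:
    y = inputs["Y"].to_rows()
    return {"S": Matrix([[sum(row)] for row in y])}


first = run_script(scale, {"X": Matrix([[1.0, 2.0], [3.0, 4.0]])})
second = run_script(row_sums, {"Y": first["Y"]})  # one script's output feeds the next
print(second["S"].to_rows())  # [[6.0], [14.0]]
```

The design choice being illustrated is that the user only ever holds `Matrix` handles; the internal object can change without breaking user code.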

On Sun, Sep 11, 2016 at 5:52 PM, Acs S  wrote:

> Yes, I agree that we should NOT expose any internal objects at the API
> level. Objects like FrameObject and MatrixObject should not be exposed, as
> they are internal objects.
> The rule of thumb should be: if an object (Frame, Object or Scalar) is
> exposed at the DML level, it should be exposed at the MLContext level. If
> there is a need to add any extra object beyond those exposed in DML, it
> should be justified with a rationale.
> I introduced FrameObject as an oversight. It should have been a private
> method instead of a public one. I can fix it soon. For the other changes
> you have proposed, I will let Deron respond.
> Thanks for catching these issues.
> -Arvind
>
>   From: Matthias Boehm 
>  To: dev 
>  Sent: Sunday, September 11, 2016 9:43 AM
>  Subject: Simplification of MLContext and related APIs
>
>
>
> It's great to see the ongoing progress on MLContext and related APIs.
> However, one aspect that really concerns me is the creation of many
> redundant data types and the exposure of various internal data structures.
> For example, exposing MatrixObject and FrameObject at the API level is
> dangerous because it makes external programs depend on internal
> structures that might be subject to change (no API stability), and users
> might not be aware of the implications their interactions have on the
> buffer pool, etc. Furthermore, having such a plethora of entry points makes
> it very hard to ensure consistency of the compilation chain with regard to
> configuration handling, environment setup, and advanced compilation
> techniques.
>
> I would recommend creating a holistic design across the various APIs that
> aims to (1) reduce the number of exposed data types (for instance, I would
> like to remove MatrixObject/FrameObject from the external interface, as
> well as remove BinaryBlockMatrix, BinaryBlockFrame, Matrix, Frame, and
> related metadata objects), and (2) create a configurable compilation chain
> that is invoked from all external APIs. I understand that these data types
> were introduced to simplify, for example, imports in user programs, but I'm
> sure we can find an alternative realization with less redundancy. What do
> you think?
>
> Regards,
> Matthias
>
>
>
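
Matthias's second point, one configurable compilation chain invoked from every external API, can be sketched as below. This is a toy under stated assumptions: `CompilerConfig`, `Plan`, `compile_script`, `mlcontext_execute`, and `dsl_execute` are hypothetical names, and the "stages" are placeholder labels, not SystemML's actual parser, rewrites, or optimizer. The point it shows is that two different front ends given the same source and the same configuration produce the same plan, because neither re-implements compilation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class CompilerConfig:
    # Hypothetical settings that every entry point must honor in the same way.
    exec_mode: str = "singlenode"  # illustrative values, e.g. "singlenode" or "spark"
    rewrites_enabled: bool = True
    extra_flags: Tuple[str, ...] = ()


@dataclass
class Plan:
    source: str
    stages: List[str] = field(default_factory=list)


def compile_script(source: str, config: CompilerConfig) -> Plan:
    """The one shared compilation chain; front ends call it instead of re-implementing it."""
    plan = Plan(source)
    plan.stages.append("parse")
    plan.stages.append("validate")
    if config.rewrites_enabled:
        plan.stages.append("rewrites")
    plan.stages.append(f"runtime-plan:{config.exec_mode}")
    plan.stages.extend(f"flag:{flag}" for flag in config.extra_flags)
    return plan


def mlcontext_execute(source: str, config: CompilerConfig) -> Plan:
    # Front end 1: an MLContext-style API that receives script text.
    return compile_script(source, config)


def dsl_execute(tokens: List[str], config: CompilerConfig) -> Plan:
    # Front end 2: a lazy DSL that first generates script text, then uses the same chain.
    return compile_script(" ".join(tokens), config)


cfg = CompilerConfig(exec_mode="spark")
p1 = mlcontext_execute("Y = X %*% X", cfg)
p2 = dsl_execute(["Y", "=", "X", "%*%", "X"], cfg)
print(p1 == p2)  # True: same source and same config yield the same plan
```

Because configuration handling lives in one function, a new setting only has to be honored in one place rather than in each API.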


Re: Simplification of MLContext and related APIs

2016-09-12 Thread Deron Eriksson
Feel free not to expose MatrixObject and FrameObject. I am fine with that.
The only reason MatrixObject and FrameObject are exposed is that I felt that
if the new MLContext API did not expose them, there would be complaints from
existing committers that these objects were not available. I can't see
anyone outside of SystemML core developers caring about MatrixObject and
FrameObject, or, for that matter, ever using these classes. Users
want DataFrames, Datasets, RDDs, 2D arrays, CSV files, or practically
anything but a MatrixObject or FrameObject.

If you remove entities such as Matrix and Frame, you have the older
MLContext API. Perhaps users who don't wish to use objects such as Matrix
and Frame can use the older API, since these suggestions are already built
into the old API?

Deron


On Mon, Sep 12, 2016 at 1:22 PM, Mike Dusenberry 
wrote:



Re: Simplification of MLContext and related APIs

2016-09-12 Thread Matthias Boehm

Great, then we're all on the same page. Let me just clarify two aspects:
First, I think we do need abstract frame/matrix data types at the API level,
but just one type that is used consistently across MLContext and all DSLs
we're about to add. Second, relying on a common compilation chain does not
directly affect users, but it ensures consistent behavior across all APIs.

So the bottom line is: we're going to remove MatrixObject/FrameObject and
other internal structures from the API level, remove the
BinaryBlockMatrix/BinaryBlockFrame types, and try to consolidate the
various Matrix/Frame objects as well as the replicated compilation chains.

Regards,
Matthias
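
One possible reading of folding BinaryBlockMatrix into Matrix is a single public matrix type that keeps the data it was created from and derives a blocked layout internally, only when an operation needs it. The sketch below assumes this reading; `Matrix`, `_get_blocks`, `block_size`, `total`, and the dict-of-tiles layout are hypothetical illustrations, not SystemML's actual binary-block format.

```python
from typing import Dict, List, Optional, Tuple

Block = List[List[float]]


class Matrix:
    """Single public matrix type; the blocked layout is an internal, lazily built detail."""

    def __init__(self, rows: List[List[float]], block_size: int = 2):
        self._rows = [list(r) for r in rows]
        self._block_size = block_size
        self._blocks: Optional[Dict[Tuple[int, int], Block]] = None  # built on first use

    def _get_blocks(self) -> Dict[Tuple[int, int], Block]:
        # Tile the matrix into block_size x block_size pieces keyed by (block_row, block_col).
        if self._blocks is None:
            b = self._block_size
            n = len(self._rows)
            m = len(self._rows[0]) if self._rows else 0
            blocks: Dict[Tuple[int, int], Block] = {}
            for i in range(0, n, b):
                for j in range(0, m, b):
                    blocks[(i // b, j // b)] = [row[j:j + b] for row in self._rows[i:i + b]]
            self._blocks = blocks
        return self._blocks

    def total(self) -> float:
        # An operation that, like a distributed one, works block by block.
        return sum(sum(sum(r) for r in blk) for blk in self._get_blocks().values())

    def to_rows(self) -> List[List[float]]:
        return [list(r) for r in self._rows]


m = Matrix([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(m.total())  # 45.0
```

Users see one `Matrix` class; whether its data currently lives as rows or as blocks is a decision the system makes, not one the caller has to make by picking a class.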



From:   Deron Eriksson 
To: dev@systemml.incubator.apache.org
Date:   09/12/2016 01:56 PM
Subject: Re: Simplification of MLContext and related APIs




Re: Simplification of MLContext and related APIs

2016-09-12 Thread Deron Eriksson
Hi Matthias,

Great! I would be very happy to see BinaryBlockMatrix incorporated into
Matrix and BinaryBlockFrame incorporated into Frame, since this would be a
welcome simplification of the API. Reducing the API to the essential
concepts is a big win for our users. This would have already happened if I
had the depth of knowledge of SystemML required to make it happen in a
reasonable timeframe.

I would definitely approve of further extracting Matrix and Frame into a
common type if this can be done in a way that feels natural for the end
user. At this point I can't explain it much further, but if I expect to
get back a matrix of numbers, I want this to feel natural, and if I get
back a frame consisting of columns of different data types, I want this to
feel natural too. I want our end users to put in data and get out results
in a minimum number of steps that feel intuitive. By the way, I think we
are getting very close, which is a great sign!

Deron
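
Deron's distinction between "a matrix of numbers" and "a frame consisting of columns of different data types" can be made concrete with two small public types. This is a hypothetical sketch (`Matrix`, `Frame`, `schema`, and `column` are illustrative names, not SystemML's API): a `Matrix` coerces every cell to a float, while a `Frame` carries a per-column schema and rejects values that do not match it.

```python
from typing import Any, List, Sequence, Tuple


class Matrix:
    """Homogeneous: every cell is stored as a float."""

    def __init__(self, rows: Sequence[Sequence[Any]]):
        self.rows = [[float(v) for v in r] for r in rows]


class Frame:
    """Heterogeneous: each column has a name and a Python type."""

    def __init__(self, schema: List[Tuple[str, type]], rows: Sequence[Sequence[Any]]):
        self.schema = schema
        for r in rows:
            if len(r) != len(schema):
                raise ValueError("row width does not match schema")
            for (name, typ), v in zip(schema, r):
                if not isinstance(v, typ):
                    raise TypeError(f"column {name!r} expects {typ.__name__}, got {type(v).__name__}")
        self.rows = [list(r) for r in rows]

    def column(self, name: str) -> List[Any]:
        # Look up the column position by name, then collect that position from every row.
        idx = [n for n, _ in self.schema].index(name)
        return [r[idx] for r in self.rows]


m = Matrix([[1, 2], [3, 4]])
f = Frame([("city", str), ("count", int)], [["Berlin", 3], ["Paris", 5]])
print(m.rows)            # [[1.0, 2.0], [3.0, 4.0]]
print(f.column("city"))  # ['Berlin', 'Paris']
```

Whether these two end up behind one common type or stay separate, the sketch shows the user-facing behavior each one would need to preserve to "feel natural".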


On Mon, Sep 12, 2016 at 2:21 PM, Matthias Boehm  wrote:
