Great, then we're all on the same page. Let me just clarify two aspects:
First, I think we do need abstract frame/matrix data types at API level,
but just one type that is used consistently across MLContext and all DSLs
we're about to add. Second, relying on a common compilation chain does not
directly affect users but ensures consistent behavior across all APIs.

So the bottom line is, we're going to remove MatrixObject/FrameObject and
other internal structures from API level, remove the
BinaryBlockMatrix/BinaryBlockFrame types, and try to consolidate the
various Matrix/Frame objects as well as replicated compilation chains.
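
To make the consolidation concrete, here is a rough sketch of the shape I have
in mind: one public Matrix type that never hands out its internal block, and
one compilation chain that every front end delegates to. It is written in
Python only for brevity. Every name in it (Matrix, CompilerConfig,
compile_and_run, ScriptApi, dsl_transpose) is hypothetical and none of it is
existing SystemML code.

```python
from dataclasses import dataclass
from typing import List, Sequence


class _InternalBlock:
    """Hypothetical stand-in for an internal structure such as MatrixObject."""

    def __init__(self, rows: Sequence[Sequence[float]]):
        self.rows = [list(r) for r in rows]


class Matrix:
    """The single public matrix type; the internal block is never handed out."""

    def __init__(self, rows: Sequence[Sequence[float]]):
        self._block = _InternalBlock(rows)

    def to_rows(self) -> List[List[float]]:
        # Return a copy so callers cannot mutate internal state.
        return [list(r) for r in self._block.rows]


@dataclass(frozen=True)
class CompilerConfig:
    """Hypothetical configuration shared by all entry points."""

    explain: bool = False


COMPILE_LOG: List[str] = []  # records every operation that went through the chain


def compile_and_run(op: str, x: Matrix,
                    config: CompilerConfig = CompilerConfig()) -> Matrix:
    """The one compilation chain; every front end delegates here."""
    COMPILE_LOG.append(op)
    if op == "transpose":
        return Matrix([list(col) for col in zip(*x.to_rows())])
    raise ValueError(f"unsupported operation: {op}")


class ScriptApi:
    """MLContext-style front end (hypothetical)."""

    def __init__(self, config: CompilerConfig = CompilerConfig()):
        self.config = config

    def execute(self, op: str, x: Matrix) -> Matrix:
        return compile_and_run(op, x, self.config)


def dsl_transpose(x: Matrix) -> Matrix:
    """DSL-style front end (hypothetical); same chain, same Matrix type."""
    return compile_and_run("transpose", x)
```

Both front ends accept and return the same Matrix type, and both pass through
compile_and_run, so configuration handling lives in exactly one place.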

Regards,
Matthias



From:   Deron Eriksson <deroneriks...@gmail.com>
To:     dev@systemml.incubator.apache.org
Date:   09/12/2016 01:56 PM
Subject:        Re: Simplification of MLContext and related APIs



Feel free to not expose MatrixObject and FrameObject. I am fine with that.
The only reason MatrixObject and FrameObject are exposed is that I felt if
the new MLContext API did not expose them, there would be complaints from
existing committers that these objects were not available. I can't see
anyone outside of SystemML core developers caring about MatrixObject and
FrameObject, or for that matter ever using these classes. Users
want DataFrames, DataSets, RDDs, 2D arrays, CSV files, or practically
anything but a MatrixObject or FrameObject.

If you remove entities such as Matrix and Frame, you have the older
MLContext API. Perhaps users who don't wish to use objects such as Matrix
and Frame can use the older API since these suggestions are already built
into the old API?

Deron


On Mon, Sep 12, 2016 at 1:22 PM, Mike Dusenberry <dusenberr...@gmail.com>
wrote:

> I also agree that internal data structures shouldn't be exposed to a user.
> However, I think we definitely need to keep the `Matrix` and `Frame` types
> in the API, in agreement with Arvind.  The main purpose of SystemML for a
> user is to allow for machine learning algorithms involving matrices to be
> run on a given system (laptop, Spark cluster, etc.).  Anything involving a
> compilation chain directly is noise for our ML users.  Thus it's quite
> useful for SystemML to expose a `Matrix` type with a limited API as is
> currently done in MLContext.  This allows a user to interact with SystemML
> via these `Matrix` objects which abstractly represent the core data
> structure of a SystemML script.  Furthermore, these `Matrix` objects can be
> used as subsequent input to an additional script, or can be converted to a
> DataFrame once the user is ready to continue interacting with Spark.  As
> Arvind mentioned, this just allows the DML `Matrix` type to be effectively
> exposed at the API level as well.  Additionally, we plan to unify this
> `Matrix` type with the lazy matrix types we are creating in the Python and
> Scala DSLs, thus allowing `Matrix` to be the equivalent of matrices in
> DML.  A similar argument applies to `Frame` as well.
>
> I think that limiting the exposure of internal structures to users could be
> useful, but removing `Matrix` & `Frame` and instead having a user deal
> directly with compilation chains would be a step backwards.
>
> - Mike
>
> --
>
> Michael W. Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> On Sun, Sep 11, 2016 at 5:52 PM, Acs S <ac...@yahoo.com.invalid> wrote:
>
> > Yes, I agree that we should NOT expose any internal objects at API
> > level. Objects like FrameObject and MatrixObject should not be exposed, as
> > those are internal objects.
> > Rule of thumb should be: if an object (Frame, Matrix, or Scalar) is
> > exposed at DML level, it should be exposed at MLContext level. If there is
> > a need to add any extra object beyond those exposed in DML, it should be
> > justified with a rationale.
> > I have introduced FrameObject as an oversight. It should have been a
> > private method instead of a public method. I can fix it soon. But there
> > are more changes you have proposed, so I will let Deron respond.
> > Thanks for catching these issues.
> > -Arvind
> >
> >       From: Matthias Boehm <mbo...@us.ibm.com>
> >  To: dev <dev@systemml.incubator.apache.org>
> >  Sent: Sunday, September 11, 2016 9:43 AM
> >  Subject: Simplification of MLContext and related APIs
> >
> >
> >
> > It's great to see the ongoing progress on MLContext and related APIs.
> > However, one aspect that really concerns me is the creation of many
> > redundant data types and the exposure of various internal data
> > structures. For example, exposing MatrixObject and FrameObject at API
> > level is dangerous because it makes external programs data-dependent on
> > internal structures that might be subject to change (no API stability),
> > and users might not be aware of the implications their interactions have
> > on the buffer pool etc. Furthermore, having such a plethora of entry
> > points makes it very hard to ensure consistency of the compilation chain
> > with regard to configuration handling, environment setup, and advanced
> > compilation techniques.
> >
> > I would recommend creating a holistic design across the various APIs
> > that aims to (1) reduce the number of exposed data types (for instance, I
> > would like to remove MatrixObject/FrameObject from the external
> > interface, as well as remove BinaryBlockMatrix, BinaryBlockFrame, Matrix,
> > Frame, and related metadata objects), and (2) create a configurable
> > compilation chain that is invoked from all external APIs. I understand
> > that these data types were introduced to simplify, for example, imports
> > in user programs, but I'm sure we can find an alternative realization
> > with less redundancy. What do you think?
> >
> > Regards,
> > Matthias
> >
> >
> >
>
