Hi Matthias,

Great! I would be very happy to see BinaryBlockMatrix incorporated into Matrix and BinaryBlockFrame incorporated into Frame, since this would be a welcome simplification of the API. Reducing the API to the essential concepts is a big win for our users. This would have already happened if I had the depth of knowledge of SystemML required to make this happen in a reasonable timeframe.

I would definitely approve of further extracting Matrix and Frame to a common type if this can be done in a way that feels natural for the end user. At this point I can't really explain it further, but if I expect to get back a matrix of numbers, I want this to feel natural, and if I get back a frame consisting of columns of different data types, I want this to feel natural too. I want our end users to put in data and get out results in a minimum number of steps that feel intuitive.

By the way, I think we are getting very close, which is a great sign!

Deron

On Mon, Sep 12, 2016 at 2:21 PM, Matthias Boehm <[email protected]> wrote:

> great - then we're all on the same page. Let me just clarify two aspects:
> First, I think we do need abstract frame/matrix data types at API level,
> but just one type that is used consistently across MLContext and all DSLs
> we're about to add. Second, relying on a common compilation chain does not
> directly affect users but ensures consistent behavior across all APIs.
>
> So the bottom line is, we're going to remove MatrixObject/FrameObject and
> other internal structures from API level, remove the
> BinaryBlockMatrix/BinaryBlockFrame types, and try to consolidate the
> various Matrix/Frame objects as well as replicated compilation chains.
>
> Regards,
> Matthias
>
> From: Deron Eriksson <[email protected]>
> To: [email protected]
> Date: 09/12/2016 01:56 PM
> Subject: Re: Simplification of MLContext and related APIs
> ------------------------------
>
> Feel free to not expose MatrixObject and FrameObject. I am fine with that.
> The only reason MatrixObject and FrameObject are exposed is that I felt if
> the new MLContext API did not expose them, there would be complaints from
> existing committers that these objects were not available. I can't see
> anyone outside of SystemML core developers caring about MatrixObject and
> FrameObject or even for that matter ever even using these classes. Users
> want DataFrames, DataSets, RDDs, 2D arrays, CSV files, or practically
> anything but a MatrixObject or FrameObject.
>
> If you remove entities such as Matrix and Frame, you have the older
> MLContext API. Perhaps users who don't wish to use objects such as Matrix
> and Frame can use the older API since these suggestions are already built
> into the old API?
>
> Deron
>
>
> On Mon, Sep 12, 2016 at 1:22 PM, Mike Dusenberry <[email protected]>
> wrote:
>
> > I also agree that internal data structures shouldn't be exposed to a
> > user. However, I think we definitely need to keep the `Matrix` and
> > `Frame` types in the API, in agreement with Arvind. The main purpose of
> > SystemML for a user is to allow for machine learning algorithms
> > involving matrices to be run on a given system (laptop, Spark cluster,
> > etc.). Anything involving a compilation chain directly is noise for our
> > ML users. Thus it's quite useful for SystemML to expose a `Matrix` type
> > with a limited API as is currently done in MLContext. This allows a
> > user to interact with SystemML via these `Matrix` objects which
> > abstractly represent the core data structure of a SystemML script.
> > Furthermore, these Matrix objects can be used as subsequent input to an
> > additional script, or can be converted to a DataFrame once the user is
> > ready to continue interacting with Spark. As Arvind mentioned, this
> > just allows the DML `Matrix` type to be effectively exposed at the API
> > level as well.
> > Additionally, we plan to unify this `Matrix` type with the lazy matrix
> > types we are creating in the Python and Scala DSLs, thus allowing
> > `Matrix` to be the equivalent of matrices in DML. A similar argument
> > applies to `Frame` as well.
> >
> > I think that limiting the exposure of internal structures to users
> > could be useful, but removing `Matrix` & `Frame` and instead having a
> > user deal directly with compilation chains would be a step backwards.
> >
> > - Mike
> >
> > --
> >
> > Michael W. Dusenberry
> > GitHub: github.com/dusenberrymw
> > LinkedIn: linkedin.com/in/mikedusenberry
> >
> > On Sun, Sep 11, 2016 at 5:52 PM, Acs S <[email protected]> wrote:
> >
> > > Yes, I agree that we should NOT expose any internal objects at API
> > > level. Objects like FrameObject, MatrixObject should not be exposed
> > > as those are internal objects.
> > > Rule of thumb should be: if an object (Frame, Object or Scalar) is
> > > exposed at DML level, it should be exposed at MLContext level. If
> > > there is a need to add any extra object besides those exposed in DML,
> > > it should be justifiable with a rationale.
> > > I introduced FrameObject as an oversight. It should have been a
> > > private method instead of a public method. I can fix it soon. But
> > > since there are more changes you have proposed, I will let Deron
> > > respond.
> > > Thanks for catching these issues.
> > > -Arvind
> > >
> > > From: Matthias Boehm <[email protected]>
> > > To: dev <[email protected]>
> > > Sent: Sunday, September 11, 2016 9:43 AM
> > > Subject: Simplification of MLContext and related APIs
> > >
> > > It's great to see the ongoing progress on MLContext and related APIs.
> > > However, one aspect that really concerns me is the creation of many
> > > redundant data types and exposure of various internal data
> > > structures.
> > > For example, exposing MatrixObject and FrameObject at API level is
> > > dangerous because it makes external programs data-dependent on
> > > internal structures that might be subject to change (no API
> > > stability) and users might not be aware of the implications their
> > > interactions have on the buffer pool etc. Furthermore, having such a
> > > plethora of entry points makes it very hard to ensure consistency of
> > > the compilation chain with regard to configuration handling,
> > > environment setup and advanced compilation techniques.
> > >
> > > I would recommend creating a holistic design across the various APIs
> > > that aims to (1) reduce the number of exposed data types (for
> > > instance, I would like to remove MatrixObject/FrameObject from the
> > > external interface, as well as remove BinaryBlockMatrix,
> > > BinaryBlockFrame, Matrix, Frame, and related meta data objects), and
> > > (2) create a configurable compilation chain that is invoked from all
> > > external APIs. I understand that these data types were introduced to
> > > simplify, for example, imports in user programs but I'm sure we can
> > > find an alternative realization with less redundancy. What do you
> > > think?
> > >
> > > Regards,
> > > Matthias
