Re: Proof of Concept: Embedded Scala DSL

2016-09-28 Thread Niketan Pansare
Yes. As an example, one possible integration point is
org.apache.sysml.api.mlcontext.Matrix, to which we would add the following
methods:

def +(that: Matrix): Matrix = ???  // lazy logic (as done in the current Python DSL)
def add(that: Matrix): Matrix = this + that


Then, as with MLContext, the Python matrix class maps one-to-one to this
class, and
https://github.com/apache/incubator-systemml/blob/master/src/main/python/systemml/defmatrix.py#L536
will simply call the above method:
def __add__(self, other):
    return matrix(self._jmatrix.add(other._jmatrix))

This way the semantics of 'matrix1 + matrix2' will be the same in both the
Python and Scala REPLs (and in R when we get to it).
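
A minimal Python sketch of the delegation described above. BackendMatrix is a hypothetical stand-in for the shared JVM-side Matrix class (in the real setup it would be reached from Python as `_jmatrix`), and the `statements`/`to_dml` members and the generated variable names are assumptions made only for illustration, not the actual SystemML API.

```python
import itertools

_ids = itertools.count(1)  # source of fresh DML variable names (assumption)


def _merge(a, b):
    # Concatenate two statement tuples, skipping statements already in a.
    return a + tuple(s for s in b if s not in a)


class BackendMatrix:
    """Hypothetical stand-in for the shared JVM-side Matrix class."""

    def __init__(self, name, statements=()):
        self.name = name                     # DML variable holding this value
        self.statements = tuple(statements)  # DML needed to compute it

    def add(self, that):
        # Lazy: only record the DML statement, nothing is evaluated here.
        out = "t%d" % next(_ids)
        stmts = _merge(self.statements, that.statements)
        stmts += ("%s = %s + %s" % (out, self.name, that.name),)
        return BackendMatrix(out, stmts)

    def to_dml(self):
        return "\n".join(self.statements)


class matrix:
    """Thin Python layer: the operator just forwards to the backend object."""

    def __init__(self, backend):
        self._jmatrix = backend

    def __add__(self, other):
        return matrix(self._jmatrix.add(other._jmatrix))


a = matrix(BackendMatrix("A"))
b = matrix(BackendMatrix("B"))
c = a + b + a
dml = c._jmatrix.to_dml()  # two recorded statements, nothing executed yet
```

Because the Python class holds no logic of its own, any other front end that forwards `+` to the same backend method would produce the same recorded statements.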

Again, I agree with Felix that it is a good idea to hold off on the DSL
integration until we are done with the parallelize construct.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:   Nakul Jindal 
To: dev@systemml.incubator.apache.org
Date:   09/28/2016 01:41 PM
Subject:    Re: Proof of Concept: Embedded Scala DSL



As I understand it, the way it is now is the following:

{ PyDML, DML }——> ANTLR AST (org.apache.sysml.parser.dml,
org.apache.sysml.parser.pydml) ——> Legacy AST (DMLProgram, Expression,
ForStatement…) ——> HOPS ——> LOPS ——> Runtime

Niketan’s embedded Python DSL ——> PyDML
Felix’s embedded Scala DSL——> DML

@Niketan, when you say “IR should be at abstraction to allow Python/R DSL
to be a thin layer”, do you mean something different than is already
implemented?





Re: Proof of Concept: Embedded Scala DSL

2016-09-28 Thread Nakul Jindal
As I understand it, the way it is now is the following:

{ PyDML, DML }——> ANTLR AST (org.apache.sysml.parser.dml, 
org.apache.sysml.parser.pydml) ——> Legacy AST (DMLProgram, Expression, 
ForStatement…) ——> HOPS ——> LOPS ——> Runtime

Niketan’s embedded Python DSL ——> PyDML
Felix’s embedded Scala DSL——> DML

@Niketan, when you say “IR should be at abstraction to allow Python/R DSL to be 
a thin layer”, do you mean something different than is already implemented?




> On Sep 28, 2016, at 12:37 PM, Niketan Pansare  wrote:
> 
> Hi Fred,
> 
> I would consider DMLProgram as an internal AST, which could be created by IR 
> (or IR could just create DML). According to me, IR should be at abstraction 
> to allow Python/R DSL to be a thin layer. This would maximize code reuse and 
> minimize bugs between DSLs. Something that Felix suggested (i.e. Matrix 
> class) would work best.
> 

Re: Proof of Concept: Embedded Scala DSL

2016-09-28 Thread Niketan Pansare

Hi Fred,

I would consider DMLProgram an internal AST, which could be created by the
IR (or the IR could just create DML). In my view, the IR should be at an
abstraction level that allows the Python/R DSL to be a thin layer. This
would maximize code reuse and minimize bugs between DSLs. Something like
what Felix suggested (i.e. a Matrix class) would work best.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:   Frederick R Reiss/Almaden/IBM@IBMUS
To: dev@systemml.incubator.apache.org
Date:   09/28/2016 12:02 PM
Subject:Re: Proof of Concept: Embedded Scala DSL



Maybe I'm missing a subtle point here, but why not refactor the existing
class org.apache.sysml.parser.DMLProgram into our common internal
representation across DSLs? This class is already sufficiently expressive
to represent any DML or PyDML program.

Fred


Re: Proof of Concept: Embedded Scala DSL

2016-09-28 Thread Frederick R Reiss

Maybe I'm missing a subtle point here, but why not refactor the existing
class org.apache.sysml.parser.DMLProgram into our common internal
representation across DSLs? This class is already sufficiently expressive
to represent any DML or PyDML program.

Fred



From:   Niketan Pansare/Almaden/IBM@IBMUS
To: dev@systemml.incubator.apache.org
Date:   09/28/2016 11:20 AM
Subject:        Re: Proof of Concept: Embedded Scala DSL



Thanks Felix for the response.

+1
>> For the future design I will probably make the Matrix and Vector classes
>> abstract which allows for different concrete implementations. We could
>> then have one that is backed directly by SystemML and works similar to
>> the Python DSL in that it just uses mock operators and builds the DML
>> string that is then executed using SystemML. That way the deep embedding
>> would reuse the shallow embedding and we could offer the user to either
>> use the lazy MatrixType on the Repl or write code inside the macro.

Also, I agree that we can postpone the IR and integration of different DSLs
until the work on parallelize is completed.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar


Re: Proof of Concept: Embedded Scala DSL

2016-09-28 Thread Niketan Pansare

Thanks Felix for the response.

+1
>> For the future design I will probably make the Matrix and Vector classes
>> abstract which allows for different concrete implementations. We could
>> then have one that is backed directly by SystemML and works similar to
>> the Python DSL in that it just uses mock operators and builds the DML
>> string that is then executed using SystemML. That way the deep embedding
>> would reuse the shallow embedding and we could offer the user to either
>> use the lazy MatrixType on the Repl or write code inside the macro.

Also, I agree that we can postpone the IR and integration of different DSLs
until the work on parallelize is completed.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar



From:   fschue...@posteo.de
To: dev@systemml.incubator.apache.org
Date:   09/28/2016 10:54 AM
Subject:        Re: Proof of Concept: Embedded Scala DSL



Hi Niketan,

thanks for your suggestions! I thought about it a bit and here are my
ideas on it:

The IR you are describing is basically already my user facing API. I am
not sure how much sense it makes to have an IR that looks exactly like
the API but with control structures renamed. A common IR for all DSLs
definitely makes sense in general but I am not sure if it should be part
of one particular DSL. For maintainability it might be better to have
that IR somewhere on the SystemML side.

Apart from that and to what Matthias suggested, I thought about how to
make the DSL more suitable for using on the Repl and I think we can find
a good compromise. Currently my API is backed by breeze for rapid
prototyping where breeze just forces evaluation of every statement. For
the future design I will probably make the Matrix and Vector classes
abstract which allows for different concrete implementations. We could
then have one that is backed directly by SystemML and works similar to
the Python DSL in that it just uses mock operators and builds the DML
string that is then executed using SystemML. That way the deep embedding
would reuse the shallow embedding and we could offer the user to either
use the lazy MatrixType on the Repl or write code inside the macro.

I haven't started playing around with this idea but let me know what you
think of it. The lazy, shallow DSL would basically do what you would
want from a separate IR, but I don't know if you want to call that from
the Python DSL.

Felix


Re: Proof of Concept: Embedded Scala DSL

2016-09-28 Thread fschueler

Hi Niketan,

thanks for your suggestions! I thought about it a bit and here are my 
ideas on it:


The IR you are describing is basically already my user facing API. I am 
not sure how much sense it makes to have an IR that looks exactly like 
the API but with control structures renamed. A common IR for all DSLs 
definitely makes sense in general but I am not sure if it should be part 
of one particular DSL. For maintainability it might be better to have 
that IR somewhere on the SystemML side.


Apart from that, and in response to what Matthias suggested, I thought about
how to make the DSL more suitable for use in the REPL, and I think we can
find a good compromise. Currently my API is backed by breeze for rapid
prototyping, where breeze just forces evaluation of every statement. For
the future design I will probably make the Matrix and Vector classes 
abstract which allows for different concrete implementations. We could 
then have one that is backed directly by SystemML and works similar to 
the Python DSL in that it just uses mock operators and builds the DML 
string that is then executed using SystemML. That way the deep embedding 
would reuse the shallow embedding and we could offer the user to either 
use the lazy MatrixType on the Repl or write code inside the macro.
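
The design in the paragraph above could look roughly like the following Python sketch: one abstract Matrix type with an eager implementation (standing in for the breeze-backed prototype) and a lazy one whose mock operators only build a DML expression string. All class names, the use of `@` for matrix multiplication, and the `expr` attribute are assumptions for illustration, not the actual DSL.

```python
from abc import ABC, abstractmethod


class Matrix(ABC):
    """Abstract user-facing type; concrete subclasses choose the backend."""

    @abstractmethod
    def __add__(self, other): ...

    @abstractmethod
    def __matmul__(self, other): ...


class EagerMatrix(Matrix):
    """Evaluates every operation immediately on plain Python lists."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __add__(self, other):
        return EagerMatrix([[x + y for x, y in zip(r, s)]
                            for r, s in zip(self.rows, other.rows)])

    def __matmul__(self, other):
        cols = list(zip(*other.rows))
        return EagerMatrix([[sum(x * y for x, y in zip(r, c)) for c in cols]
                            for r in self.rows])


class LazyMatrix(Matrix):
    """Mock operators: nothing is computed, only a DML expression is built."""

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        return LazyMatrix("(%s + %s)" % (self.expr, other.expr))

    def __matmul__(self, other):
        # %*% is DML's matrix multiplication operator.
        return LazyMatrix("(%s %%*%% %s)" % (self.expr, other.expr))


def model(a, b):
    # Written once against the abstract type; runs on either backend.
    return (a @ b) + a


eager = model(EagerMatrix([[1, 0], [0, 1]]), EagerMatrix([[2, 3], [4, 5]]))
lazy = model(LazyMatrix("A"), LazyMatrix("B"))
```

The same `model` function yields concrete numbers with the eager backend and a DML expression with the lazy one, which is the kind of reuse between the two embeddings that the paragraph describes.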


I haven't started playing around with this idea but let me know what you 
think of it. The lazy, shallow DSL would basically do what you would 
want from a separate IR, but I don't know if you want to call that from 
the Python DSL.


Felix

On 24.09.2016 19:39, Niketan Pansare wrote:

Hi Felix,

Thanks for the summary. The document is extremely useful. I
particularly like the idea of parallelizing the code with the 'breeze'
library. I would like to pitch in a few ideas which would enable your
code to be reused by other DSLs:
1. The Scala DSL/parallelize macro remains the same as described in your
documentation, but instead of generating DML directly, we call an
intermediate representation (IR). This IR then generates the DML
(instead of parallelize generating it directly). This IR will then be
reused by the Python DSL and the R DSL.
2. As an example, the IR could be a lazy Matrix class (which would be part
of SystemML). It could have awkward syntax/mechanisms for pushing down
control structures, for example beginWhile and endWhile. Since the IR will
not be exposed to the end user, that should be fine.

Example:
https://github.com/apache/incubator-systemml/blob/master/src/main/python/systemml/defmatrix.py#L537
[1] will call IR's add() method. At the end of parallelize or when the
user wants result (i.e. eval() ), IR could generate DML code and
execute it.
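
To make the beginWhile/endWhile idea concrete, here is a small Python sketch of such an IR: it only records statements and emits a DML script when asked. The class name DmlBuilder and its methods are hypothetical; only the notion of explicit begin/end calls for control structures comes from the proposal above.

```python
class DmlBuilder:
    """Hypothetical IR: collects statements and emits a DML script."""

    def __init__(self):
        self._lines = []
        self._depth = 0  # current nesting level of open while blocks

    def _emit(self, text):
        self._lines.append("  " * self._depth + text)

    def assign(self, target, expr):
        self._emit("%s = %s" % (target, expr))

    def begin_while(self, condition):
        self._emit("while (%s) {" % condition)
        self._depth += 1

    def end_while(self):
        if self._depth == 0:
            raise ValueError("end_while without matching begin_while")
        self._depth -= 1
        self._emit("}")

    def to_dml(self):
        # Called at the end of parallelize or on eval(); nothing runs before.
        if self._depth != 0:
            raise ValueError("unclosed while block")
        return "\n".join(self._lines)


ir = DmlBuilder()
ir.assign("i", "0")
ir.begin_while("i < 10")
ir.assign("X", "X + Y")
ir.assign("i", "i + 1")
ir.end_while()
script = ir.to_dml()
```

The begin/end calls are clumsy to write by hand, but a front-end DSL would generate them, so the end user never sees this layer.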

Again, this is just a proposal and I am fine with dropping the idea of
integrating different DSLs if it makes the implementation of the Scala DSL
complicated. Also, please feel free to correct me if I am missing
anything.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
[2]

Matthias Boehm---09/24/2016 01:11:36 AM---thanks for sharing the
summary - this is very nice. While looking over the example, I had the
follow

From: Matthias Boehm/Almaden/IBM@IBMUS
To: dev@systemml.incubator.apache.org
Date: 09/24/2016 01:11 AM
Subject: Re: Proof of Concept: Embedded Scala DSL

-

thanks for sharing the summary - this is very nice. While looking over
the example, I had the following questions:

1) Output handling: It would be great to see an example how the
results of Algorithm.execute() are consumed. Do you intend to hand out
our binary matrix representation or MLContext's Matrix from which the
user then requests specific output formats? Also if there are multiple
Algorithm instances, how is the MLContext (with its internal state of
lazily evaluated intermediates) reused?

2) Scala-breeze prototyping: How do you intend to support operations
that are not supported in breeze? Examples are removeEmpty, table,
aggregate, rowIndexMax, quantile/centralmoment, cummin/cummax, and DNN
operations?

3) Frame data type and operations: Do you also intend to add a frame
type and its operations? I think for this initial prototype it is not
necessarily required but please make the scope explicit.

Regards,
Matthias


Re: Proof of Concept: Embedded Scala DSL

2016-09-26 Thread fschueler

Hi Matthias,

thanks for taking a look at the document!
Let me try to answer your questions with some ideas - part of this POC 
and my current work is to find out what the best answers are!


1) I see basically two use cases for this DSL:
- users write functions/algorithms much like prepared statements in 
SQL (defining functions `def fun(a: T, b: U) = parallelize { ... }` and 
executing them later)
- users interactively submit snippets to SystemML (using `val A = 
parallelize { C %*% D } execute()` and directly executing)
In general, we should probably offer a write() primitive like in DML 
that persists the data on the filesystem. In the second case it's not 
quite clear to me what would be the best option right now. Intuitively I 
would want the result to be of the same type that my initial DSL 
expression was. If I multiply two matrices for example, I would want a 
Matrix (DSL Type) as a result. Ideally, I would not have to care about 
what underlying representation the actual matrix has and could just use 
the result in my next statement/function until I would want to pass the 
result somewhere else (persist it, transform it into a spark dataframe 
etc.). Given that right now the Algorithm.execute() would take the 
generated DML string and execute it using the MLContext, we would be 
free to return anything that the context can return - or wrap it in the 
DSL Matrix type. I am happy to discuss what would be best here!


For reusing the MLContext, I suggest using a global context that is held 
via a lazy variable in the api package object that is imported when 
using the DSL. The run method would get an implicit argument of type 
MLContext and the user would not have to take care of passing it. The 
laziness will help with reusing it.
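
A rough Python analogue of the lazily created shared context, as a sketch only: FakeContext stands in for MLContext, `default_context` plays the role of the lazy variable, and the optional `context` parameter loosely mirrors the implicit argument. All of these names are assumptions for illustration.

```python
import functools


class FakeContext:
    """Stand-in for MLContext; counts how often it is constructed."""

    created = 0

    def __init__(self):
        FakeContext.created += 1

    def execute(self, dml):
        return "executed: " + dml


@functools.lru_cache(maxsize=None)
def default_context():
    # Built on first use only, then cached for every later call.
    return FakeContext()


def run(dml, context=None):
    # Callers may pass their own context; otherwise the shared one is used.
    ctx = context if context is not None else default_context()
    return ctx.execute(dml)
```

No context exists until the first `run` call, and every later call without an explicit context reuses that same instance.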


2) I think it should be possible to formulate semantically equivalent 
operations using breeze - the question is if the maintenance and 
implementation of two operational APIs makes sense and is feasible. The 
breeze rapid prototyping would be very nice IMO but probably shouldn't 
become a major source of work. As for the DNN operations - we could 
probably find a way of wrapping those, too - but I don't really think it 
makes sense and we might think about how we want to offer DML libraries 
in our DSLs in general. Apart from that, it seems like it is possible to 
call java functions directly from DML - this might be an interesting 
aspect to keep in mind for UDFs.


3) A frame datatype should definitely be part of the DSL and would 
probably work very similarly to the Matrix abstraction. Right now I am 
working with matrices to figure out what a good way to use the DSL would 
look like. Apart from the general goal and idea of an embedded DSL, this 
includes figuring out what is possible in DML (and SystemML in general). 
The goal should be a DSL that allows for full support of all DML 
features (possibly even more).


I hope this clarifies some of your questions and I will send updates on 
the progress and update the document as I go.


Thanks!
Felix



Re: Proof of Concept: Embedded Scala DSL

2016-09-26 Thread Nakul Jindal
Hi Felix,

This is very good work.
I've played around a bit with Niketan's Python-based internal/embedded DSL.
It seems like it's meant for interactive work, as in a notebook or a
REPL.
This work, on the other hand, could look similar to the OpenMP/OpenACC
paradigm. In its current form and the one you are suggesting with the
Algorithm instance, the user is responsible for "executing" the
"parallelized" snippet of code.

Maybe we could have it look like OpenMP/OpenACC.
If my code looked like this:

/* Setup   */
for ( a <- 1 to 1) { /* Expensive Computation */ }
/* Cleanup  */


I could change it to

/* Setup   */
parallelize {
  for ( a <- 1 to 1) { /* Expensive Computation */ }
}
/* Cleanup  */

The code in "parallelize" would be DML-ized and sent to SystemML. The
appropriate conversions between data types in Scala and those supported by
SystemML would happen automatically.
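
One possible way to get the automatic data type conversion is Scala's implicit conversions; the sketch below only illustrates that part. Everything in it is an assumption for illustration: DmlMatrix is a hypothetical stand-in for a SystemML-side matrix, colSums is a made-up operation, and parallelize here simply evaluates its block locally. A real parallelize would be a macro that translates the block to DML, which this sketch does not attempt.

```scala
import scala.language.implicitConversions

object ParallelizeSketch {
  // Hypothetical stand-in for a SystemML-side matrix (row-major values).
  final case class DmlMatrix(rows: Int, cols: Int, values: Vector[Double])

  // Hypothetical automatic conversion from a plain Scala 2D array.
  implicit def arrayToDmlMatrix(a: Array[Array[Double]]): DmlMatrix = {
    val cols = if (a.isEmpty) 0 else a(0).length
    require(a.forall(_.length == cols), "all rows must have the same length")
    DmlMatrix(a.length, cols, a.flatten.toVector)
  }

  // Placeholder only: runs the block as ordinary Scala code.
  def parallelize[A](body: => A): A = body

  // Made-up operation that expects the SystemML-side type.
  def colSums(m: DmlMatrix): Vector[Double] =
    (0 until m.cols).map { j =>
      (0 until m.rows).map(i => m.values(i * m.cols + j)).sum
    }.toVector

  def main(args: Array[String]): Unit = {
    val data = Array(Array(1.0, 2.0), Array(3.0, 4.0))
    // `data` is an Array[Array[Double]]; the implicit conversion turns it
    // into a DmlMatrix at the point where colSums expects one.
    val sums = parallelize { colSums(data) }
    println(sums) // prints Vector(4.0, 6.0)
  }
}
```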

Thoughts?



-Nakul








Re: Proof of Concept: Embedded Scala DSL

2016-09-24 Thread Niketan Pansare

Hi Felix,

Thanks for the summary. The document is extremely useful. I particularly
like the idea of parallelizing the code with the 'breeze' library. I would
like to pitch in a few ideas which would enable your code to be reused by
other DSLs:
1. The Scala DSL/parallelize macro remains the same as described in your
documentation, but instead of generating DML directly, we call an
intermediate representation (IR). This IR then generates DML (instead of
parallelize generating DML directly). This IR will then be reused by the
Python DSL and the R DSL.
2. As an example, the IR could be a lazy Matrix class (which would be part
of SystemML). It could have awkward syntax/mechanisms for pushing down
control structures, for example beginWhile and endWhile. Since the IR will
not be exposed to the end user, that should be fine.

Example:
https://github.com/apache/incubator-systemml/blob/master/src/main/python/systemml/defmatrix.py#L537
 will call the IR's add() method. At the end of parallelize or when the user
wants the result (i.e. eval()), the IR could generate DML code and execute it.
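
A minimal sketch of such a lazy Matrix IR, assuming a small expression tree: combining matrices only records the operation, and DML text is produced on request. The node names (Input, Add, MatMult) and generateDml are hypothetical, add() mirrors the method name used above, and the sketch only emits the script where eval() would also execute it. Control structures such as beginWhile/endWhile are not covered.

```scala
object LazyMatrixIR {
  // Expression tree node; combining nodes computes nothing.
  sealed trait Matrix {
    def +(that: Matrix): Matrix = Add(this, that)
    // Named alias that a thin Python or R wrapper could call.
    def add(that: Matrix): Matrix = this + that
    // %*% is DML's matrix multiplication operator.
    def %*%(that: Matrix): Matrix = MatMult(this, that)
  }

  final case class Input(name: String) extends Matrix
  final case class Add(lhs: Matrix, rhs: Matrix) extends Matrix
  final case class MatMult(lhs: Matrix, rhs: Matrix) extends Matrix

  // Walks the tree and renders it as a DML expression.
  private def render(m: Matrix): String = m match {
    case Input(name)   => name
    case Add(l, r)     => s"(${render(l)} + ${render(r)})"
    case MatMult(l, r) => s"(${render(l)} %*% ${render(r)})"
  }

  // Produces a one-line DML assignment; execution is left out of this sketch.
  def generateDml(output: String, m: Matrix): String =
    s"$output = ${render(m)}"

  def main(args: Array[String]): Unit = {
    val a = Input("A")
    val b = Input("B")
    val c = Input("C")
    val expr = a.add(b) %*% c          // still just a tree, nothing evaluated
    println(generateDml("R", expr))    // prints R = ((A + B) %*% C)
  }
}
```

Because every DSL would build the same tree and share one generator, the semantics of an expression would be defined in a single place.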

Again, this is just a proposal and I am fine with dropping the idea of
integrating different DSLs if it makes the implementation of the Scala DSL
complicated. Also, please feel free to correct me if I am missing anything.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar











Re: Proof of Concept: Embedded Scala DSL

2016-09-24 Thread Matthias Boehm

Thanks for sharing the summary - this is very nice. While looking over the
example, I had the following questions:

1) Output handling: It would be great to see an example of how the results
of Algorithm.execute() are consumed. Do you intend to hand out our binary
matrix representation or MLContext's Matrix from which the user then
requests specific output formats? Also if there are multiple Algorithm
instances, how is the MLContext (with its internal state of lazily
evaluated intermediates) reused?

2) Scala-breeze prototyping: How do you intend to support operations that
are not supported in breeze? Examples are removeEmpty, table, aggregate,
rowIndexMax, quantile/centralmoment, cummin/cummax, and DNN operations?

3) Frame data type and operations: Do you also intend to add a frame type
and its operations? I think for this initial prototype it is not
necessarily required but please make the scope explicit.

Regards,
Matthias




From: fschue...@posteo.de
To: dev@systemml.incubator.apache.org
Date: 09/23/2016 04:36 PM
Subject: Proof of Concept: Embedded Scala DSL



As discussed in the related Jira (SYSTEMML-451) I have started to
implement a prototype/proof of concept for an embedded DSL in Scala.

I have summarized the current approach in a short document that you can
find on GitHub together with the code:
https://github.com/fschueler/emma/blob/sysml-dsl/emma-sysml-dsl/README.md
Please note that current development happens in the Emma project but
will move to an independent module in the SystemML project once the
necessary additions to Emma are merged. By having the DSL in a separate
module, we can include Scala and Emma dependencies only for the users
that actually want to use the Scala DSL.

The current code serves as a proof of concept to discuss further
development with the SystemML community. I especially welcome input from
SystemML Scala users on the usability of the API design.
Next steps will include the translation from Scala code to DML with
support of all features currently supported in DML, including control
flow structures.
Also, a coherent way of executing the generated scripts from Scala and
the interaction with outside data formats (such as Spark DataFrames)
will be integrated.

I am happy to answer your questions and discuss the described approach
here!

Felix