Hi Niketan,

Thanks for your suggestions! I thought about them a bit, and here are my ideas:

The IR you are describing is basically already my user-facing API. I am not sure how much sense it makes to have an IR that looks exactly like the API but with control structures renamed. A common IR for all DSLs definitely makes sense in general, but I am not sure it should be part of one particular DSL. For maintainability it might be better to have that IR somewhere on the SystemML side.

Apart from that, and regarding what Matthias suggested, I thought about how to make the DSL more suitable for use in the REPL, and I think we can find a good compromise. Currently my API is backed by breeze for rapid prototyping, where breeze simply forces evaluation of every statement. For the future design I will probably make the Matrix and Vector classes abstract, which allows for different concrete implementations. We could then have one that is backed directly by SystemML and works similarly to the Python DSL, in that it just uses mock operators and builds the DML string that is then executed by SystemML. That way the deep embedding would reuse the shallow embedding, and we could let the user either use the lazy MatrixType in the REPL or write code inside the macro.
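
A rough sketch of that design, written in Python for brevity rather than Scala and not taken from the prototype. Matrix, EagerMatrix and LazyMatrix are hypothetical names, EagerMatrix only stands in for the breeze-backed implementation, and the generated strings assume DML's `+` and `%*%` operators:

```python
from abc import ABC, abstractmethod
import itertools


class Matrix(ABC):
    """Hypothetical abstract matrix type; each backend decides how ops are evaluated."""

    @abstractmethod
    def __add__(self, other): ...

    @abstractmethod
    def __matmul__(self, other): ...


class EagerMatrix(Matrix):
    """Eager backend: computes every statement immediately (stand-in for breeze)."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __add__(self, other):
        return EagerMatrix([[a + b for a, b in zip(r, s)]
                            for r, s in zip(self.rows, other.rows)])

    def __matmul__(self, other):
        cols = list(zip(*other.rows))
        return EagerMatrix([[sum(a * b for a, b in zip(r, c)) for c in cols]
                            for r in self.rows])


class LazyMatrix(Matrix):
    """Lazy backend: mock operators that only record DML statements."""

    _ids = itertools.count()  # source of fresh temporary names (assumption)

    def __init__(self, name, statements=()):
        self.name = name
        self.statements = tuple(statements)

    def _binary(self, op, other):
        out = f"t{next(LazyMatrix._ids)}"
        # Merge the statements of both operands, skipping ones already recorded.
        merged = self.statements + tuple(
            s for s in other.statements if s not in self.statements)
        return LazyMatrix(out, merged + (f"{out} = {self.name} {op} {other.name}",))

    def __add__(self, other):
        return self._binary("+", other)

    def __matmul__(self, other):
        return self._binary("%*%", other)

    def to_dml(self):
        """Return the DML script accumulated so far; nothing is executed here."""
        return "\n".join(self.statements)


A, B = LazyMatrix("A"), LazyMatrix("B")
script = ((A @ B) + A).to_dml()  # two DML assignments, no computation yet
```

The same user expression runs against either backend; only the lazy one defers work until the script is handed to SystemML.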

I haven't started playing around with this idea yet, but let me know what you think of it. The lazy, shallow DSL would basically do what you would want from a separate IR, but I don't know if you want to call that from the Python DSL.

Felix

On 24.09.2016 19:39, Niketan Pansare wrote:
Hi Felix,

Thanks for the summary. The document is extremely useful. I
particularly like the idea of parallelizing the code with the 'breeze'
library. I would like to pitch in a few ideas which would enable your
code to be reused by other DSLs:
1. The Scala DSL/parallelize macro remains the same as described in your
documentation, but instead of generating DML directly, parallelize
calls an intermediate representation (IR), which then generates the
DML. This IR will then be reused by the Python DSL and the R DSL.
2. As an example, the IR could be a lazy Matrix class (which would be
part of SystemML). It could have an awkward syntax/mechanism for
pushing down control structures, for example beginWhile and endWhile.
Since the IR will not be exposed to the end user, that should be fine.

Example:
https://github.com/apache/incubator-systemml/blob/master/src/main/python/systemml/defmatrix.py#L537
[1] will call the IR's add() method. At the end of parallelize, or when
the user wants the result (i.e. eval()), the IR could generate DML code
and execute it.
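
A minimal sketch of such an IR, in Python and under stated assumptions: DmlBuilder, assign() and to_dml() are hypothetical names, add(), beginWhile and endWhile follow the names used above, and the output assumes DML's `while (predicate) { ... }` block syntax. It only builds the script; executing it is left out:

```python
class DmlBuilder:
    """Hypothetical IR: collects DML statements and emits them as one script."""

    def __init__(self):
        self._lines = []
        self._depth = 0  # current nesting level of open while blocks

    def _emit(self, text):
        self._lines.append("  " * self._depth + text)

    def assign(self, target, expr):
        self._emit(f"{target} = {expr}")

    def add(self, target, lhs, rhs):
        """Record an addition; a DSL operator such as __add__ would call this."""
        self.assign(target, f"{lhs} + {rhs}")

    def beginWhile(self, predicate):
        self._emit(f"while ({predicate}) {{")
        self._depth += 1

    def endWhile(self):
        if self._depth == 0:
            raise ValueError("endWhile without matching beginWhile")
        self._depth -= 1
        self._emit("}")

    def to_dml(self):
        """Return the script; refuse to emit while a block is still open."""
        if self._depth != 0:
            raise ValueError("unclosed while block")
        return "\n".join(self._lines)


ir = DmlBuilder()
ir.assign("i", "0")
ir.beginWhile("i < 3")
ir.add("X", "X", "Y")
ir.assign("i", "i + 1")
ir.endWhile()
script = ir.to_dml()  # a three-iteration DML loop as a single string
```

Because control flow is pushed down explicitly, each front end (Scala macro, Python, R) would only need to translate its own loops into beginWhile/endWhile calls.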

Again, this is just a proposal, and I am fine with dropping the idea of
integrating the different DSLs if it makes the implementation of the
Scala DSL complicated. Also, please feel free to correct me if I am
missing anything.

Thanks,

Niketan Pansare
IBM Almaden Research Center
E-mail: npansar At us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
[2]

From: Matthias Boehm/Almaden/IBM@IBMUS
To: dev@systemml.incubator.apache.org
Date: 09/24/2016 01:11 AM
Subject: Re: Proof of Concept: Embedded Scala DSL

-------------------------

thanks for sharing the summary - this is very nice. While looking over
the example, I had the following questions:

1) Output handling: It would be great to see an example of how the
results of Algorithm.execute() are consumed. Do you intend to hand out
our binary matrix representation or MLContext's Matrix from which the
user then requests specific output formats? Also if there are multiple
Algorithm instances, how is the MLContext (with its internal state of
lazily evaluated intermediates) reused?

2) Scala-breeze prototyping: How do you intend to support operations
that are not supported in breeze? Examples are removeEmpty, table,
aggregate, rowIndexMax, quantile/centralmoment, cummin/cummax, and DNN
operations.

3) Frame data type and operations: Do you also intend to add a frame
type and its operations? I think for this initial prototype it is not
necessarily required but please make the scope explicit.

Regards,
Matthias

From: fschue...@posteo.de
To: dev@systemml.incubator.apache.org
Date: 09/23/2016 04:36 PM
Subject: Proof of Concept: Embedded Scala DSL

-------------------------

As discussed in the related Jira (SYSTEMML-451) I have started to
implement a prototype/proof of concept for an embedded DSL in Scala.

I have summarized the current approach in a short document that you can
find on GitHub together with the code:
https://github.com/fschueler/emma/blob/sysml-dsl/emma-sysml-dsl/README.md
[3]
Please note that current development happens in the Emma project but
will move to an independent module in the SystemML project once the
necessary additions to Emma are merged. By having the DSL in a separate
module, we can include Scala and Emma dependencies only for the users
that actually want to use the Scala DSL.

The current code serves as a proof of concept to discuss further
development with the SystemML community. I especially welcome input from
SystemML Scala users on the usability of the API design.
Next steps will include the translation from Scala code to DML with
support of all features currently supported in DML, including control
flow structures.
Also, a coherent way of executing the generated scripts from Scala and
the interaction with outside data formats (such as Spark Dataframes)
will be integrated.

I am happy to answer your questions and discuss the described approach
here!

Felix



Links:
------
[1] https://github.com/apache/incubator-systemml/blob/master/src/main/python/systemml/defmatrix.py#L537
[2] http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
[3] https://github.com/fschueler/emma/blob/sysml-dsl/emma-sysml-dsl/README.md
