Hi there, I learned about SystemML and its optimizer from the recent SPOOF paper <http://cidrdb.org/cidr2017/papers/p3-elgamal-cidr17.pdf>. The gist I absorbed is that SystemML translates linear algebra expressions written in its DML into relational algebra, applies standard relational algebra optimizations, and then recognizes linear algebra kernels in the result, attempting to fuse them.
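To make sure I'm asking about the right thing below, here is the kind of destructive, fixed-order rewrite pass I have in mind when I say "rewrite rules" (a toy Python sketch of my own, not SystemML's actual code; the rule names and tuple encoding are just my invention):

```python
# Toy sketch of a destructive expression-rewrite pass (my mental model,
# not SystemML code). Expressions are tuples: ("op", child1, child2, ...),
# leaves are variable names or scalars.

def rewrite_double_transpose(expr):
    # t(t(X)) -> X
    if expr[0] == "t" and isinstance(expr[1], tuple) and expr[1][0] == "t":
        return expr[1][1]
    return expr

def rewrite_mult_one(expr):
    # X * 1 -> X
    if expr[0] == "*" and expr[2] == 1:
        return expr[1]
    return expr

# A fixed rule order, applied once bottom-up; the old subtree is
# simply discarded each time a rule fires (hence "destructive").
RULES = [rewrite_double_transpose, rewrite_mult_one]

def apply_rules(expr):
    if not isinstance(expr, tuple):
        return expr
    # rewrite children first, then try each rule on the rebuilt node
    expr = (expr[0],) + tuple(apply_rules(c) for c in expr[1:])
    for rule in RULES:
        expr = rule(expr)
    return expr

e = ("*", ("t", ("t", "X")), 1)
print(apply_rules(e))  # prints X
```

In a sketch like this the rule order is hard-coded and there is no cost model at all; I assume the real HOP rewrites are more sophisticated (e.g. iterating to a fixpoint, or consulting size estimates), which is what my first question is after.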
I think I found the SystemML rewrite rules here <https://github.com/apache/incubator-systemml/tree/master/src/main/java/org/apache/sysml/hops/rewrite>. A few questions:

1. It appears that SystemML rewrites HOP expressions destructively, i.e., it throws away the old expression. If so, how does SystemML determine the order in which to apply rewrites, and where does cost-based optimization come into play?

2. Is there a way to debug/visualize the optimization process? That is, starting from a DML program, can I view (a) the DML program parsed into HOPs; (b) which rules fire, where in the plan they fire, and the plan after each rule fires; and (c) the lowering and fusing of operators into LOPs? I know this is a lot to ask for; I'm curious how far SystemML has gone in this direction.

3. Is there any relationship between the SystemML optimizer and Apache Calcite <https://calcite.apache.org/>? If not, I'd love to understand the design decisions that differentiate the two.

Thanks,
Dylan Hutchison
