Hi all,

We are a group of researchers from the Database group (DIMA) at TU Berlin. We would like to add Apache Flink as an execution backend to SystemML, alongside Hadoop MR and Spark. To this end, we have started implementing a proof of concept (POC). It consists of several instructions together with the necessary (de)serialization and execution logic. You can see the current state of our fork [1], including two test cases that show what we currently support [2][3].
While building this simple POC, we realized that we had to duplicate a lot of functionality, especially from the Spark instructions. We noticed that others have already discussed refactoring the runtime package [4][5], which could make it easier to integrate further backend systems. Since this would be a bigger change, it would be helpful to get input from the SystemML community on this effort. In particular, we would like to discuss the following questions:

* How should we handle functionality shared between the different backends (Flink, Spark, etc.) to avoid code duplication, especially in the instructions, while also introducing modularity? And is this modularization desired at all?
* How should we integrate Flink into the different runtime modes (Flink-only, Flink-Hybrid, etc.)?
* How should we structure the integration (a single commit or multiple commits)?

We’re looking forward to your feedback and hope the community likes the idea of adding Flink as an execution backend to SystemML.

Best,
Andreas Kunft
Christoph Brücke
Felix Schüler

[1] https://github.com/stratosphere/incubator-systemml/tree/flink-integration
[2] https://github.com/stratosphere/incubator-systemml/blob/flink-integration/src/test/java/org/apache/sysml/runtime/instructions/flink/TsmmFLInstructionTest.java
[3] https://github.com/stratosphere/incubator-systemml/blob/flink-integration/src/test/java/org/apache/sysml/runtime/instructions/flink/utils/DataSetConverterUtilsTest.java
[4] https://issues.apache.org/jira/browse/SYSTEMML-33
[5] https://www.mail-archive.com/search?l=dev%40systemml.incubator.apache.org&q=subject%3A%22Runtime+package+refactoring%22&o=newest&f=1
