Re: Roadmap Merge and Rename SystemDS

Matthias Boehm Sat, 21 Mar 2020 14:47:09 -0700

Just FYI, we created a ticket for the suitable name search and shared the related results [1]. So from my perspective, it really boils down to the question of whether we accept the closeness to 'Linux systemd'. Back in 2018 (when starting SystemDS), I came to the conclusion that it's fine because of the very different objectives and because SystemDS reflects both the origin from SystemML and its new focus on data science pipelines.

[1] https://issues.apache.org/jira/projects/PODLINGNAMESEARCH/issues/PODLINGNAMESEARCH-179?filter=allissues


Regards,
Matthias

On 3/9/2020 6:37 PM, Matthias Boehm wrote:

Hi all,
as you're probably aware, development activities of Apache SystemML have slowed down significantly and were virtually non-existent in the last year for various reasons. Part of that was that my team and I [1] decided to start SystemDS [2,3] as a fork of SystemML in 09/2018 with a new vision and roadmap for the future.
During PMC discussions regarding the retirement of SystemML, we came to the conclusion that the best path forward -- for the entire community -- would be to merge SystemDS back into Apache SystemML, rename it to SystemDS, and continue jointly. Before doing so, I want to share the plan with the entire community.
SystemDS aims at providing better systems support for the end-to-end data science lifecycle, with a special focus on ML pipelines from data integration, cleaning, and preparation, through efficient ML model training, to model debugging and serving. A key observation is that state-of-the-art data integration and cleaning primitives are themselves based on machine learning. Our main objectives are to support effective and efficient data preparation, ML training, and debugging at scale, something that cannot be composed from existing libraries. The game plan includes three major parts:
1) DSL-based, High-level Abstractions: We aim to provide a hierarchy of abstractions for the different lifecycle tasks as well as users with different expertise (ML researchers, data scientists, domain experts), based on our DSL for ML training and scoring. Exploratory data science interleaves data preparation, ML training, scoring, and debugging in an iterative process; and once these tasks are expressed in dense or sparse linear algebra, we expect very good performance.
2) Hybrid Runtime Plans and Optimizing Compiler: To support the wide variety of algorithm classes, we will continue to provide different parallelization strategies, enriched by a new backend for federated ML and privacy enhancing technologies. Since the hierarchy of language abstractions inevitably leads to redundancy, we further aim to improve the automatic optimization capabilities of the compiler and underlying runtime.
3) Data Model - Heterogeneous Tensors: Supporting data integration and cleaning primitives in linear algebra programs requires a more generic data model for handling heterogeneous and structured data. In contrast to existing ML systems, our central data model is the heterogeneous tensor. Thus, we generalize SystemML's FP64 matrices to multi-dimensional arrays where one dimension may have a schema including JSON strings to represent nested data.
Admin: We intend to create the SystemDS 0.2 release in March. Afterwards we would then rebase all our commits (369) back onto the SystemML codeline. Subsequently, we will rename Apache SystemML to Apache SystemDS and continue our development under the Apache umbrella. I just went through the Apache name search guidelines and we'll perform a 'suitable name search' accordingly and then transfer SystemDS. The existing PMC and committer status of course stays intact unless people want to leave. Shortly after the merge, I will nominate the four most active contributors of the last year to become committers. Regarding releases (and JIRA numbers), it's up for discussion, but both options, continuing with SystemML versions (i.e., 1.3) or SystemDS versions (0.3), seem fine to me.
Roadmap: At the technical level, SystemDS will continue to support all operations and algorithms SystemML provided but significantly extend the scope and functionality via the mentioned hierarchy of language abstractions (in the form of builtin functions). However, during the fork we already removed old baggage like the MR backend, the script-level debugger, the PyDML frontend, and several other things [4]. Major new internals are native support for lineage tracing and reuse, the data model of heterogeneous tensors, and a new federated backend.
[1] https://damslab.github.io/
[2] https://github.com/tugraz-isds/systemds
[3] http://cidrdb.org/cidr2020/papers/p22-boehm-cidr20.pdf
[4] https://github.com/tugraz-isds/systemds/releases/tag/v0.1.0

Regards,
Matthias
