Hi Matthias,

Would you be so kind as to announce the following:

1. The Apache Infra JIRA ticket for the name change
2. New committers (welcome!) and, of course, contributors
3. The new release version number (is it SYSTEMDS-0.3.0-SNAPSHOT?)

Thank you,
Janardhan

On Tue, Mar 24, 2020 at 6:28 PM Matthias Boehm <mboe...@gmail.com> wrote:

> That's a good point, Henry. Yes, with SystemDS 0.1.0, we removed the
> MapReduce compiler and runtime backend, the PyDML parser and language
> support, the Java-UDF framework, and the script-level debugger. We are
> concentrating on local, Spark, GPU, and federated backends now, and have
> added new language bindings, including an initial Python binding.
> However, the script-level operation support remains intact and is even
> largely extended by builtins for algorithms, data cleaning, and debugging.
>
> Accordingly, it might be good to deprecate the removed things while
> merging the code in and then make the next Apache SystemDS (pending
> approval) release a major release, which allows us to break external APIs.
>
> Regards,
> Matthias
>
> On 3/24/2020 2:07 AM, Henry Saputra wrote:
> > Thanks for starting this discussion, Matthias.
> >
> > Are there any features from SystemML that could be removed or
> > deprecated when SystemDS is merged into the SystemML repository?
> >
> > - Henry
> >
> > On Sat, Mar 21, 2020 at 2:47 PM Matthias Boehm <mboe...@gmail.com> wrote:
> >
> >> Just FYI, we created a ticket for the suitable name search, and shared
> >> the related results [1]. So from my perspective, it really boils down to
> >> the question of whether we accept the closeness to 'Linux systemd'. Back
> >> in 2018 (when starting SystemDS), I came to the conclusion that it's fine
> >> because of the very different objectives and because SystemDS reflects
> >> both the origin from SystemML and its new focus on data science pipelines.
> >>
> >> [1] https://issues.apache.org/jira/projects/PODLINGNAMESEARCH/issues/PODLINGNAMESEARCH-179?filter=allissues
> >>
> >> Regards,
> >> Matthias
> >>
> >> On 3/9/2020 6:37 PM, Matthias Boehm wrote:
> >>> Hi all,
> >>>
> >>> As you're probably aware, development activities of Apache SystemML
> >>> significantly slowed down and were virtually non-existent in the last
> >>> year for various reasons. Part of that was that my team and I [1]
> >>> decided to start SystemDS [2,3] as a fork of SystemML in 09/2018 with a
> >>> new vision and roadmap for the future.
> >>>
> >>> During PMC discussions regarding the retirement of SystemML, we came to
> >>> the conclusion that the best path forward -- for the entire community
> >>> -- would be to merge SystemDS back into Apache SystemML, rename it to
> >>> SystemDS, and continue jointly. Before doing so, I want to share the
> >>> plan with the entire community.
> >>>
> >>> SystemDS aims at providing better systems support for the end-to-end
> >>> data science lifecycle, with a special focus on ML pipelines from data
> >>> integration, cleaning, and preparation, over efficient ML model
> >>> training, to model debugging and serving. A key observation is that
> >>> state-of-the-art data integration and cleaning primitives are themselves
> >>> based on machine learning. Our main objectives are to support effective
> >>> and efficient data preparation, ML training, and debugging at scale,
> >>> something that cannot be composed from existing libraries. The game plan
> >>> includes three major parts:
> >>>
> >>> 1) DSL-based, High-level Abstractions: We aim to provide a hierarchy of
> >>> abstractions for the different lifecycle tasks as well as for users with
> >>> different expertise (ML researchers, data scientists, domain experts),
> >>> based on our DSL for ML training and scoring.
> >>> Exploratory data science interleaves data preparation, ML training,
> >>> scoring, and debugging in an iterative process; and once these tasks
> >>> are expressed in dense or sparse linear algebra, we expect very good
> >>> performance.
> >>>
> >>> 2) Hybrid Runtime Plans and Optimizing Compiler: To support the wide
> >>> variety of algorithm classes, we will continue to provide different
> >>> parallelization strategies, enriched by a new backend for federated ML
> >>> and privacy-enhancing technologies. Since the hierarchy of language
> >>> abstractions inevitably leads to redundancy, we further aim to improve
> >>> the automatic optimization capabilities of the compiler and underlying
> >>> runtime.
> >>>
> >>> 3) Data Model - Heterogeneous Tensors: Supporting data integration and
> >>> cleaning primitives in linear algebra programs requires a more generic
> >>> data model for handling heterogeneous and structured data. In contrast
> >>> to existing ML systems, our central data model is heterogeneous
> >>> tensors. Thus, we generalize SystemML's FP64 matrices to
> >>> multi-dimensional arrays where one dimension may have a schema including
> >>> JSON strings to represent nested data.
> >>>
> >>> Admin: We intend to create the SystemDS 0.2 release in March. Afterwards,
> >>> we would then rebase all our commits (369) back onto the SystemML
> >>> codeline. Subsequently, we will rename Apache SystemML to Apache
> >>> SystemDS and continue our development under the Apache umbrella. I just
> >>> went through the Apache name search guidelines and we'll perform a
> >>> 'suitable name search' accordingly and then transfer SystemDS. The
> >>> existing PMC and committer status of course stays intact unless people
> >>> want to leave. Shortly after the merge, I will nominate the four most
> >>> active contributors of the last year to become committers.
> >>> Regarding releases (and JIRA numbers), it's up for discussion, but both
> >>> continuing with SystemML versions (i.e., 1.3) and switching to SystemDS
> >>> versions (0.3) seem fine to me.
> >>>
> >>> Roadmap: At a technical level, SystemDS will continue to support all
> >>> operations and algorithms SystemML provided but significantly extend the
> >>> scope and functionality via the mentioned hierarchy of language
> >>> abstractions (in the form of builtin functions). However, during the fork
> >>> we already removed old baggage like the MR backend, the script-level
> >>> debugger, the PyDML frontend, and several other things [4]. Major new
> >>> internals are native support for lineage tracing and reuse, the data
> >>> model of heterogeneous tensors, and a new federated backend.
> >>>
> >>> [1] https://damslab.github.io/
> >>> [2] https://github.com/tugraz-isds/systemds
> >>> [3] http://cidrdb.org/cidr2020/papers/p22-boehm-cidr20.pdf
> >>> [4] https://github.com/tugraz-isds/systemds/releases/tag/v0.1.0
> >>>
> >>> Regards,
> >>> Matthias