Write Access to Incubator Wiki

Dominik Riemer Wed, 30 Oct 2019 13:17:49 -0700

Hi all,



A short introduction to myself: my name is Dominik Riemer and I'm a 
co-initiator of StreamPipes, an open source self-service toolbox for analyzing 
IoT data streams. After presenting the tool at this year's ApacheCon NA, 
followed by very friendly and fruitful discussions with many people from the 
Apache community, we are sure that we'd like to continue the development of 
StreamPipes as an Apache community project. Before we start a discussion 
process, I'd like to request write access to the Incubator wiki (username: 
riemer). An initial draft of the proposal is attached below.



Thanks for your help!

Dominik





--------------------------------



StreamPipes - Apache Incubator Proposal (Draft)



= Abstract =

StreamPipes is a self-service (Industrial) IoT toolbox to enable non-technical 
users to connect, analyze and explore (Industrial) IoT data streams.



= Proposal =



The goal of StreamPipes (www.streampipes.org) is to provide an easy-to-use 
toolbox for non-technical users, e.g., domain experts, to exploit data streams 
coming from (Industrial) IoT devices. Such users are provided with an intuitive 
graphical user interface with the Pipeline Editor at its core. Users can 
graphically model processing pipelines based on data sources (streams), data 
processors and data sinks. Data processors and sinks are self-contained 
microservices that implement either stateful or stateless processing logic 
(e.g., trend detection or an image classifier). Their processing logic is 
implemented using one of several provided wrappers: we currently have wrappers 
for standalone/edge-based processing, Apache Flink and Siddhi, working wrapper 
prototypes for Apache Kafka Streams and Apache Spark, and in the future we also 
plan to integrate with Apache Beam. An SDK allows developers to easily create 
new pipeline elements, which can be installed at runtime. To support users in 
creating pipelines, an underlying semantics-based data model enables pipeline 
elements to express requirements on incoming data streams that need to be 
fulfilled, thus reducing modeling errors.
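To make the idea of requirement checking more concrete, the following is a minimal Python sketch under simplifying assumptions: each event property carries a semantic type (what the value means) and a runtime type (how it is encoded), and a data processor declares the (semantic type, runtime type) pairs it needs. All class and function names and the `urn:example:*` identifiers are hypothetical; this is not the StreamPipes SDK or its vocabulary.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical, heavily simplified model for illustration only: these classes
# are NOT the StreamPipes SDK, and the "urn:example:*" semantic types are made up.


@dataclass(frozen=True)
class EventProperty:
    """A single field of an event in a data stream."""
    name: str
    semantic_type: str  # what the value means, e.g. a temperature
    runtime_type: type  # how the value is encoded, e.g. float


@dataclass
class DataStream:
    name: str
    properties: list[EventProperty]


@dataclass
class DataProcessor:
    name: str
    # Each requirement is a (semantic_type, runtime_type) pair that at least
    # one property of the incoming stream must match.
    requirements: list[tuple[str, type]] = field(default_factory=list)


def unmet_requirements(stream: DataStream, processor: DataProcessor) -> list[tuple[str, type]]:
    """Return the processor's requirements that no stream property satisfies."""
    return [
        (sem, rt)
        for sem, rt in processor.requirements
        if not any(p.semantic_type == sem and p.runtime_type is rt for p in stream.properties)
    ]


def can_connect(stream: DataStream, processor: DataProcessor) -> bool:
    """True if every requirement of the processor is matched by the stream."""
    return not unmet_requirements(stream, processor)


if __name__ == "__main__":
    sensor = DataStream("machine-sensor", [
        EventProperty("timestamp", "urn:example:timestamp", int),
        EventProperty("temp", "urn:example:temperature", float),
    ])
    trend = DataProcessor("trend-detection", [("urn:example:temperature", float)])
    pressure_alarm = DataProcessor("pressure-alarm", [("urn:example:pressure", float)])
    print(can_connect(sensor, trend))  # True
    print(unmet_requirements(sensor, pressure_alarm))
```

In this sketch, the sensor stream can be connected to the hypothetical trend-detection processor, while the pressure alarm is flagged because no property satisfies its requirement. A graphical editor could use a check of this kind to reject invalid connections before a pipeline is deployed.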

Data streams are integrated using StreamPipes Connect, which allows users to 
connect data sources (based on standard protocols such as MQTT, Kafka, Pulsar, 
OPC-UA and further PLC4X-supported protocols) through a graphical wizard, 
without any programming. Additional user-facing modules of StreamPipes are a 
live dashboard to quickly explore IoT data streams, a wizard that generates code 
templates for new pipeline elements, and a Pipeline Element Installer used to 
extend the algorithm feature set at runtime.



=== Background ===

StreamPipes was started in 2014 by researchers from FZI Research Center for 
Information Technology in Karlsruhe, Germany. The original prototype was funded 
by an EU project centered around predictive analytics for the manufacturing 
domain. Since then, StreamPipes has been continuously improved and extended, 
mainly through public funding from German federal ministries. In early 2018, the 
source code was officially released under the Apache License 2.0. At the same 
time, as we focused on turning the research prototype into a production-grade 
tool, the first companies started to use StreamPipes. Currently, our primary 
goal is to widen the user and developer base. At ApacheCon NA 2019, after 
talking to many people from the Apache community, we decided that we would like 
to bring StreamPipes to the Apache Incubator.



=== Rationale ===

The (Industrial) IoT domain is a highly relevant and emerging sector. 
Currently, IoT platforms are offered by many vendors ranging from SMEs to large 
enterprises. We believe that open source alternatives are an important 
cornerstone for manufacturing companies to easily adopt data-driven decision 
making. From our point of view, StreamPipes fits very well into the existing 
(I)IoT ecosystem within the ASF: Apache PLC4X focuses on connecting machine data 
from PLCs, and other Apache projects (Apache Kafka, Apache IoTDB, Apache Pulsar) 
are already used either in the core of StreamPipes or through integrations. 
StreamPipes itself focuses on enabling self-service IoT data analytics for 
non-technical users.

The whole StreamPipes code base is currently hosted on GitHub. To give a rough 
estimate of the project size:

* streampipes: Backend and core modules, ~3300 commits

* streampipes-ui: User Interface, ~1300 commits

* streampipes-pipeline-elements: ~100 Pipeline Elements (data 
processors/algorithms and sinks), ~500 commits

* streampipes-connect-adapters: ~20 Adapters to connect data, ~100 commits

To achieve our goal of further extending the code base with new features, new 
connectors and new algorithms, and of growing both the user and developer 
community, we believe that a community-driven development process is the best 
way to further develop StreamPipes. Finally, after talking to committers from 
various Apache IoT-related projects, participating in spontaneous hacking 
sessions and being impressed by the collaboration among individual projects, we 
decided that (from our point of view) the ASF is the ideal future home of 
StreamPipes.



=== Initial Goals ===

* Move the existing codebase to Apache

* Fully align with the Apache development and release processes

* Perform a name search and a thorough review of existing licenses

* First Apache release



=== Current Status ===

** Meritocracy **

We are absolutely committed to strengthening StreamPipes as a truly 
community-driven open source project. The existing committer base is highly 
motivated to foster the open source way in the industrial IoT sector and, 
together with existing Apache communities focused on this domain, to provide 
open source tooling for Industrial IoT projects in the same way Apache already 
does for the Big Data space.

The development philosophy behind StreamPipes has always followed the 
principles of meritocracy: while most committers are still active in the 
project, we have also managed to onboard new, committed developers regularly. 
Two people who are now part of the core developer team joined during the past 
year. We therefore aim to continuously expand the PMC and committer base based 
on merit.



** Community **

Since being open-sourced in 2018, StreamPipes has attracted steadily growing 
public interest. Several companies, mainly from the manufacturing domain, have 
tested StreamPipes in proof-of-concept projects, and the first companies have 
started to use it in production. This interest was driven by a large number of 
events, ranging from meetups, research conferences and demo sessions to 
hackathons, that we participated in or organized during the past two years. 
Having generated general interest in StreamPipes, our next focus will be to 
find more committers to diversify the contributor base.



** Core Developers **

The core developers of the system are Dominik Riemer, Philipp Zehnder, Patrick 
Wiener and Johannes Tex. All core developers are initial committers in the 
current proposal. Some former students who have worked on the project with 
great commitment and recently started working at companies will be asked to 
continue contributing to the project.



** Alignment **

StreamPipes has dependencies on many existing Apache projects, which is one 
reason why we think that the ASF is the best future home for StreamPipes. The 
messaging layer is based on Apache Kafka (with Apache Pulsar as a future 
option), and runtime wrappers exist for Apache Flink, Apache Spark and Apache 
Kafka Streams. StreamPipes Connect already includes adapters for several Apache 
projects. Most importantly, we integrate (and plan to deepen the integration) 
with IIoT-focused projects such as Apache PLC4X. Also, several data sinks exist 
to send messages to tools from other Apache projects (e.g., Apache Kafka, 
Apache Pulsar and Apache IoTDB). Having talked to the core developers of these 
projects at this year's ApacheCon, we are convinced that a tight integration 
between these tools will strengthen the open source IoT ecosystem.



=== Known Risks ===

** Orphaned Products **

We don't expect the risk of an orphaned product. The initial committers have 
worked on the project for years and are absolutely committed to making this 
open source tool a great success. All initial committers are committed to 
working on StreamPipes in their free time.



** Inexperience with Open Source **

All initial committers have years of experience in open source development and 
understand what open source means. However, none of the initial committers are 
currently committers to Apache projects, although some have already contributed 
to some Apache projects. Through a variety of events and from intensively 
studying Apache mailing lists, we are sure that the Apache Way is the way we'd 
like the project to move forward. We expect to benefit from the ASF's 
experience in building successful open source projects.



** Length of Incubation **

We are aware that incubation is a process focused on building the community, 
learning the Apache Way and other important things such as learning the release 
process and handling licensing and trademark issues. We are also aware that, 
although there is steadily increasing interest in StreamPipes, a major 
challenge we would need (and are willing) to work on during the incubation 
phase is widening the committer base. We look forward to this, as a large 
developer base is exactly what we are striving for.



** Homogeneous Developers **

Most current developers work for the same institution (FZI). The motivation of 
all developers goes beyond their professional duties, and all current 
committers work on StreamPipes in their free time. Recently, we have received 
the first pull requests from external contributors and growing interest from 
users and companies outside of FZI. The first manufacturing companies have 
already evaluated and adopted StreamPipes. To attract external developers, we 
have created extensive documentation, run a Slack channel to quickly answer 
questions, and provide help via email. Therefore, we believe that making the 
developer community more heterogeneous is not only necessary but also 
achievable within the next months.



** Reliance on salaried developers **

Currently, StreamPipes receives support from salaried developers, mainly 
research scientists from FZI. However, all core developers substantially work 
on StreamPipes in their spare time. As this has been the case since the 
beginning in early 2014, we expect that a substantial portion of volunteers 
will continue working on the project, and we aim to strengthen the base of both 
unpaid committers and paid committers from other companies. At the same time, 
funding for the initial StreamPipes team is secured by public funding for the 
next few years, ensuring that developers will also be able to commit enough of 
their working time to the project.



** Relationships with other Apache products **

StreamPipes is often compared to tools such as Node-RED and Apache NiFi, mainly 
because of a similar UI concept (dataflow approach). Besides some technological 
differences (e.g., the microservice-based analytics approach vs. the single-host 
runtime of Node-RED, the wrapper architecture and the underlying 
semantics-based model), we believe the target audiences differ. We aim to 
collaborate with the Apache NiFi community by exchanging best practices and by 
integrating both projects (e.g., by building connectors).

As mentioned above, quite a few adapters and data sinks are already available 
that link to existing Apache projects.



** An excessive fascination with the Apache Brand **

Although we recognize the Apache brand as the most visible brand in the open 
source domain, the primary goal of this proposal is not to create publicity, 
but to widen the developer base. We believe that successful projects have broad 
and diverse communities. We expect that an Apache project, with a clear and 
proven way to develop open source software, helps in finding new committers. As 
the core development team has already worked on StreamPipes for the past few 
years and is fully committed to the software and its benefit for the industrial 
IoT domain, we would continue development even if StreamPipes did not become an 
Apache project.



=== Documentation ===

Currently, we host a website at https://www.streampipes.org. More technical 
information (user and developer guides) can be found in the documentation at 
https://docs.streampipes.org, which includes tutorials and manuals on how to 
extend StreamPipes using the SDK.



=== Initial Source ===

Currently, the following GitHub repositories exist, all licensed under the 
Apache License 2.0:

* streampipes (https://www.github.com/streampipes/streampipes, the backend and 
pipeline management module)

* streampipes-ui (https://www.github.com/streampipes/streampipes-ui, the UI 
module)

* streampipes-pipeline-elements 
(https://www.github.com/streampipes/streampipes-pipeline-elements, library of 
data processors and sinks)

* streampipes-connect-adapters 
(https://www.github.com/streampipes/streampipes-connect-adapters, StreamPipes 
Connect adapters)

* streampipes-docs (https://www.github.com/streampipes/streampipes-docs, the 
above-mentioned documentation)



=== Source and intellectual property submission plan ===

All initial committers will sign an ICLA with the ASF. FZI, as the 
organization that has employed the main contributors of StreamPipes, will sign 
a CCLA and donate the codebase to the ASF (both subject to formal approval). All 
major contributors are still active in the project.



=== External Dependencies ===

We did an initial review of all dependencies used in the various projects. No 
critical libraries that depend on Category X licenses were found, and some 
minor issues have already been resolved (e.g., removing dependencies on the 
org.json library). Most external dependencies used by the Java-based modules 
(backend, pipeline elements and connect) are licensed under the Apache License 
2.0, whereas some are under Category B licenses (e.g., CDDL). Most external 
dependencies of the UI are licensed under the MIT license.

Once we move to the Incubator, we will do a complete check of all transitive 
dependencies. We don't expect any surprises here.



=== Cryptography ===

(not applicable)



=== Required Resources ===

** Mailing Lists **

We plan to use the following mailing lists:

* us...@streampipes.incubator.apache.org

* d...@streampipes.incubator.apache.org

* priv...@streampipes.incubator.apache.org

* comm...@streampipes.incubator.apache.org

As StreamPipes is targeted at a non-technical audience, we consider a dedicated 
user mailing list an important requirement for helping users.



** Subversion directory **

(not applicable)



** Git repositories **

We would like to use Git for source code management and enable GitHub 
mirroring functionality.



As we plan to merge some of the repositories described above to simplify the 
release process, we ask for the following source repositories to be created:

* streampipes (containing backend + UI)

* streampipes-extensions (containing modules that can be dynamically installed 
at runtime: pipeline elements and connect adapters)

* streampipes-website (containing docs + website)



** Issue tracking **

JIRA ID: StreamPipes



=== Initial Committers ===

List of initial committers in alphabetical order:

* Christofer Dutz (christofer.dutz at c-ware dot de)

* Dominik Riemer (dominik dot riemer at gmail dot com)

* Johannes Tex (tex at fzi dot de)

* Patrick Wiener (wiener at fzi dot de)

* Philipp Zehnder (zehnder at fzi dot de)



=== Sponsors ===

** Champion **

* Christofer Dutz (christofer.dutz at c-ware dot de)



** Mentors **

* Christofer Dutz (christofer.dutz at c-ware dot de)

* Julian Feinauer (Jfeinauer at apache dot org)

* Justin McLean (justin at classsoftware dot com)



** Sponsoring Entity **

The Apache Incubator
