+1 (non-binding)

-----------------------------------
Xiangdong Huang
School of Software, Tsinghua University

Lars George <larsgeo...@apache.org> wrote on Sat, Dec 12, 2020 at 8:19 PM:

> +1 binding
>
> On Sat, Dec 12, 2020 at 2:24 AM Sheng Wu <wu.sheng.841...@gmail.com> wrote:
>
> > +1 binding
> >
> > Sheng Wu 吴晟
> > Twitter, wusheng1108
> >
> > Byung-Gon Chun <bgc...@gmail.com> wrote on Sat, Dec 12, 2020 at 5:59 AM:
> >
> > > +1 (binding)
> > >
> > > -Gon
> > >
> > > On Sat, Dec 12, 2020 at 2:35 AM Furkan KAMACI <furkankam...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > +1 (binding)
> > > >
> > > > Kind Regards,
> > > > Furkan KAMACI
> > > >
> > > > On 11 Dec 2020 Fri at 20:04 Daniel B. Widdis <wid...@gmail.com> wrote:
> > > >
> > > > > +1 (non-binding). I'm interested in getting involved in this project!
> > > > >
> > > > > On Fri, Dec 11, 2020 at 8:33 AM Christofer Dutz <christofer.d...@c-ware.de> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > following up on the [DISCUSS] thread on Wayang (https://lists.apache.org/thread.html/r5fc03ae014f44c7c31a509a6db4ac07faedb2e1c6245cd917b744826%40%3Cgeneral.incubator.apache.org%3E) I would like to call a VOTE to accept Wayang (aka Rheem) into the Apache Incubator.
> > > > > >
> > > > > > Please cast your vote:
> > > > > >
> > > > > > [ ] +1, bring Wayang into the Incubator
> > > > > > [ ] +0, I don't care either way
> > > > > > [ ] -1, do not bring Wayang into the Incubator, because...
> > > > > >
> > > > > > The vote will be open for at least 72 hours and only votes from the Incubator PMC are binding, but votes from everyone are welcome.
> > > > > >
> > > > > > Chris
> > > > > >
> > > > > > -----
> > > > > >
> > > > > > Wayang Proposal (https://cwiki.apache.org/confluence/display/INCUBATOR/WayangProposal)
> > > > > >
> > > > > > == Abstract ==
> > > > > >
> > > > > > Wayang is a cross-platform data processing system that aims at decoupling the business logic of data analytics applications from concrete data processing platforms, such as Apache Flink or Apache Spark. Hence, it tames the complexity that arises from the "Cambrian explosion" of novel data processing platforms that we currently witness.
> > > > > >
> > > > > > Note that the Wayang project is the Rheem project; we have renamed the project because of trademark issues.
> > > > > >
> > > > > > You can find the project web page at: https://rheem-ecosystem.github.io/
> > > > > >
> > > > > > = Proposal =
> > > > > >
> > > > > > Wayang is a cross-platform system that provides an abstraction over data processing platforms to free users from the burdens of (i) performing tedious and costly data migration and integration tasks to run their applications, and (ii) choosing the right data processing platforms for their applications. To achieve this, Wayang: (1) provides an abstraction on top of existing data processing platforms that allows users to specify their data analytics tasks in the form of a DAG of operators; (2) comes with a cross-platform optimizer for automating the selection of suitable/efficient platforms; and (3) takes care of executing the optimized plan, including communication across platforms.
> > > > > > In summary, Wayang has the following salient features:
> > > > > >
> > > > > > - Flexible Data Model - It considers a flexible and simple data model based on data quanta. A data quantum is an atomic processing unit in the system that can represent a large spectrum of data formats, such as data points for a machine learning application, tuples for a database application, or RDF triples. Hence, Wayang is able to express a wide range of data analytics tasks.
> > > > > > - Platform independence - It provides a simple interface (currently Java and Scala) that is inspired by established programming models, such as those of Apache Spark and Apache Flink. Users represent their data analytic tasks as a DAG (Wayang plan), where vertices correspond to Wayang operators and edges represent data flows (data quanta flowing) among these operators. A Wayang operator defines a particular kind of data transformation over an input data quantum, ranging from basic functionality (e.g., transformations, filters, joins) to complex, extensible tasks (e.g., PageRank).
> > > > > > - Cross-platform execution - Besides running a data analytic task on any data processing platform, it also comes with an optimizer that can decide to execute a single data analytic task using multiple data processing platforms. This allows for exploiting the capabilities of different data processing platforms to perform complex data analytic tasks more efficiently.
> > > > > > - Self-tuning UDF-based cost model - Its optimizer uses a cost model fully based on UDFs.
> > > > > >   This not only enables Wayang to learn the cost functions of newly added data processing platforms, but also allows developers to tune the optimizer at will.
> > > > > > - Extensibility - It treats data processing platforms as plugins to allow users (developers) to easily incorporate new data processing platforms into the system. This is achieved by exposing the functionalities of data processing platforms as operators (execution operators). The same approach is followed at the Wayang interface, where users can also extend Wayang capabilities, i.e., the operators, easily.
> > > > > >
> > > > > > We plan to work on the stability of all these features as well as on extending Wayang with more advanced features. Furthermore, Wayang currently supports Apache Spark, Standalone Java, GraphChi, and relational databases (via JDBC). We plan to incorporate more data processing platforms, such as Apache Flink and Apache Hive.
> > > > > >
> > > > > > === Background ===
> > > > > >
> > > > > > Many organizations and companies collect or produce a large variety of data to apply data analytics over it. This is because insights from data allow them to rapidly make better decisions. Thus, the pursuit of efficient and scalable data analytics as well as the one-size-does-not-fit-all philosophy has given rise to a plethora of data processing platforms. Examples of these specialized processing platforms range from DBMSs to MapReduce-like platforms.
> > > > > >
> > > > > > However, today's data analytics are moving beyond the limits of a single data processing platform.
> > > > > > More and more applications need to perform complex data analytics over several data processing platforms. For example, (i) IBM reported that North York hospital needs to process 50 diverse datasets, which are on a dozen different internal systems, (ii) oil & gas companies stated they need to process large amounts of data they produce every day, e.g., a single oil company can produce more than 1.5TB of diverse (structured and unstructured) data per day, (iii) Fortune magazine stated that airlines need to analyze large datasets, which are produced by different departments, are of different data formats, and reside on multiple data sources, to produce global reports for decision makers, and (iv) Hewlett Packard has claimed that, according to its customer portfolio, business intelligence typically requires a single analytics pipeline using different processing platforms at different parts of the pipeline. These are just a few examples of emerging applications that require a diversity of data processing platforms.
> > > > > >
> > > > > > Today, developers have to deal with this myriad of data processing platforms. That is, they have to choose the right data processing platform for their applications (or data analytic tasks) and to familiarize themselves with the intricacies of the different platforms to achieve high efficiency and scalability. Several systems have also appeared with the goal of helping users to easily glue several platforms together, such as Apache Drill, PrestoDB, and Luigi.
> > > > > > Nevertheless, all these systems still require considerable expertise from users to decide which data processing platforms to use for the data analytic task at hand. As a consequence, great engineering effort is required to unify the data from various sources, to combine the processing capabilities of different platforms, and to maintain those applications, so as to unleash the full potential of the data. In the worst case, such applications are not built in the first place, as doing so seems too daunting an endeavor.
> > > > > >
> > > > > > === Rationale ===
> > > > > >
> > > > > > It is evident that there is an urgent need to release developers from the burden of knowing all the intricacies of choosing and gluing together data processing platforms for supporting their applications (data analytic tasks). Developers should focus only on the logic of their applications. Surprisingly, there is no open source system trying to satisfy this urgent need. Wayang aims at filling this gap. It copes with this urgent need by providing both a common interface over data processing platforms and an optimizer to execute data analytic tasks on the right data processing platform(s) seamlessly. As Apache is the place where most of the important big data systems are, we consider Apache the right place for Wayang.
> > > > > >
> > > > > > === Current Status ===
> > > > > >
> > > > > > The current version of Wayang (v0.5.0) was initially co-developed by staff, students, and interns at the Qatar Computing Research Institute (QCRI) and the Hasso-Plattner Institute (HPI).
> > > > > > The project was initiated at and sponsored by QCRI in 2015 with the goal of freeing data scientists and developers from the intricacies of data processing platforms to support their analytic tasks. The first open source release of Wayang was made only a year and a half later, on June 13th, 2016, under the Apache Software License 2.0. Since then we have made several releases; the latest release was done on January 23rd, 2019.
> > > > > >
> > > > > > ** Meritocracy **
> > > > > >
> > > > > > All current Wayang developers are familiar with the development process at Apache and are currently trying to follow this meritocracy process as much as possible. For example, Wayang already follows a committer principle where any pull request is reviewed by at least one Wayang core developer. This was one of the reasons for choosing Apache for Wayang, as we all want to encourage and keep this style of development for Wayang.
> > > > > >
> > > > > > ** Community **
> > > > > >
> > > > > > Wayang started as a pure research project, but it quickly started developing into a community. People from HPI joined our efforts almost from the very beginning to make this project a reality. Recently, the Berlin Institute of Technology (TU Berlin) and the Pontifical Catholic University of Valparaiso (PUCV) in Chile have also joined our efforts for developing Wayang. A company, called Scalytics, has been created around Wayang. Currently, we are intensively seeking to further develop both developer and user communities.
> > > > > > To keep broadening the community, we plan to also exploit our ongoing academic collaborations with multiple universities in Berlin and companies that we collaborate with. For instance, Wayang is already being utilized for accessing multiple data sources in the context of a large data analytics project led by TU Berlin and Huawei. We also believe that Wayang's extensible architecture (i.e., adding new operators and platforms) will further encourage community participation. During incubation we plan to have Wayang adopted by at least one company and will explicitly seek more industrial participation.
> > > > > >
> > > > > > ** Core Developers **
> > > > > >
> > > > > > The initial developers of the project are diverse: they are from four different institutions (TU Berlin, Scalytics, PUCV, and HBKU). We will work aggressively to grow the community during the incubation by recruiting more developers from other institutions.
> > > > > >
> > > > > > ** Alignment **
> > > > > >
> > > > > > We believe Apache is the most natural home for taking Wayang to the next level. Apache is currently hosting the most important big data systems. Hadoop, Spark, Flink, HBase, Hive, Tez, Reef, Storm, Drill, and Ignite are just some examples of these technologies. Wayang fills a significant gap that exists in the big data open source world: it provides a common abstraction for all these platforms and decides on which platforms to run a single data analytic task. Wayang is now being developed following the Apache-style development model.
> > > > > > Also, it is well-aligned with the Apache principle of building a community to impact the big data community.
> > > > > >
> > > > > > === Known Risks ===
> > > > > >
> > > > > > ** Orphaned Products **
> > > > > >
> > > > > > Currently, Wayang is the core technology behind Scalytics Inc. As a result, a team of two engineers is working full time on this project. Recently, three more developers have joined our efforts in building Wayang. Thus, the risk of Wayang becoming orphaned is relatively low. In addition, people outside Scalytics (from TU Berlin and HBKU) have also joined the project, which makes the risk of abandoning the project even lower. The PUCV in Chile is also beginning to contribute to the code base and to develop a declarative query language on top of Wayang. The project is constantly being monitored by email and frequent Skype meetings as well as by weekly meetings with Scalytics people. Additionally, at the end of each year, we meet to discuss the status of the project as well as to plan the most important aspects we should work on during the following year.
> > > > > >
> > > > > > ** Inexperience with Open Source **
> > > > > >
> > > > > > Wayang quickly started being developed in open source under the Apache Software License 2.0. The source code is available on GitHub. Also, a few of the initial committers have contributed to other open source projects: Hadoop and Flume.
> > > > > >
> > > > > > ** Homogeneous Developers **
> > > > > >
> > > > > > The initial committers are already geographically distributed among Chile, Germany, and Qatar.
> > > > > > During incubation, one of our main goals is to increase the heterogeneity of the current community and we will work hard to achieve it.
> > > > > >
> > > > > > ** Reliance on salaried developers **
> > > > > >
> > > > > > Wayang is already being developed through a mix of full-time and volunteer work. Only 2 of the initial committers are working full time on this project (Scalytics). So, we are confident that the project will not decrease its development pace. Furthermore, we are committed to recruiting additional committers to significantly increase the development pace of the project.
> > > > > >
> > > > > > ** Relationships with other Apache products **
> > > > > >
> > > > > > Wayang is somewhat related to Apache Spark, as its development interface is inspired by Spark. In contrast to Spark, Wayang is not a data processing platform, but a mediator between user applications and data processing platforms. In this sense, Wayang is similar to the Apache Drill project and Apache Beam. However, Wayang significantly differs from Apache Drill in two main aspects. First, Apache Drill provides only a common interface to query multiple data stores and hence users have to specify in their query the data to fetch. Then, Apache Drill translates the query to the processing platforms where the data is stored, e.g. into the MongoDB query representation. In contrast, in Wayang, users only specify the data path and Wayang decides which are the best (performance-wise) data processing platforms to use to process such data. Second, the query interface in Apache Drill is SQL.
> > > > > > Wayang uses an interface based on operators forming DAGs. On this latter point, we are currently developing a Pig Latin-like query language for Wayang. In addition, in contrast to Apache Beam, Wayang not only allows users to use multiple data processing platforms at the same time, but also provides an optimizer to choose the most efficient platform for the task at hand. In Apache Beam, users have to specify an appropriate runner (platform).
> > > > > >
> > > > > > Given these similarities with the two Apache projects mentioned above, we are looking forward to collaborating with those communities. We are also open to, and would love, collaborating with other Apache communities.
> > > > > >
> > > > > > ** An excessive fascination with the Apache Brand **
> > > > > >
> > > > > > Wayang solves a real problem that users and developers currently have to deal with at a high cost: monetary cost, high design and development effort, and a large investment of time. Therefore, we believe that Wayang can be successful in building a large community around it. We are convinced that the Apache brand and community process will significantly help us in building such a community and in establishing the project in the long term. We simply believe that ASF is the right home for Wayang to achieve this.
> > > > > >
> > > > > > === Documentation ===
> > > > > >
> > > > > > Further details, documentation, and publications related to Wayang can be found at https://docs.rheem.io/rheem/
> > > > > >
> > > > > > === Initial Source ===
> > > > > >
> > > > > > The current source code of Wayang resides in GitHub: https://github.com/rheem-ecosystem/rheem
> > > > > >
> > > > > > === External Dependencies ===
> > > > > >
> > > > > > Wayang depends on the following Apache projects:
> > > > > >
> > > > > > * Maven
> > > > > > * HDFS
> > > > > > * Hadoop
> > > > > > * Spark
> > > > > >
> > > > > > Wayang depends on the following other open source projects organized by license:
> > > > > >
> > > > > > org.json.json: Json (http://json.org/license.html)
> > > > > > SnakeYAML: Apache 2.0
> > > > > > Java Unified Expression Language API (Juel): Apache 2.0
> > > > > > ProfileDB Instrumentation: Apache 2.0
> > > > > > Gson: Apache 2.0
> > > > > > Hadoop: Apache 2.0
> > > > > > Scala: Apache 2.0
> > > > > > Antlr 4: BSD
> > > > > > Jackson: Apache 2.0
> > > > > > JUnit 5: EPL 2.0
> > > > > > Mockito: MIT
> > > > > > AssertJ: Apache 2.0
> > > > > > logback-classic: EPL 1.0 / LGPL 2.1
> > > > > > slf4j: MIT
> > > > > > GNU Trove: LGPL 2.1
> > > > > > graphchi: Apache 2.0
> > > > > > SQLite JDBC: Apache 2.0
> > > > > > PostgreSQL: BSD 2-clause
> > > > > > jcommander: Apache 2.0
> > > > > > Koloboke Collections API: Apache 2.0
> > > > > > Snappy Java: Apache 2.0
> > > > > > Apache Spark: Apache 2.0
> > > > > > HyperSQL Database: BSD Modified (http://hsqldb.org/web/hsqlLicense.html)
> > > > > > Apache Giraph: Apache 2.0
> > > > > > Apache Flink: Apache 2.0
> > > > > > Apache Commons IO: Apache 2.0
> > > > > > Apache Commons Lang: Apache 2.0
> > > > > > Apache Maven: Apache 2.0
> > > > > >
> > > > > > === Cryptography ===
> > > > > >
> > > > > > (not applicable)
> > > > > >
> > > > > > === Required Resources ===
> > > > > >
> > > > > > ** Mailing Lists **
> > > > > >
> > > > > > * mailto:priv...@wayang.incubator.apache.org
> > > > > > * mailto:d...@wayang.incubator.apache.org
> > > > > > * mailto:comm...@wayang.incubator.apache.org
> > > > > >
> > > > > > ** Git repositories **
> > > > > >
> > > > > > git://git.apache.org/repos/asf/incubator/wayang
> > > > > >
> > > > > > ** Issue tracking **
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/RHEEM
> > > > > >
> > > > > > === Initial Committers ===
> > > > > >
> > > > > > The following list gives the planned initial committers (in alphabetical order):
> > > > > >
> > > > > > * Bertty Contreras-Rojas <bertty@scalytics.io>
> > > > > > * Rodrigo Pardo-Meza <rodrigo@scalytics.io>
> > > > > > * Alexander Alten-Lorenz <alo@scalytics.io>
> > > > > > * Zoi Kaoudi <zoi.kaoudi@tu-berlin.de>
> > > > > > * Haralampos Gavriilidis <gavriilidis@tu-berlin.de>
> > > > > > * Jorge-Arnulfo Quiane-Ruiz <jorge.quiane@tu-berlin.de>
> > > > > > * Anis Troudi <atroudi@hbku.edu.qa>
> > > > > > * Wenceslao Palma-Muñoz <wenceslao.palma@pucv.cl>
> > > > > >
> > > > > > ** Affiliations **
> > > > > >
> > > > > > * Scalytics Inc.
> > > > > > ** Bertty Contreras-Rojas
> > > > > > ** Rodrigo Pardo-Meza
> > > > > > ** Alexander Alten-Lorenz
> > > > > > * Berlin Institute of Technology (TU Berlin)
> > > > > > ** Zoi Kaoudi
> > > > > > ** Haralampos Gavriilidis
> > > > > > ** Jorge-Arnulfo Quiane-Ruiz
> > > > > > * Hamad Bin Khalifa University (HBKU)
> > > > > > ** Anis Troudi
> > > > > > * Pontifical Catholic University of Valparaiso, Chile (PUCV)
> > > > > > ** Wenceslao Palma-Muñoz
> > > > > >
> > > > > > === Sponsors ===
> > > > > >
> > > > > > ** Champion **
> > > > > >
> > > > > > * Christofer Dutz (christofer.dutz at c-ware dot de)
> > > > > >
> > > > > > ** Mentors **
> > > > > >
> > > > > > . (cdutz) Christofer Dutz
> > > > > > . (larsgeorge) Lars George
> > > > > > .
(berndf) Fondermann
> > > > > > . (jbonofre) Jean-Baptiste Onofré
> > > > > >
> > > > > > ** Sponsoring Entity **
> > > > > >
> > > > > > The Apache Incubator
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > > > > For additional commands, e-mail: general-h...@incubator.apache.org
> > > > >
> > > > > --
> > > > > Dan Widdis
> > >
> > > --
> > > Byung-Gon Chun