Thanks, much appreciated!

Regards
JB

On 01/31/2018 09:50 AM, Byung-Gon Chun wrote:
> On Wed, Jan 31, 2018 at 4:04 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi,
>>
>> Coral is a good name!
>>
>
> Thanks!
>
>>
>> Does the code belong to Seoul National University? In that case, in
>> addition to your ICLA, we would need an SGA (it's not a blocker for the
>> project bootstrapping or code donation, but we will at least need it
>> later for graduation). On the other hand, if the committers are all part
>> of the university, you can also sign a CCLA.
>>
>
> I will figure this out.
>
>>
>> Happy to be a mentor on the project if you want me! ;)
>>
>
> Thanks! I will add you to the mentor list.
>
> -Gon
>
>> Thanks,
>> Regards
>> JB
>>
>> On 01/30/2018 10:17 AM, Byung-Gon Chun wrote:
>>> Thanks for the comments, JB!
>>> My replies are inlined below.
>>>
>>> On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> sorry to be a little bit late on this.
>>>>
>>>> It's a very interesting proposal. It sounds pretty close to the
>>>> portability layer we want to add in Apache Beam. I would love to see
>>>> interaction between the two communities.
>>>>
>>>> I have two minor questions:
>>>>
>>>> 1. about the name: Onyx sounds very generic and the name is used in
>>>> other technologies. Maybe another unique name would be more accurate.
>>>>
>>>
>>> We proposed Coral instead. How does this sound?
>>>
>>>> 2. the Onyx code is on GitHub right now, under the Apache 2.0 license.
>>>> Does this code have any affiliation with companies? Meaning that we
>>>> would need an SGA for the code donation.
>>>>
>>>
>>> It does not. The developers are affiliated with Seoul National University.
>>> In this case, do we still need an SGA?
>>>
>>>> If you need any help with the incubation, I would be more than happy to
>>>> help!
>>>>
>>>
>>> Thanks for the offer.
>>> Would you be interested in being a mentor of the project?
>>>
>>> Thanks.
>>> -Gon
>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
>>>>> Dear Apache Incubator Community,
>>>>>
>>>>> Please accept the following proposal for presentation and discussion:
>>>>> https://wiki.apache.org/incubator/OnyxProposal
>>>>>
>>>>> Onyx is a data processing system that aims to flexibly control the
>>>>> runtime behaviors of a job to adapt to varying deployment
>>>>> characteristics (e.g., harnessing transient resources in datacenters,
>>>>> cross-datacenter deployment, changing runtime based on job
>>>>> characteristics, etc.). Onyx provides ways to extend the system’s
>>>>> capabilities and incorporate the extensions into flexible job
>>>>> execution.
>>>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into
>>>>> an Intermediate Representation (IR) DAG, which Onyx optimizes and
>>>>> deploys based on a deployment policy.
>>>>>
>>>>> I've attached the proposal below.
>>>>>
>>>>> Best regards,
>>>>> Byung-Gon Chun
>>>>>
>>>>> = OnyxProposal =
>>>>>
>>>>> == Abstract ==
>>>>> Onyx is a data processing system for flexible employment with
>>>>> different execution scenarios for various deployment characteristics
>>>>> on clusters.
>>>>>
>>>>> == Proposal ==
>>>>> Today, there is a wide variety of data processing systems with
>>>>> different designs for better performance and datacenter efficiency.
>>>>> They include processing data on specific resource environments and
>>>>> running jobs with specific attributes. Although each system
>>>>> successfully solves the problems it targets, most systems are designed
>>>>> in a way that runtime behaviors are built tightly inside the system
>>>>> core to hide the complexity of distributed computing.
>>>>> This makes it hard for a single system to support different deployment
>>>>> characteristics with different runtime behaviors without substantial
>>>>> effort.
>>>>>
>>>>> Onyx is a data processing system that aims to flexibly control the
>>>>> runtime behaviors of a job to adapt to varying deployment
>>>>> characteristics. Moreover, it provides a means of extending the
>>>>> system’s capabilities and incorporating the extensions into flexible
>>>>> job execution.
>>>>>
>>>>> In order to be able to easily modify runtime behaviors to adapt to
>>>>> varying deployment characteristics, Onyx exposes runtime behaviors to
>>>>> be flexibly configured and modified at both compile-time and runtime
>>>>> through a set of high-level graph pass interfaces.
>>>>>
>>>>> We hope to contribute to the big data processing community by enabling
>>>>> more flexibility and extensibility in job executions. Furthermore, we
>>>>> can benefit more as a community when we work together to mature the
>>>>> system with more use cases and understanding of diverse deployment
>>>>> characteristics. The Apache Software Foundation is the perfect place
>>>>> to achieve these aspirations.
>>>>>
>>>>> == Background ==
>>>>> Many data processing systems have distinctive runtime behaviors
>>>>> optimized and configured for specific deployment characteristics like
>>>>> different resource environments and for handling special job
>>>>> attributes.
>>>>>
>>>>> For example, much research has been conducted to overcome the
>>>>> challenge of running data processing jobs on cheap, unreliable
>>>>> transient resources. Likewise, techniques for disaggregating different
>>>>> types of resources, like memory, CPU and GPU, are being actively
>>>>> developed to use datacenter resources more efficiently. Many
>>>>> researchers are also working to run data processing jobs in even more
>>>>> diverse environments, such as across distant datacenters.
>>>>> Similarly, for special job attributes, many works take different
>>>>> approaches, such as runtime optimization, to solve problems like data
>>>>> skew, and to optimize systems for data processing jobs with
>>>>> small-scale input data.
>>>>>
>>>>> Although each of the systems performs well with the jobs and in the
>>>>> environments they target, they perform poorly with unconsidered cases,
>>>>> and do not consider supporting multiple deployment characteristics on
>>>>> a single system in their designs.
>>>>>
>>>>> For an application writer to optimize an application to perform well
>>>>> on a certain system engraved with its underlying behaviors, it
>>>>> requires a deep understanding of the system itself, which is an
>>>>> overhead that often requires a lot of time and effort. Moreover, for a
>>>>> developer to modify such system behaviors, it requires modifications
>>>>> of the system core, which requires an even deeper understanding of the
>>>>> system itself.
>>>>>
>>>>> With this background, Onyx is designed to represent all of its jobs as
>>>>> an Intermediate Representation (IR) DAG. In the Onyx compiler, user
>>>>> applications from various programming models (e.g., Apache Beam) are
>>>>> submitted, transformed to an IR DAG, and optimized/customized for the
>>>>> deployment characteristics. In the IR DAG optimization phase, the DAG
>>>>> is modified through a series of compiler “passes” which reshape or
>>>>> annotate the DAG with an expression of the underlying runtime
>>>>> behaviors. The IR DAG is then submitted as an execution plan for the
>>>>> Onyx runtime. The runtime includes the unmodified parts of data
>>>>> processing in the backbone which is transparently integrated with
>>>>> configurable components exposed for further extension.
>>>>>
>>>>> == Rationale ==
>>>>> Onyx’s vision lies in providing means for flexibly supporting a wide
>>>>> variety of job execution scenarios for users while enabling system
>>>>> developers to extend the execution framework with various
>>>>> functionalities at the same time. The capabilities of the system can
>>>>> be extended as it grows to meet a wider variety of execution scenarios.
>>>>> We require inputs from users and developers from diverse domains in
>>>>> order to make it a more thriving and useful project. The Apache
>>>>> Software Foundation provides the best tools and community to support
>>>>> this vision.
>>>>>
>>>>> == Initial Goals ==
>>>>> Initial goals will be to move the existing codebase to Apache and
>>>>> integrate with the Apache development process. We further plan to
>>>>> develop our system to meet the needs of more execution scenarios for
>>>>> a wider variety of deployment characteristics.
>>>>>
>>>>> == Current Status ==
>>>>> The Onyx codebase is currently hosted in a repository at github.com.
>>>>> The current version has been developed by system developers at Seoul
>>>>> National University, Viva Republica, Samsung, and LG.
>>>>>
>>>>> == Meritocracy ==
>>>>> We plan to strongly support meritocracy. We will discuss the
>>>>> requirements in an open forum, and those who continuously contribute
>>>>> to Onyx with the passion to strengthen the system will be invited as
>>>>> committers. Contributors who enrich Onyx by providing various use
>>>>> cases, various implementations of the configurable components
>>>>> including ideas for optimization techniques will be especially
>>>>> welcome. Committers with a deep understanding of the system’s
>>>>> technical aspects as a whole and its philosophy will definitely be
>>>>> voted in as PMC members. We will monitor community participation so
>>>>> that privileges can be extended to those who contribute.
>>>>>
>>>>> == Community ==
>>>>> We hope to expand our contribution community by becoming an Apache
>>>>> incubator project. The contributions will come from both users and
>>>>> system developers interested in the flexibility and extensibility of
>>>>> job executions that Onyx can support. We expect users to contribute
>>>>> mainly by diversifying the use cases and deployment characteristics,
>>>>> and developers to contribute by implementing them.
>>>>>
>>>>> == Alignment ==
>>>>> Apache Spark is one of many popular data processing frameworks. The
>>>>> system is designed towards optimizing jobs using RDDs in memory and
>>>>> many other optimizations built tightly within the framework. In
>>>>> contrast to Spark, Onyx aims to provide more flexibility for job
>>>>> execution in an easy manner.
>>>>>
>>>>> Apache Tez enables developers to build complex task DAGs with control
>>>>> over the control plane of job execution. In Onyx, a high-level
>>>>> programming layer (e.g., Apache Beam) is automatically converted to a
>>>>> basic IR DAG and can be converted to any IR DAG through a series of
>>>>> easy, user-writable passes that can both reshape and modify the
>>>>> annotation (of execution properties) of the DAG. Moreover, Onyx leaves
>>>>> more parts of the job execution configurable, such as the scheduler
>>>>> and the data plane. As opposed to providing a set of properties for
>>>>> solid optimization, Onyx’s configurable parts can be easily extended
>>>>> and explored by implementing the pre-defined interfaces. For example,
>>>>> an arbitrary intermediate data store can be added.
>>>>>
>>>>> Onyx currently supports Apache Beam programs and we are working on
>>>>> supporting Apache Spark programs as well. Onyx also utilizes Apache
>>>>> REEF for container management, which allows Onyx to run in Apache YARN
>>>>> and Apache Mesos clusters. If necessary, we plan to contribute to and
>>>>> collaborate with these other Apache projects for the benefit of all.
>>>>> We plan to extend such integrations with more Apache software. The
>>>>> Apache Software Foundation already hosts many major big-data systems,
>>>>> and we expect to help further growth of the big-data community by
>>>>> having Onyx within the Apache foundation.
>>>>>
>>>>> == Known Risks ==
>>>>> === Orphaned Products ===
>>>>> The risk of the Onyx project being orphaned is minimal. There is
>>>>> already plenty of work that arduously supports different deployment
>>>>> characteristics, and we propose a general way to implement them with
>>>>> flexible and extensible configuration knobs. The domain of data
>>>>> processing is already of high interest, and this domain is expected to
>>>>> evolve continuously with various other purposes, such as resource
>>>>> disaggregation and using transient resources for better datacenter
>>>>> resource utilization.
>>>>>
>>>>> === Inexperience with Open Source ===
>>>>> The initial committers include PMC members and committers of other
>>>>> Apache projects. They have experience with open source projects,
>>>>> from incubation through to top-level status. They have been
>>>>> involved in the open source development process, and are familiar with
>>>>> releasing code under an open source license.
>>>>>
>>>>> === Homogeneous Developers ===
>>>>> The initial set of committers is from a limited set of organizations,
>>>>> but we expect to attract new contributors from diverse organizations
>>>>> and will thus grow organically once approved for incubation. Our prior
>>>>> experience with other open source projects will help various
>>>>> contributors to actively participate in our project.
>>>>>
>>>>> === Reliance on Salaried Developers ===
>>>>> Many developers are from Seoul National University. This is not
>>>>> applicable.
>>>>>
>>>>> === Relationships with Other Apache Products ===
>>>>> Onyx positions itself among multiple Apache products. It runs on
>>>>> Apache REEF for container management.
>>>>> It also utilizes many useful development tools including Apache Maven,
>>>>> Apache Log4J, and multiple Apache Commons components. Onyx supports
>>>>> the Apache Beam programming model for user applications. We are
>>>>> currently working on supporting the Apache Spark programming APIs as
>>>>> well.
>>>>>
>>>>> === An Excessive Fascination with the Apache Brand ===
>>>>> We hope to make Onyx a powerful system for data processing, meeting
>>>>> various needs for different deployment characteristics, under a wider
>>>>> variety of environments. We see the limitations of simply putting code
>>>>> on GitHub, and we believe the Apache community will help the growth of
>>>>> Onyx for the project to become positively impactful and innovative
>>>>> open source software. We believe Onyx is a great fit for the Apache
>>>>> Software Foundation due to the collaboration it aims to achieve from
>>>>> the big data processing community.
>>>>>
>>>>> == Documentation ==
>>>>> The current documentation for Onyx is at https://snuspl.github.io/onyx/.
>>>>>
>>>>> == Initial Source ==
>>>>> The Onyx codebase is currently hosted at https://github.com/snuspl/onyx.
>>>>>
>>>>> == External Dependencies ==
>>>>> To the best of our knowledge, all Onyx dependencies are distributed
>>>>> under Apache compatible licenses. Upon acceptance to the incubator, we
>>>>> would begin a thorough analysis of all transitive dependencies to
>>>>> verify this fact and further introduce license checking into the build
>>>>> and release process.
>>>>>
>>>>> == Cryptography ==
>>>>> Not applicable.
>>>>>
>>>>> == Required Resources ==
>>>>> === Mailing Lists ===
>>>>> We will operate two mailing lists as follows:
>>>>> * Onyx PMC discussions: priv...@onyx.incubator.apache.org
>>>>> * Onyx developers: d...@onyx.incubator.apache.org
>>>>>
>>>>> === Git Repositories ===
>>>>> Upon incubation: https://github.com/apache/incubator-onyx.
>>>>> After the incubation, we would like to move the existing repo
>>>>> https://github.com/snuspl/onyx to the Apache infrastructure.
>>>>>
>>>>> === Issue Tracking ===
>>>>> Onyx currently tracks its issues using the GitHub issue tracker:
>>>>> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
>>>>> JIRA.
>>>>>
>>>>> == Initial Committers ==
>>>>> * Byung-Gon Chun
>>>>> * Jeongyoon Eo
>>>>> * Geon-Woo Kim
>>>>> * Joo Yeon Kim
>>>>> * Gyewon Lee
>>>>> * Jung-Gil Lee
>>>>> * Sanha Lee
>>>>> * Wooyeon Lee
>>>>> * Yunseong Lee
>>>>> * JangHo Seo
>>>>> * Won Wook Song
>>>>> * Taegeon Um
>>>>> * Youngseok Yang
>>>>>
>>>>> == Affiliations ==
>>>>> * SNU (Seoul National University)
>>>>>   * Byung-Gon Chun
>>>>>   * Jeongyoon Eo
>>>>>   * Geon-Woo Kim
>>>>>   * Gyewon Lee
>>>>>   * Sanha Lee
>>>>>   * Wooyeon Lee
>>>>>   * Yunseong Lee
>>>>>   * JangHo Seo
>>>>>   * Won Wook Song
>>>>>   * Taegeon Um
>>>>>   * Youngseok Yang
>>>>>
>>>>> * LG
>>>>>   * Jung-Gil Lee
>>>>>
>>>>> * Samsung
>>>>>   * Joo Yeon Kim
>>>>>
>>>>> * Viva Republica
>>>>>   * Geon-Woo Kim
>>>>>
>>>>> == Sponsors ==
>>>>> === Champions ===
>>>>> Byung-Gon Chun
>>>>>
>>>>> === Mentors ===
>>>>> * Hyunsik Choi
>>>>> * Byung-Gon Chun
>>>>> * Markus Weimer
>>>>> * Reynold Xin
>>>>>
>>>>> === Sponsoring Entity ===
>>>>> The Apache Incubator
>>>>>
>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> jbono...@apache.org
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>>> For additional commands, e-mail: general-h...@incubator.apache.org
>>>>
>>>
>>
>

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com