Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Thanks for the question, Liang.

The performance of Coral (previously named Onyx) depends on the deployment scenario. We haven't done extensive experiments with Coral+Spark yet. However, in several deployment scenarios we've been looking at, Coral+Beam outperforms Spark used directly, even though the Coral runtime still lacks a few optimizations that the Spark runtime has.

If you're interested in more details, please sign up for coral@dev once Coral is in Apache incubation.

Thanks.
-Gon

On Fri, Feb 2, 2018 at 12:36 AM, Liang Chen wrote:
> Hi,
>
> Looks like a nice project; I'm very interested in checking out more details.
>
> How about the performance? Compared with using Spark directly, does Onyx+Spark reduce performance or not?
>
> Regards,
> Liang
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Hi,

Looks like a nice project; I'm very interested in checking out more details.

How about the performance? Compared with using Spark directly, does Onyx+Spark reduce performance or not?

Regards,
Liang

Byung-Gon Chun wrote
> Dear Apache Incubator Community,
>
> Please accept the following proposal for presentation and discussion:
> https://wiki.apache.org/incubator/OnyxProposal
>
> Onyx is a data processing system that aims to flexibly control the runtime behaviors of a job to adapt to varying deployment characteristics (e.g., harnessing transient resources in datacenters, cross-datacenter deployment, changing runtime based on job characteristics, etc.). Onyx provides ways to extend the system’s capabilities and incorporate the extensions to the flexible job execution.
> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an Intermediate Representation (IR) DAG, which Onyx optimizes and deploys based on a deployment policy.
>
> I've attached the proposal below.
>
> Best regards,
> Byung-Gon Chun
>
> = OnyxProposal =
>
> == Abstract ==
> Onyx is a data processing system for flexible employment with different execution scenarios for various deployment characteristics on clusters.
>
> == Proposal ==
> Today, there is a wide variety of data processing systems with different designs for better performance and datacenter efficiency. They include systems specialized for processing data on specific resource environments and for running jobs with specific attributes. Although each system successfully solves the problems it targets, most systems are designed so that runtime behaviors are built tightly into the system core to hide the complexity of distributed computing. This makes it hard for a single system to support different deployment characteristics with different runtime behaviors without substantial effort.
>
> Onyx is a data processing system that aims to flexibly control the runtime behaviors of a job to adapt to varying deployment characteristics. Moreover, it provides a means of extending the system’s capabilities and incorporating the extensions to the flexible job execution.
>
> To make it easy to modify runtime behaviors to adapt to varying deployment characteristics, Onyx exposes runtime behaviors so that they can be flexibly configured and modified at both compile-time and runtime through a set of high-level graph pass interfaces.
>
> We hope to contribute to the big data processing community by enabling more flexibility and extensibility in job executions. Furthermore, we can all benefit more when we work together as a community to mature the system with more use cases and a better understanding of diverse deployment characteristics. The Apache Software Foundation is the perfect place to achieve these aspirations.
>
> == Background ==
> Many data processing systems have distinctive runtime behaviors optimized and configured for specific deployment characteristics, like different resource environments, and for handling special job attributes.
>
> For example, much research has been conducted to overcome the challenge of running data processing jobs on cheap, unreliable transient resources. Likewise, techniques for disaggregating different types of resources, like memory, CPU and GPU, are being actively developed to use datacenter resources more efficiently. Many researchers are also working to run data processing jobs in even more diverse environments, such as across distant datacenters. Similarly, for special job attributes, many works take different approaches, such as runtime optimization, to solve problems like data skew, and to optimize systems for data processing jobs with small-scale input data.
>
> Although each of these systems performs well with the jobs and in the environments it targets, it performs poorly in cases it was not designed for, and its design does not consider supporting multiple deployment characteristics on a single system.
>
> For an application writer, optimizing an application to perform well on a system whose runtime behaviors are engraved in its core requires a deep understanding of the system itself, an overhead that often takes a lot of time and effort. Moreover, for a developer, modifying such system behaviors requires modifying the system core, which demands an even deeper understanding of the system itself.
>
> With this background, Onyx is designed to represent all of its jobs as an Intermediate Representation (IR) DAG. In the Onyx compiler, user applications from various programming models (e.g., Apache Beam) are submitted, transformed to an IR DAG, and optimized/customized for the deployment characteristics. In the IR DAG optimization phase, the DAG is modified through a series of compiler “passes” which reshape or annotate the DAG with an expression of the underlying runtime behaviors. The IR DAG is then s
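To make the pass idea in the proposal above a little more concrete, here is a minimal Python sketch of an annotating compiler pass. Everything in it is a hypothetical illustration, not Onyx/Coral's actual API: the `IRDag` and `Vertex` classes, the `placement` property, and the rule that vertices merging several inputs go on reserved resources are all assumptions made for the example. It only shows the general shape: a pass walks the IR DAG and attaches execution properties to vertices, and passes are applied in sequence.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Vertex:
    """Hypothetical IR vertex: an operator name plus execution-property annotations."""
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class IRDag:
    """Hypothetical IR DAG: vertices keyed by name and directed (src, dst) edges."""
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_vertex(self, name: str) -> None:
        self.vertices[name] = Vertex(name)

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))

    def parents(self, name: str) -> List[str]:
        return [src for src, dst in self.edges if dst == name]


# In this sketch a compiler pass is simply a function from DAG to DAG.
CompilerPass = Callable[[IRDag], IRDag]


def placement_pass(dag: IRDag) -> IRDag:
    """Annotating pass with a made-up rule: vertices that merge several inputs
    (e.g., a join) run on reserved resources; all others on transient ones."""
    for name, vertex in dag.vertices.items():
        merges_inputs = len(dag.parents(name)) > 1
        vertex.properties["placement"] = "reserved" if merges_inputs else "transient"
    return dag


def optimize(dag: IRDag, passes: List[CompilerPass]) -> IRDag:
    """Apply the passes in order, each one seeing the previous pass's output."""
    for compiler_pass in passes:
        dag = compiler_pass(dag)
    return dag


# Example: two sources feeding a join, followed by a sink.
dag = IRDag()
for vertex_name in ["readA", "readB", "join", "write"]:
    dag.add_vertex(vertex_name)
dag.add_edge("readA", "join")
dag.add_edge("readB", "join")
dag.add_edge("join", "write")

optimize(dag, [placement_pass])
print({n: v.properties["placement"] for n, v in dag.vertices.items()})
# {'readA': 'transient', 'readB': 'transient', 'join': 'reserved', 'write': 'transient'}
```

In this sketch, a reshaping pass (for example, one that inserts or fuses vertices) would have the same signature but return a structurally different DAG; keeping passes as plain functions makes it easy to compose a different sequence of passes for each deployment policy.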
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Thank you all for the feedback. We changed the project name to Coral. You can find the proposal at
https://wiki.apache.org/incubator/CoralProposal

I will soon send out a voting email.

Thanks.
-Gon

On Wed, Jan 31, 2018 at 11:54 PM, Jean-Baptiste Onofré wrote:
> Thanks, much appreciated!
>
> Regards
> JB
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Thanks for the information, John!

On Wed, Jan 31, 2018 at 9:50 PM, John D. Ament wrote:
> Sorry for mid-posting.
>
> This isn't the list to determine if a project name is suitable. There's a JIRA project dedicated to that, and if you need a quick answer, it's better to email trademarks@ to get a more precise answer.
>
> The question is really going to be: is "Apache Onyx" going to be easily confused with something else?
>
> John
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Thanks, much appreciated!

Regards
JB

On 01/31/2018 09:50 AM, Byung-Gon Chun wrote:
> On Wed, Jan 31, 2018 at 4:04 PM, Jean-Baptiste Onofré wrote:
>
>> Hi,
>>
>> Coral is a good name!
>
> Thanks!
>
>> Does the code belong to Seoul National University? In that case, in addition to your ICLA, we would need an SGA (it's not a blocker for project bootstrapping or code donation, but we will at least need it later for graduation). On the other hand, if the committers are all part of the university, you can also sign a CCLA.
>
> I will figure this out.
>
>> Happy to be a mentor on the project if you want me! ;)
>
> Thanks! I will add you to the mentor list.
>
> -Gon
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Sorry for mid-posting.

This isn't the list to determine if a project name is suitable. There's a JIRA project dedicated to that, and if you need a quick answer, it's better to email trademarks@ to get a more precise answer.

The question is really going to be: is "Apache Onyx" going to be easily confused with something else?

John

On Sun, Jan 28, 2018 at 4:50 AM Byung-Gon Chun wrote:
> Thank you for all the information! It looks like Surf doesn't work.
>
> If possible, we'd like to keep Onyx.
> Another name we came up with is Coral.
>
> Thanks!
> -Gon
>
> On Sun, Jan 28, 2018 at 4:21 AM, Leif Hedstrom wrote:
>
>> Did we rule out Onyx for sure? Just because some other project might use it on, say, GitHub doesn't necessarily exclude us from having an Apache Onyx?
>>
>> FWIW, I agree that Surf is too similar in pronunciation to Apache Serf. :)
>>
>> Cheers,
>>
>> — Leif
>>
>>> On Jan 27, 2018, at 07:31, Dave Fisher wrote:
>>>
>>> Checking “Serf Software”, which sounds the same:
>>>
>>> (1) there is already Apache Serf
>>> (2) Serf is a product from HashiCorp at https://www.serf.io/. This would definitely confuse, as it is apparently comparable to ZooKeeper.
>>>
>>> Regards,
>>> Dave
>>>
>>>> On Jan 27, 2018, at 3:12 AM, sebb wrote:
>>>>
>>>> A brief search for 'Surf Software' shows quite a few hits.
>>>> I have not looked to see if they would be likely to be confused with this project or cause problems for others.
>>>>
>>>> But it looks as though there might be a problem:
>>>> Surfer - Golden Software
>>>> surf @ sourceforge
>>>> Surf Software company
>>>>
>>>>> On 27 January 2018 at 08:03, Byung-Gon Chun wrote:
>>>>> Since we cannot use the name Onyx, we would like to change the project name to Surf.
>>>>> I hope that this name works.
>>>>>
>>>>> -Gon
>>>>>
>>>>> ---
>>>>> Byung-Gon Chun
>>>>>
>>>>>> On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun wrote:
>>>>>>
>>>>>>> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci wrote:
>>>>>>>
>>>>>>> Great work -- I think this technology has a lot of promise, and I'd love to see its evolution inside the Foundation.
>>>>>>
>>>>>> Thanks, Davor!
>>>>>>
>>>>>>> Parts of it, like the Onyx Intermediate Representation [1], overlap with the work-in-progress inside the Apache Beam project ("portability"). We'd love to work together on this -- would you be open to such collaboration? If so, it may not be necessary to start from scratch; we could leverage the work already done.
>>>>>>
>>>>>> Sure. We're open to collaboration.
>>>>>>
>>>>>>> Regarding the name, Onyx would likely have to be renamed, due to a conflict with a related technology [2].
>>>>>>
>>>>>> Thanks for pointing it out. It's difficult to come up with a good short name. :)
>>>>>> Do you have any suggestions?
>>>>>>
>>>>>> Thanks!
>>>>>> -Gon
>>>>>>
>>>>>> ---
>>>>>> Byung-Gon Chun
>>>>>>
>>>>>>> Davor
>>>>>>>
>>>>>>> [1] https://snuspl.github.io/onyx/docs/ir/
>>>>>>> [2] http://www.onyxplatform.org/
>>>>>>>
>>>>>>>> On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun wrote:
>>>>>>>>
>>>>>>>> Dear Apache Incubator Community,
>>>>>>>>
>>>>>>>> Please accept the following proposal for presentation and discussion:
>>>>>>>> https://wiki.apache.org/incubator/OnyxProposal
>>>>>>>>
>>>>>>>> Onyx is a data processing system that aims to flexibly control the runtime behaviors of a job to adapt to varying deployment characteristics (e.g., harnessing transient resources in datacenters, cross-datacenter deployment, changing runtime based on job characteristics, etc.). Onyx provides ways to extend the system’s capabilities and incorporate the extensions to the flexible job execution.
>>>>>>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an Intermediate Representation (IR) DAG, which Onyx optimizes and deploys based on a deployment policy.
>>>>>>>>
>>>>>>>> I've attached the proposal below.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Byung-Gon Chun
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
On Wed, Jan 31, 2018 at 4:04 PM, Jean-Baptiste Onofré wrote:

> Hi,
>
> Coral is a good name!

Thanks!

> Does the code belong to Seoul National University? In that case, in addition to your ICLA, we would need an SGA (it's not a blocker for project bootstrapping or code donation, but we will at least need it later for graduation). On the other hand, if the committers are all part of the university, you can also sign a CCLA.

I will figure this out.

> Happy to be a mentor on the project if you want me! ;)

Thanks! I will add you to the mentor list.

-Gon
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Hi, Coral is a good name! Does the code belong to Seoul National University? In that case, in addition to your ICLA, we would need an SGA (it's not a blocker for the project bootstrapping or code donation, but we will at least need it later for graduation). On the other hand, if the committers are all part of the university, you can also sign a CCLA. Happy to be a mentor on the project if you want me! ;) Thanks, Regards JB On 01/30/2018 10:17 AM, Byung-Gon Chun wrote: > Thanks for the comments, JB! > My replies are inlined below.
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
If Coral is fine as our project name, I will start the vote in a couple of days. Let me know if you have any concerns. Thanks. -Gon On Tue, Jan 30, 2018 at 6:17 PM, Byung-Gon Chun wrote: > Thanks for the comments, JB! > My replies are inlined below.
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Thanks for the comments, JB! My replies are inlined below. On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré wrote: > Hi, > > sorry to be a little bit late on this. > > It's a very interesting proposal. It sounds pretty close to the portability layer we want to add in Apache Beam. I would love to see interaction between the two communities. > > I have two minor questions: > > 1. About the name: Onyx sounds very generic and the name is used in other technologies. Maybe another unique name would be more accurate. > We proposed Coral instead. How does this sound? > 2. The Onyx code is on GitHub right now, under the Apache 2.0 license. Does this code have any affiliation with companies? If so, we would need an SGA for the code donation. > > It does not. The developers are affiliated with Seoul National University. In this case, do we still need an SGA? > If you need any help with the incubation, I would be more than happy to help! > > Thanks for the offer. Would you be interested in being a mentor of the project? Thanks. -Gon > Regards > JB
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Hi, sorry to be a little bit late on this. It's a very interesting proposal. It sounds pretty close to the portability layer we want to add in Apache Beam. I would love to see interaction between the two communities. I have two minor questions: 1. About the name: Onyx sounds very generic and the name is used in other technologies. Maybe another unique name would be more accurate. 2. The Onyx code is on GitHub right now, under the Apache 2.0 license. Does this code have any affiliation with companies? If so, we would need an SGA for the code donation. If you need any help with the incubation, I would be more than happy to help! Regards JB On 01/26/2018 12:28 AM, Byung-Gon Chun wrote: > Dear Apache Incubator Community, > > Please accept the following proposal for presentation and discussion: https://wiki.apache.org/incubator/OnyxProposal > > Onyx is a data processing system that aims to flexibly control the runtime behaviors of a job to adapt to varying deployment characteristics (e.g., harnessing transient resources in datacenters, cross-datacenter deployment, changing runtime based on job characteristics, etc.). Onyx provides ways to extend the system’s capabilities and incorporate the extensions into the flexible job execution. Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an Intermediate Representation (IR) DAG, which Onyx optimizes and deploys based on a deployment policy. > > I've attached the proposal below. > > Best regards, > Byung-Gon Chun > > = OnyxProposal = > > == Abstract == > Onyx is a data processing system for flexible employment with different execution scenarios for various deployment characteristics on clusters. > > == Proposal == > Today, there is a wide variety of data processing systems with different designs for better performance and datacenter efficiency. They include processing data on specific resource environments and running jobs with specific attributes. 
Although each system successfully solves the problems it targets, most systems are designed in a way that builds runtime behaviors tightly into the system core to hide the complexity of distributed computing. This makes it hard for a single system to support different deployment characteristics with different runtime behaviors without substantial effort. > > Onyx is a data processing system that aims to flexibly control the runtime behaviors of a job to adapt to varying deployment characteristics. Moreover, it provides a means of extending the system’s capabilities and incorporating the extensions into the flexible job execution. > > In order to be able to easily modify runtime behaviors to adapt to varying deployment characteristics, Onyx exposes runtime behaviors to be flexibly configured and modified at both compile-time and runtime through a set of high-level graph pass interfaces. > > We hope to contribute to the big data processing community by enabling more flexibility and extensibility in job executions. Furthermore, we can all benefit more when we work together as a community to mature the system with more use cases and a better understanding of diverse deployment characteristics. The Apache Software Foundation is the perfect place to achieve these aspirations. > > == Background == > Many data processing systems have distinctive runtime behaviors optimized and configured for specific deployment characteristics, like different resource environments, and for handling special job attributes. > > For example, much research has been conducted to overcome the challenge of running data processing jobs on cheap, unreliable transient resources. Likewise, techniques for disaggregating different types of resources, like memory, CPU and GPU, are being actively developed to use datacenter resources more efficiently. Many researchers are also working to run data processing jobs in even more diverse environments, such as across distant datacenters. 
Similarly, for special job attributes, many works take different approaches, such as runtime optimization, to solve problems like data skew, and to optimize systems for data processing jobs with small-scale input data. > > Although each of these systems performs well with the jobs and in the environments it targets, they perform poorly in cases they were not designed for, and their designs do not consider supporting multiple deployment characteristics on a single system. > > For an application writer, optimizing an application to perform well on a system with engraved underlying behaviors requires a deep understanding of the system itself, an overhead that often takes a lot of time and effort. Moreover, for a developer, modifying such system behaviors requires modifications of the system core, which requires an even deeper understanding of the system itself. > > With this background, Onyx is designed t
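The proposal quoted above describes Onyx's core idea only in prose: a user program becomes an IR DAG, and a deployment policy reconfigures the job's runtime behavior through graph passes. The Python sketch below illustrates that idea under stated assumptions. It is not Onyx's API: `IRDag`, `Vertex`, the pass functions, `apply_policy`, and the `"parallelism"`/`"resource"` properties are all hypothetical names, and the placement rule in `transient_resource_pass` is a toy stand-in for a real transient-resource policy. Onyx's actual IR is documented at https://snuspl.github.io/onyx/docs/ir/.

```python
from collections import Counter, deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Vertex:
    """One operator of the user program (hypothetical IR vertex)."""
    name: str
    # Execution properties that passes annotate (names are illustrative only).
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class IRDag:
    """A toy intermediate-representation DAG: vertices plus directed edges."""
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_vertex(self, name: str) -> Vertex:
        vertex = Vertex(name)
        self.vertices[name] = vertex
        return vertex

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))

    def topological_order(self) -> List[str]:
        """Kahn's algorithm: a vertex is emitted once all its inputs are."""
        indegree = {name: 0 for name in self.vertices}
        for _, dst in self.edges:
            indegree[dst] += 1
        ready = deque(name for name, deg in indegree.items() if deg == 0)
        order: List[str] = []
        while ready:
            name = ready.popleft()
            order.append(name)
            for src, dst in self.edges:
                if src == name:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        return order


# A pass is any function that takes a DAG and returns a (re-annotated) DAG.
Pass = Callable[[IRDag], IRDag]


def default_parallelism_pass(dag: IRDag) -> IRDag:
    # Give every vertex a parallelism unless an earlier pass already set one.
    for vertex in dag.vertices.values():
        vertex.properties.setdefault("parallelism", "4")
    return dag


def transient_resource_pass(dag: IRDag) -> IRDag:
    # Toy rule (assumption, not Onyx's actual policy): vertices that merge
    # several inputs are costly to recompute if lost, so keep them on
    # reserved resources; everything else may run on cheap transient ones.
    fan_in = Counter(dst for _, dst in dag.edges)
    for name, vertex in dag.vertices.items():
        vertex.properties["resource"] = "reserved" if fan_in[name] > 1 else "transient"
    return dag


def apply_policy(dag: IRDag, policy: List[Pass]) -> IRDag:
    # A deployment policy is modeled here as an ordered list of passes.
    for graph_pass in policy:
        dag = graph_pass(dag)
    return dag


def build_example_dag() -> IRDag:
    # read_a, read_b -> join -> write
    dag = IRDag()
    for name in ("read_a", "read_b", "join", "write"):
        dag.add_vertex(name)
    dag.add_edge("read_a", "join")
    dag.add_edge("read_b", "join")
    dag.add_edge("join", "write")
    return dag


if __name__ == "__main__":
    dag = apply_policy(build_example_dag(),
                       [default_parallelism_pass, transient_resource_pass])
    for name in dag.topological_order():
        print(name, dag.vertices[name].properties)
```

Running the script prints each vertex in topological order with its annotations; the join vertex, the only one with two inputs, is the only one marked `reserved`. Changing the pass list changes the job's runtime annotations without touching the code that builds the DAG, which is roughly the separation between program and deployment policy that the proposal describes.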
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Thank you for all the information! It looks like Surf doesn't work. If possible, we'd like to keep Onyx. Another name we came up with is Coral. Thanks! -Gon On Sun, Jan 28, 2018 at 4:21 AM, Leif Hedstrom wrote: > Did we rule out Onyx for sure? Just because some other project might use it on, say, GitHub doesn’t necessarily exclude us from having an Apache Onyx? > > FWIW, I agree that Surf is too similar in pronunciation to Apache Serf. :)
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Did we rule out Onyx for sure? Just because some other project might use it on, say, GitHub doesn’t necessarily exclude us from having an Apache Onyx? FWIW, I agree that Surf is too similar in pronunciation to Apache Serf. :) Cheers, Leif > On Jan 27, 2018, at 07:31, Dave Fisher wrote: > > Checking “Serf Software”, which sounds the same.
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Checking “Serf Software”, which sounds the same: (1) there is already Apache Serf; (2) Serf is a product from HashiCorp at https://www.serf.io/. This would definitely cause confusion, as it is apparently comparable to ZooKeeper. Regards, Dave > On Jan 27, 2018, at 3:12 AM, sebb wrote: > > A brief search for 'Surf Software' shows quite a few hits. > I have not looked to see if they would be likely to be confused with this project or cause problems for others. > > But it looks as though there might be a problem: > Surfer - Golden Software > surf @ sourceforge > Surf Software company > > >> On 27 January 2018 at 08:03, Byung-Gon Chun wrote: >> Since we cannot use the name Onyx, we would like to change the project name to Surf. >> I hope that this name works. >> >> -Gon >> >> --- >> Byung-Gon Chun >> >> >>> On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun wrote: >>> >>> >>> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci wrote: Great work -- I think this technology has a lot of promise, and I'd love to see its evolution inside the Foundation. >>> Thanks, Davor! >>> >>> Parts of it, like the Onyx Intermediate Representation [1], overlap with the work-in-progress inside the Apache Beam project ("portability"). We'd love to work together on this -- would you be open to such collaboration? If so, it may not be necessary to start from scratch; you could leverage the work already done. >>> Sure. We're open to collaboration. >>> >>> Regarding the name, Onyx would likely have to be renamed, due to a conflict with a related technology [2]. >>> Thanks for pointing it out. It's difficult to come up with a good short name. :) >>> Do you have any suggestions? >>> >>> Thanks! >>> -Gon >>> >>> --- >>> Byung-Gon Chun >>> >>> >>> Davor [1] https://snuspl.github.io/onyx/docs/ir/ [2] http://www.onyxplatform.org/
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
A brief search for 'Surf Software' shows quite a few hits. I have not checked whether they would be likely to be confused with this project or cause problems for others. But it looks as though there might be a problem: Surfer - Golden Software; surf @ sourceforge; Surf Software company. On 27 January 2018 at 08:03, Byung-Gon Chun wrote: > Since we cannot use the name Onyx, we would like to change the project name > to Surf. > I hope that this name works. > > -Gon > > --- > Byung-Gon Chun > > > On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun wrote: > >> >> >> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci wrote: >> >>> Great work -- I think this technology has a lot of promise, and I'd love >>> to >>> see its evolution inside the Foundation. >>> >>> >> Thanks, Davor! >> >> >>> Parts of it, like the Onyx Intermediate Representation [1], overlap with >>> the work-in-progress inside the Apache Beam project ("portability"). We'd >>> love to work together on this -- would you be open to such collaboration? >>> If so, it may not be necessary to start from scratch, and leverage the >>> work >>> already done. >>> >>> >> Sure. We're open to collaboration. >> >> >>> Regarding the name, Onyx would likely have to be renamed, due to a >>> conflict >>> with a related technology [2]. >>> >>> >> Thanks for pointing it out. It's difficult to come up with a good short >> name. :) >> Do you have any suggestion? >> >> Thanks! 
>> -Gon >> >> --- >> Byung-Gon Chun >> >> >> >>> Davor >>> >>> [1] https://snuspl.github.io/onyx/docs/ir/ >>> [2] http://www.onyxplatform.org/ >>> >>> On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun wrote: >>> >>> > Dear Apache Incubator Community, >>> > >>> > Please accept the following proposal for presentation and discussion: >>> > https://wiki.apache.org/incubator/OnyxProposal >>> > >>> > Onyx is a data processing system that aims to flexibly control the >>> runtime >>> > behaviors of a job to adapt to varying deployment characteristics (e.g., >>> > harnessing transient resources in datacenters, cross-datacenter >>> deployment, >>> > changing runtime based on job characteristics, etc.). Onyx provides >>> ways to >>> > extend the system’s capabilities and incorporate the extensions to the >>> > flexible job execution. >>> > Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an >>> > Intermediate Representation (IR) DAG, which Onyx optimizes and deploys >>> > based on a deployment policy. >>> > >>> > I've attached the proposal below. >>> > >>> > Best regards, >>> > Byung-Gon Chun >>> > >>> > = OnyxProposal = >>> > >>> > == Abstract == >>> > Onyx is a data processing system for flexible employment with >>> > different execution scenarios for various deployment characteristics >>> > on clusters. >>> > >>> > == Proposal == >>> > Today, there is a wide variety of data processing systems with >>> > different designs for better performance and datacenter efficiency. >>> > They include processing data on specific resource environments and >>> > running jobs with specific attributes. Although each system >>> > successfully solves the problems it targets, most systems are designed >>> > in the way that runtime behaviors are built tightly inside the system >>> > core to hide the complexity of distributed computing. 
This makes it >>> > hard for a single system to support different deployment >>> > characteristics with different runtime behaviors without substantial >>> > effort. >>> > >>> > Onyx is a data processing system that aims to flexibly control the >>> > runtime behaviors of a job to adapt to varying deployment >>> > characteristics. Moreover, it provides a means of extending the >>> > system’s capabilities and incorporating the extensions to the flexible >>> > job execution. >>> > >>> > In order to be able to easily modify runtime behaviors to adapt to >>> > varying deployment characteristics, Onyx exposes runtime behaviors to >>> > be flexibly configured and modified at both compile-time and runtime >>> > through a set of high-level graph pass interfaces. >>> > >>> > We hope to contribute to the big data processing community by enabling >>> > more flexibility and extensibility in job executions. Furthermore, we >>> > can benefit more together as a community when we work together as a >>> > community to mature the system with more use cases and understanding >>> > of diverse deployment characteristics. The Apache Software Foundation >>> > is the perfect place to achieve these aspirations. >>> > >>> > == Background == >>> > Many data processing systems have distinctive runtime behaviors >>> > optimized and configured for specific deployment characteristics like >>> > different resource environments and for handling special job >>> > attributes. >>> > >>> > For example, much research have been conducted to overcome the >>> > challenge of running data processing jobs on cheap, unreliable >>> > transient resources. Likewise, techniques for disaggregating different >>> > types of resources, l
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Since we cannot use the name Onyx, we would like to change the project name to Surf. I hope that this name works. -Gon --- Byung-Gon Chun On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun wrote: > > > On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci wrote: > >> Great work -- I think this technology has a lot of promise, and I'd love >> to >> see its evolution inside the Foundation. >> >> > Thanks, Davor! > > >> Parts of it, like the Onyx Intermediate Representation [1], overlap with >> the work-in-progress inside the Apache Beam project ("portability"). We'd >> love to work together on this -- would you be open to such collaboration? >> If so, it may not be necessary to start from scratch, and leverage the >> work >> already done. >> >> > Sure. We're open to collaboration. > > >> Regarding the name, Onyx would likely have to be renamed, due to a >> conflict >> with a related technology [2]. >> >> > Thanks for pointing it out. It's difficult to come up with a good short > name. :) > Do you have any suggestion? > > Thanks! > -Gon > > --- > Byung-Gon Chun > > > >> Davor >> >> [1] https://snuspl.github.io/onyx/docs/ir/ >> [2] http://www.onyxplatform.org/ >> >> On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun wrote: >> >> > Dear Apache Incubator Community, >> > >> > Please accept the following proposal for presentation and discussion: >> > https://wiki.apache.org/incubator/OnyxProposal >> > >> > Onyx is a data processing system that aims to flexibly control the >> runtime >> > behaviors of a job to adapt to varying deployment characteristics (e.g., >> > harnessing transient resources in datacenters, cross-datacenter >> deployment, >> > changing runtime based on job characteristics, etc.). Onyx provides >> ways to >> > extend the system’s capabilities and incorporate the extensions to the >> > flexible job execution. 
>> > Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an >> > Intermediate Representation (IR) DAG, which Onyx optimizes and deploys >> > based on a deployment policy. >> > >> > I've attached the proposal below. >> > >> > Best regards, >> > Byung-Gon Chun >> > >> > = OnyxProposal = >> > >> > == Abstract == >> > Onyx is a data processing system for flexible employment with >> > different execution scenarios for various deployment characteristics >> > on clusters. >> > >> > == Proposal == >> > Today, there is a wide variety of data processing systems with >> > different designs for better performance and datacenter efficiency. >> > They include processing data on specific resource environments and >> > running jobs with specific attributes. Although each system >> > successfully solves the problems it targets, most systems are designed >> > in the way that runtime behaviors are built tightly inside the system >> > core to hide the complexity of distributed computing. This makes it >> > hard for a single system to support different deployment >> > characteristics with different runtime behaviors without substantial >> > effort. >> > >> > Onyx is a data processing system that aims to flexibly control the >> > runtime behaviors of a job to adapt to varying deployment >> > characteristics. Moreover, it provides a means of extending the >> > system’s capabilities and incorporating the extensions to the flexible >> > job execution. >> > >> > In order to be able to easily modify runtime behaviors to adapt to >> > varying deployment characteristics, Onyx exposes runtime behaviors to >> > be flexibly configured and modified at both compile-time and runtime >> > through a set of high-level graph pass interfaces. >> > >> > We hope to contribute to the big data processing community by enabling >> > more flexibility and extensibility in job executions. 
Furthermore, we >> > can benefit more together as a community when we work together as a >> > community to mature the system with more use cases and understanding >> > of diverse deployment characteristics. The Apache Software Foundation >> > is the perfect place to achieve these aspirations. >> > >> > == Background == >> > Many data processing systems have distinctive runtime behaviors >> > optimized and configured for specific deployment characteristics like >> > different resource environments and for handling special job >> > attributes. >> > >> > For example, much research have been conducted to overcome the >> > challenge of running data processing jobs on cheap, unreliable >> > transient resources. Likewise, techniques for disaggregating different >> > types of resources, like memory, CPU and GPU, are being actively >> > developed to use datacenter resources more efficiently. Many >> > researchers are also working to run data processing jobs in even more >> > diverse environments, such as across distant datacenters. Similarly, >> > for special job attributes, many works take different approaches, such >> > as runtime optimization, to solve problems like data skew, and to >> > optimize systems for data processing jobs with small-scale in
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
On 26 Jan 2018 21:53, "Byung-Gon Chun" wrote: On Sat, Jan 27, 2018 at 5:41 AM, Romain Manni-Bucau wrote: > Why not doing a beam subproject? Any blocker? > > Thanks for the question, Romain. We have a flexible, efficient runtime that supports various user programs (e.g., Beam and Spark programs). We are taking advantage of Beam as a programming layer, but our focus is more on optimizing execution on various deployment scenarios. We also plan to support other programming layers. I tend to think the two can converge, since Beam is about portability and the projects are complementary, IMHO. It could be worth a PoC. > Otherwise +1 to have it @asf, makes a lot of sense. > > Thanks for the support! -Gon > On 26 Jan 2018 20:58, "Byung-Gon Chun" wrote: > > > On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci wrote: > > > > > Great work -- I think this technology has a lot of promise, and I'd > love > > to > > > see its evolution inside the Foundation. > > > > > > > > Thanks, Davor! > > > > > > > Parts of it, like the Onyx Intermediate Representation [1], overlap > with > > > the work-in-progress inside the Apache Beam project ("portability"). > We'd > > > love to work together on this -- would you be open to such > collaboration? > > > If so, it may not be necessary to start from scratch, and leverage the > > work > > > already done. > > > > > > > > Sure. We're open to collaboration. > > > > > > > Regarding the name, Onyx would likely have to be renamed, due to a > > conflict > > > with a related technology [2]. > > > > > > > > Thanks for pointing it out. It's difficult to come up with a good short > > name. :) > > Do you have any suggestion? > > > > Thanks! 
> > -Gon > > > > --- > > Byung-Gon Chun > > > > > > > > > Davor > > > > > > [1] https://snuspl.github.io/onyx/docs/ir/ > > > [2] http://www.onyxplatform.org/ > > > > > > On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun > > wrote: > > > > > > > Dear Apache Incubator Community, > > > > > > > > Please accept the following proposal for presentation and discussion: > > > > https://wiki.apache.org/incubator/OnyxProposal > > > > > > > > Onyx is a data processing system that aims to flexibly control the > > > runtime > > > > behaviors of a job to adapt to varying deployment characteristics > > (e.g., > > > > harnessing transient resources in datacenters, cross-datacenter > > > deployment, > > > > changing runtime based on job characteristics, etc.). Onyx provides > > ways > > > to > > > > extend the system’s capabilities and incorporate the extensions to > the > > > > flexible job execution. > > > > Onyx translates a user program (e.g., Apache Beam, Apache Spark) into > > an > > > > Intermediate Representation (IR) DAG, which Onyx optimizes and > deploys > > > > based on a deployment policy. > > > > > > > > I've attached the proposal below. > > > > > > > > Best regards, > > > > Byung-Gon Chun > > > > > > > > = OnyxProposal = > > > > > > > > == Abstract == > > > > Onyx is a data processing system for flexible employment with > > > > different execution scenarios for various deployment characteristics > > > > on clusters. > > > > > > > > == Proposal == > > > > Today, there is a wide variety of data processing systems with > > > > different designs for better performance and datacenter efficiency. > > > > They include processing data on specific resource environments and > > > > running jobs with specific attributes. Although each system > > > > successfully solves the problems it targets, most systems are > designed > > > > in the way that runtime behaviors are built tightly inside the system > > > > core to hide the complexity of distributed computing. 
This makes it > > > > hard for a single system to support different deployment > > > > characteristics with different runtime behaviors without substantial > > > > effort. > > > > > > > > Onyx is a data processing system that aims to flexibly control the > > > > runtime behaviors of a job to adapt to varying deployment > > > > characteristics. Moreover, it provides a means of extending the > > > > system’s capabilities and incorporating the extensions to the > flexible > > > > job execution. > > > > > > > > In order to be able to easily modify runtime behaviors to adapt to > > > > varying deployment characteristics, Onyx exposes runtime behaviors to > > > > be flexibly configured and modified at both compile-time and runtime > > > > through a set of high-level graph pass interfaces. > > > > > > > > We hope to contribute to the big data processing community by > enabling > > > > more flexibility and extensibility in job executions. Furthermore, we > > > > can benefit more together as a community when we work together as a > > > > community to mature the system with more use cases and understanding > > > > of diverse deployment characteristics. The Apache Software Foundation > > > > is the perfect place to achieve these aspirations. > > > > > > > > == Background == > > > > Many data processing systems have distinctive runtime beha
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
On Sat, Jan 27, 2018 at 5:41 AM, Romain Manni-Bucau wrote: > Why not doing a beam subproject? Any blocker? > > Thanks for the question, Romain. We have a flexible, efficient runtime that supports various user programs (e.g., Beam and Spark programs). We are taking advantage of Beam as a programming layer, but our focus is more on optimizing execution on various deployment scenarios. We also plan to support other programming layers. > Otherwise +1 to have it @asf, makes a lot of sense. > > Thanks for the support! -Gon > On 26 Jan 2018 20:58, "Byung-Gon Chun" wrote: > > > On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci wrote: > > > > > Great work -- I think this technology has a lot of promise, and I'd > love > > to > > > see its evolution inside the Foundation. > > > > > > > > Thanks, Davor! > > > > > > > Parts of it, like the Onyx Intermediate Representation [1], overlap > with > > > the work-in-progress inside the Apache Beam project ("portability"). > We'd > > > love to work together on this -- would you be open to such > collaboration? > > > If so, it may not be necessary to start from scratch, and leverage the > > work > > > already done. > > > > > > > > Sure. We're open to collaboration. > > > > > > > Regarding the name, Onyx would likely have to be renamed, due to a > > conflict > > > with a related technology [2]. > > > > > > > > Thanks for pointing it out. It's difficult to come up with a good short > > name. :) > > Do you have any suggestion? > > > > Thanks! 
> > -Gon > > > > --- > > Byung-Gon Chun > > > > > > > > > Davor > > > > > > [1] https://snuspl.github.io/onyx/docs/ir/ > > > [2] http://www.onyxplatform.org/ > > > > > > On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun > > wrote: > > > > > > > Dear Apache Incubator Community, > > > > > > > > Please accept the following proposal for presentation and discussion: > > > > https://wiki.apache.org/incubator/OnyxProposal > > > > > > > > Onyx is a data processing system that aims to flexibly control the > > > runtime > > > > behaviors of a job to adapt to varying deployment characteristics > > (e.g., > > > > harnessing transient resources in datacenters, cross-datacenter > > > deployment, > > > > changing runtime based on job characteristics, etc.). Onyx provides > > ways > > > to > > > > extend the system’s capabilities and incorporate the extensions to > the > > > > flexible job execution. > > > > Onyx translates a user program (e.g., Apache Beam, Apache Spark) into > > an > > > > Intermediate Representation (IR) DAG, which Onyx optimizes and > deploys > > > > based on a deployment policy. > > > > > > > > I've attached the proposal below. > > > > > > > > Best regards, > > > > Byung-Gon Chun > > > > > > > > = OnyxProposal = > > > > > > > > == Abstract == > > > > Onyx is a data processing system for flexible employment with > > > > different execution scenarios for various deployment characteristics > > > > on clusters. > > > > > > > > == Proposal == > > > > Today, there is a wide variety of data processing systems with > > > > different designs for better performance and datacenter efficiency. > > > > They include processing data on specific resource environments and > > > > running jobs with specific attributes. Although each system > > > > successfully solves the problems it targets, most systems are > designed > > > > in the way that runtime behaviors are built tightly inside the system > > > > core to hide the complexity of distributed computing. 
This makes it > > > > hard for a single system to support different deployment > > > > characteristics with different runtime behaviors without substantial > > > > effort. > > > > > > > > Onyx is a data processing system that aims to flexibly control the > > > > runtime behaviors of a job to adapt to varying deployment > > > > characteristics. Moreover, it provides a means of extending the > > > > system’s capabilities and incorporating the extensions to the > flexible > > > > job execution. > > > > > > > > In order to be able to easily modify runtime behaviors to adapt to > > > > varying deployment characteristics, Onyx exposes runtime behaviors to > > > > be flexibly configured and modified at both compile-time and runtime > > > > through a set of high-level graph pass interfaces. > > > > > > > > We hope to contribute to the big data processing community by > enabling > > > > more flexibility and extensibility in job executions. Furthermore, we > > > > can benefit more together as a community when we work together as a > > > > community to mature the system with more use cases and understanding > > > > of diverse deployment characteristics. The Apache Software Foundation > > > > is the perfect place to achieve these aspirations. > > > > > > > > == Background == > > > > Many data processing systems have distinctive runtime behaviors > > > > optimized and configured for specific deployment characteristics like > > > > different resource environments and for handling special job > > > > attri
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Why not make it a Beam subproject? Any blocker? Otherwise, +1 to have it @asf; it makes a lot of sense. On 26 Jan 2018 20:58, "Byung-Gon Chun" wrote: > On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci wrote: > > > Great work -- I think this technology has a lot of promise, and I'd love > to > > see its evolution inside the Foundation. > > > > > Thanks, Davor! > > > > Parts of it, like the Onyx Intermediate Representation [1], overlap with > > the work-in-progress inside the Apache Beam project ("portability"). We'd > > love to work together on this -- would you be open to such collaboration? > > If so, it may not be necessary to start from scratch, and leverage the > work > > already done. > > > > > Sure. We're open to collaboration. > > > > Regarding the name, Onyx would likely have to be renamed, due to a > conflict > > with a related technology [2]. > > > > > Thanks for pointing it out. It's difficult to come up with a good short > name. :) > Do you have any suggestion? > > Thanks! > -Gon > > --- > Byung-Gon Chun > > > > > Davor > > > > [1] https://snuspl.github.io/onyx/docs/ir/ > > [2] http://www.onyxplatform.org/ > > > > On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun > wrote: > > > > > Dear Apache Incubator Community, > > > > > > Please accept the following proposal for presentation and discussion: > > > https://wiki.apache.org/incubator/OnyxProposal > > > > > > Onyx is a data processing system that aims to flexibly control the > > runtime > > > behaviors of a job to adapt to varying deployment characteristics > (e.g., > > > harnessing transient resources in datacenters, cross-datacenter > > deployment, > > > changing runtime based on job characteristics, etc.). Onyx provides > ways > > to > > > extend the system’s capabilities and incorporate the extensions to the > > > flexible job execution. 
> > > Onyx translates a user program (e.g., Apache Beam, Apache Spark) into > an > > > Intermediate Representation (IR) DAG, which Onyx optimizes and deploys > > > based on a deployment policy. > > > > > > I've attached the proposal below. > > > > > > Best regards, > > > Byung-Gon Chun > > > > > > = OnyxProposal = > > > > > > == Abstract == > > > Onyx is a data processing system for flexible employment with > > > different execution scenarios for various deployment characteristics > > > on clusters. > > > > > > == Proposal == > > > Today, there is a wide variety of data processing systems with > > > different designs for better performance and datacenter efficiency. > > > They include processing data on specific resource environments and > > > running jobs with specific attributes. Although each system > > > successfully solves the problems it targets, most systems are designed > > > in the way that runtime behaviors are built tightly inside the system > > > core to hide the complexity of distributed computing. This makes it > > > hard for a single system to support different deployment > > > characteristics with different runtime behaviors without substantial > > > effort. > > > > > > Onyx is a data processing system that aims to flexibly control the > > > runtime behaviors of a job to adapt to varying deployment > > > characteristics. Moreover, it provides a means of extending the > > > system’s capabilities and incorporating the extensions to the flexible > > > job execution. > > > > > > In order to be able to easily modify runtime behaviors to adapt to > > > varying deployment characteristics, Onyx exposes runtime behaviors to > > > be flexibly configured and modified at both compile-time and runtime > > > through a set of high-level graph pass interfaces. > > > > > > We hope to contribute to the big data processing community by enabling > > > more flexibility and extensibility in job executions. 
Furthermore, we > > > can benefit more together as a community when we work together as a > > > community to mature the system with more use cases and understanding > > > of diverse deployment characteristics. The Apache Software Foundation > > > is the perfect place to achieve these aspirations. > > > > > > == Background == > > > Many data processing systems have distinctive runtime behaviors > > > optimized and configured for specific deployment characteristics like > > > different resource environments and for handling special job > > > attributes. > > > > > > For example, much research have been conducted to overcome the > > > challenge of running data processing jobs on cheap, unreliable > > > transient resources. Likewise, techniques for disaggregating different > > > types of resources, like memory, CPU and GPU, are being actively > > > developed to use datacenter resources more efficiently. Many > > > researchers are also working to run data processing jobs in even more > > > diverse environments, such as across distant datacenters. Similarly, > > > for special job attributes, many works take different approaches, such > > > as runtime optimization, to solve problems like data skew, and to > > > optimize
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci wrote: > Great work -- I think this technology has a lot of promise, and I'd love to > see its evolution inside the Foundation. > > Thanks, Davor! > Parts of it, like the Onyx Intermediate Representation [1], overlap with > the work-in-progress inside the Apache Beam project ("portability"). We'd > love to work together on this -- would you be open to such collaboration? > If so, it may not be necessary to start from scratch, and leverage the work > already done. > > Sure. We're open to collaboration. > Regarding the name, Onyx would likely have to be renamed, due to a conflict > with a related technology [2]. > > Thanks for pointing it out. It's difficult to come up with a good short name. :) Do you have any suggestions? Thanks! -Gon --- Byung-Gon Chun > Davor > > [1] https://snuspl.github.io/onyx/docs/ir/ > [2] http://www.onyxplatform.org/ > > On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun wrote: > > > Dear Apache Incubator Community, > > > > Please accept the following proposal for presentation and discussion: > > https://wiki.apache.org/incubator/OnyxProposal > > > > Onyx is a data processing system that aims to flexibly control the > runtime > > behaviors of a job to adapt to varying deployment characteristics (e.g., > > harnessing transient resources in datacenters, cross-datacenter > deployment, > > changing runtime based on job characteristics, etc.). Onyx provides ways > to > > extend the system’s capabilities and incorporate the extensions to the > > flexible job execution. > > Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an > > Intermediate Representation (IR) DAG, which Onyx optimizes and deploys > > based on a deployment policy. > > > > I've attached the proposal below. 
> > > > Best regards, > > Byung-Gon Chun > > > > = OnyxProposal = > > > > == Abstract == > > Onyx is a data processing system for flexible employment with > > different execution scenarios for various deployment characteristics > > on clusters. > > > > == Proposal == > > Today, there is a wide variety of data processing systems with > > different designs for better performance and datacenter efficiency. > > They include processing data on specific resource environments and > > running jobs with specific attributes. Although each system > > successfully solves the problems it targets, most systems are designed > > in the way that runtime behaviors are built tightly inside the system > > core to hide the complexity of distributed computing. This makes it > > hard for a single system to support different deployment > > characteristics with different runtime behaviors without substantial > > effort. > > > > Onyx is a data processing system that aims to flexibly control the > > runtime behaviors of a job to adapt to varying deployment > > characteristics. Moreover, it provides a means of extending the > > system’s capabilities and incorporating the extensions to the flexible > > job execution. > > > > In order to be able to easily modify runtime behaviors to adapt to > > varying deployment characteristics, Onyx exposes runtime behaviors to > > be flexibly configured and modified at both compile-time and runtime > > through a set of high-level graph pass interfaces. > > > > We hope to contribute to the big data processing community by enabling > > more flexibility and extensibility in job executions. Furthermore, we > > can benefit more together as a community when we work together as a > > community to mature the system with more use cases and understanding > > of diverse deployment characteristics. The Apache Software Foundation > > is the perfect place to achieve these aspirations. 
> > > > == Background == > > Many data processing systems have distinctive runtime behaviors > > optimized and configured for specific deployment characteristics like > > different resource environments and for handling special job > > attributes. > > > > For example, much research have been conducted to overcome the > > challenge of running data processing jobs on cheap, unreliable > > transient resources. Likewise, techniques for disaggregating different > > types of resources, like memory, CPU and GPU, are being actively > > developed to use datacenter resources more efficiently. Many > > researchers are also working to run data processing jobs in even more > > diverse environments, such as across distant datacenters. Similarly, > > for special job attributes, many works take different approaches, such > > as runtime optimization, to solve problems like data skew, and to > > optimize systems for data processing jobs with small-scale input data. > > > > Although each of the systems performs well with the jobs and in the > > environments they target, they perform poorly with unconsidered cases, > > and do not consider supporting multiple deployment characteristics on > > a single system in their designs. > > > > For an application writer to optimize an application to per
Re: [PROPOSAL] Onyx - proposal for Apache Incubation
Great work -- I think this technology has a lot of promise, and I'd love to see its evolution inside the Foundation.

Parts of it, like the Onyx Intermediate Representation [1], overlap with the work in progress inside the Apache Beam project ("portability"). We'd love to work together on this -- would you be open to such collaboration? If so, it may not be necessary to start from scratch; the work already done could be leveraged.

Regarding the name, Onyx would likely have to be renamed, due to a conflict with a related technology [2].

Davor

[1] https://snuspl.github.io/onyx/docs/ir/
[2] http://www.onyxplatform.org/

On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun wrote:
> Dear Apache Incubator Community,
>
> Please accept the following proposal for presentation and discussion:
> https://wiki.apache.org/incubator/OnyxProposal
> [...]
[PROPOSAL] Onyx - proposal for Apache Incubation
Dear Apache Incubator Community,

Please accept the following proposal for presentation and discussion:
https://wiki.apache.org/incubator/OnyxProposal

Onyx is a data processing system that aims to flexibly control the runtime behaviors of a job to adapt to varying deployment characteristics (e.g., harnessing transient resources in datacenters, cross-datacenter deployment, changing runtime behavior based on job characteristics, etc.). Onyx provides ways to extend the system’s capabilities and to incorporate those extensions into flexible job execution. Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an Intermediate Representation (IR) DAG, which Onyx optimizes and deploys based on a deployment policy.

I've attached the proposal below.

Best regards,
Byung-Gon Chun

= OnyxProposal =

== Abstract ==
Onyx is a data processing system that flexibly adapts job execution to the different execution scenarios and deployment characteristics found on clusters.

== Proposal ==
Today, there is a wide variety of data processing systems with different designs for better performance and datacenter efficiency. These include systems that process data in specific resource environments and systems that run jobs with specific attributes. Although each system successfully solves the problems it targets, most are designed so that runtime behaviors are built tightly into the system core to hide the complexity of distributed computing. This makes it hard for a single system to support different deployment characteristics with different runtime behaviors without substantial effort.

Onyx is a data processing system that aims to flexibly control the runtime behaviors of a job to adapt to varying deployment characteristics. Moreover, it provides a means of extending the system’s capabilities and incorporating those extensions into flexible job execution.
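The translate, optimize and deploy flow described above can be sketched in a few lines. This is a minimal sketch assuming a toy program format; `Vertex`, `IRDag`, `translate`, `optimize`, `default_parallelism_pass` and the `"parallelism"` property key are hypothetical names for illustration, not Onyx's actual API. A deployment policy is modeled simply as an ordered list of passes, each of which takes an IR DAG and returns a reshaped or annotated one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Vertex:
    name: str
    # Execution properties attached by compiler passes (hypothetical keys).
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class IRDag:
    vertices: List[Vertex]
    edges: List[Tuple[str, str]]  # (source vertex name, destination vertex name)


# A compiler pass takes an IR DAG and returns a (possibly reshaped or annotated) IR DAG.
Pass = Callable[[IRDag], IRDag]


def translate(program: List[Tuple[str, List[str]]]) -> IRDag:
    """Turn a toy user program, given as (operator, input operators) pairs, into an IR DAG."""
    vertices = [Vertex(op) for op, _ in program]
    edges = [(src, op) for op, inputs in program for src in inputs]
    return IRDag(vertices, edges)


def default_parallelism_pass(dag: IRDag) -> IRDag:
    """Annotating pass: give every vertex a parallelism unless one is already set."""
    for v in dag.vertices:
        v.properties.setdefault("parallelism", "4")
    return dag


def optimize(dag: IRDag, policy: List[Pass]) -> IRDag:
    """Apply the passes of a deployment policy in order; the result is the execution plan."""
    for p in policy:
        dag = p(dag)
    return dag


if __name__ == "__main__":
    program = [("read", []), ("map", ["read"]), ("reduce", ["map"])]
    plan = optimize(translate(program), [default_parallelism_pass])
    for v in plan.vertices:
        print(v.name, v.properties)
```

Keeping passes as plain functions over the DAG means a new deployment scenario only needs a new list of passes, without touching `translate` or the runtime, which is the kind of separation the proposal argues for.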
To make runtime behaviors easy to modify for varying deployment characteristics, Onyx exposes them so that they can be flexibly configured and modified at both compile time and runtime through a set of high-level graph pass interfaces.

We hope to contribute to the big data processing community by enabling more flexibility and extensibility in job executions. Furthermore, we can all benefit as a community when we work together to mature the system with more use cases and a better understanding of diverse deployment characteristics. The Apache Software Foundation is the perfect place to achieve these aspirations.

== Background ==
Many data processing systems have distinctive runtime behaviors optimized and configured for specific deployment characteristics, such as particular resource environments, and for handling special job attributes.

For example, much research has been conducted to overcome the challenge of running data processing jobs on cheap, unreliable transient resources. Likewise, techniques for disaggregating different types of resources, like memory, CPU and GPU, are being actively developed to use datacenter resources more efficiently. Many researchers are also working to run data processing jobs in even more diverse environments, such as across distant datacenters. Similarly, for special job attributes, many systems take different approaches, such as runtime optimization, to solve problems like data skew and to optimize for jobs with small-scale input data.

Although each of these systems performs well with the jobs and in the environments it targets, these systems perform poorly in cases they were not designed for, and their designs do not consider supporting multiple deployment characteristics on a single system.
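The Background mentions runtime optimization for data skew, and the proposal says runtime behaviors can be modified at runtime as well as at compile time. As a rough illustration only, here is a hypothetical runtime step that reassigns hash partitions to tasks based on observed partition sizes. The function name `rebalance_partitions`, the metric format (partition id to size) and the greedy largest-first (LPT) heuristic are all assumptions of this sketch, not Onyx's actual skew handling.

```python
import heapq
from typing import Dict, List


def rebalance_partitions(partition_sizes: Dict[int, int], num_tasks: int) -> List[List[int]]:
    """Assign hash partitions to tasks so that observed load is spread evenly.

    partition_sizes maps a partition id to its observed size (e.g. records),
    as a hypothetical runtime metric might report it. Uses the greedy
    largest-first (LPT) heuristic: each partition, biggest first, goes to
    the task with the smallest load so far.
    """
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    heap = [(0, t) for t in range(num_tasks)]  # (current load, task index)
    assignment: List[List[int]] = [[] for _ in range(num_tasks)]
    # Biggest partitions first; ties broken by partition id for determinism.
    for pid, size in sorted(partition_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, task = heapq.heappop(heap)
        assignment[task].append(pid)
        heapq.heappush(heap, (load + size, task))
    return assignment


if __name__ == "__main__":
    sizes = {0: 100, 1: 10, 2: 10, 3: 10, 4: 70}
    naive = [[p for p in sizes if p % 2 == t] for t in range(2)]
    balanced = rebalance_partitions(sizes, 2)
    for name, plan in (("hash", naive), ("rebalanced", balanced)):
        print(name, plan, [sum(sizes[p] for p in task) for task in plan])
```

With these toy sizes, placing partitions by id modulo 2 gives task loads of 180 and 20, while the rebalanced plan gives 100 and 100. LPT is only an approximation to optimal load balancing, but it is cheap enough to run on metrics collected mid-job.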
Optimizing an application to perform well on a system whose underlying behaviors are hard-wired into it requires a deep understanding of the system itself, an overhead that often costs application writers a lot of time and effort. Moreover, modifying such system behaviors requires developers to change the system core, which demands an even deeper understanding of the system.

With this background, Onyx is designed to represent all of its jobs as an Intermediate Representation (IR) DAG. In the Onyx compiler, user applications from various programming models (e.g., Apache Beam) are submitted, transformed into an IR DAG, and optimized/customized for the deployment characteristics. In the IR DAG optimization phase, the DAG is modified through a series of compiler “passes” that reshape or annotate the DAG with an expression of the underlying runtime behaviors. The IR DAG is then submitted as an execution plan to the Onyx runtime. The runtime keeps the unmodified parts of data processing in its backbone, which is transparently integrated with configurable components exposed for further extension.

== Rationale ==
Onyx’s vision lies in providing means for flexibly supporting a wide variety of job execution scenarios for users while enabling system developers to extend the execution