Re: [DISCUSS] Hop proposal
Hi Jean-Baptiste,

Enterprise integration of the kind Camel does is a bit of a Hop "blind spot", so it would be very interesting indeed to integrate Camel in Hop. Our architecture certainly allows it, since Hop is very much a metadata editor. NiFi is (as far as I can tell) more tied to its own data processing logic. Hop has one of these "legacy data engines" as well but, as Max mentioned, Hop created generic engine plugins to support Apache Beam runners for Apache Spark, Flink and GCP Dataflow, to mention a few. As such, a NiFi engine plugin in Hop would be within the realm of possibility if there is any interest. Another cool possibility is executing Hop pipelines inside NiFi (or vice versa) to extend functionality. Specifically for Apache Airflow, we planned to write a workflow engine plugin to support that as well.

Whichever of these possibilities pan out, we're looking forward to working with anyone who wants to help blend these technologies. If anything, it should be a lot of fun.

Regards,
Matt

On Thu, Sep 10, 2020 at 1:48 PM Jean-Baptiste Onofre wrote:

> Hi,
>
> Interesting proposal, and happy to help if needed.
>
> By the way, did you evaluate the potential relationship with Camel or NiFi
> (and what are the pros/cons, if it's possible to compare them)?
>
> Regards
> JB
Re: [DISCUSS] Hop proposal
Hi,

Interesting proposal, and happy to help if needed.

By the way, did you evaluate the potential relationship with Camel or NiFi (and what are the pros/cons, if it's possible to compare them)?

Regards
JB

> On 8 Sept. 2020, at 11:56, Matt Casters wrote:
>
> Hello Apache,
>
> Our community is eager to propose for Hop to join the Apache Incubator.
> The Hop Orchestration Platform aims to help people with complex data and
> metadata orchestration problems.
Re: [DISCUSS] Hop proposal
Hi,

+1

The project seems very interesting, and we can see that there is documentation, a contribution guide... I will be more than happy to help as a mentor.

Regards,
François
fpa...@apache.org

On 10/09/2020 at 13:05, Julian Feinauer wrote:

> Hey,
>
> thanks for your statement, Max. That's already a great start, as we cannot
> expect fresh podlings to know the Apache Way (at all?); otherwise there
> would be no point in the Incubator.
> But knowing you and your motivation, and reading your statement about the
> team, makes me very confident that this could be a very smooth ride :)
>
> So, best from my side!
>
> Julian
Re: [DISCUSS] Hop proposal
Hey,

thanks for your statement, Max. That's already a great start, as we cannot expect fresh podlings to know the Apache Way (at all?); otherwise there would be no point in the Incubator.
But knowing you and your motivation, and reading your statement about the team, makes me very confident that this could be a very smooth ride :)

So, best from my side!

Julian

On 10.09.20, 12:40, "Maximilian Michels" wrote:

> I met Matt and other folks from the Hop project more than a year ago
> through Beam Summit Europe. I can say that they are genuinely passionate
> about open-source. Initially, they were not familiar with the Apache
> Way, but throughout the past year, everyone has ramped up their
> knowledge about the ASF. You will also see that reflected in the proposal.
Re: [DISCUSS] Hop proposal
I met Matt and other folks from the Hop project more than a year ago through Beam Summit Europe. I can say that they are genuinely passionate about open-source. Initially, they were not familiar with the Apache Way, but throughout the past year, everyone has ramped up their knowledge about the ASF. You will also see that reflected in the proposal.

Hop is a great project in the sense that it adds GUI-based integration to many data processing projects at Apache. This is appealing to me because we are leveraging many of the existing projects such as Spark, Flink, Hadoop, Cassandra, Kafka, etc. The project would be a great addition to the Apache project portfolio.

This is going to be my first project as a Champion and I'm very much looking forward to guiding the project throughout the incubation process.

Please post your questions or let us know if you want to help with mentoring the project.

-Max

On 08.09.20 12:30, Matt Casters wrote:

> Thank you very much Kevin!
Re: [DISCUSS] Hop proposal
Thank you very much Kevin!

On Tue, Sep 8, 2020 at 12:07 PM Kevin Ratnasekera wrote:

> +1 (binding) Interesting project. Please add me as a mentor to the
> project.
Re: [DISCUSS] Hop proposal
+1 (binding)

Interesting project. Please add me as a mentor to the project.

On Tue, Sep 8, 2020 at 3:26 PM Matt Casters wrote:

> Hello Apache,
>
> Our community is eager to propose for Hop to join the Apache Incubator.
> The Hop Orchestration Platform aims to help people with complex data and
> metadata orchestration problems.
>
> Below is the complete text of the proposal but you can also find it here:
> https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal
>
> Any help with respect to the incubation is appreciated including help from
> a few more mentors to set us on the right track. On behalf of my community
> I'd be happy to answer any questions you might have regarding Hop. Our
> thanks go out to Max, Julian and Tom for helping us set up this proposal.
>
> Thanks in advance for your time!
>
> Best regards,
>
> Matt - Hop co-founder
> www.project-hop.org
> ---
>
> Abstract
> =
> Hop is short for the Hop Orchestration Platform. Written completely in Java,
> it aims to provide a wide range of data orchestration tools, including a
> visual development environment, servers, metadata analysis, auditing
> services and so on. As a platform Hop also wants to be a re-usable library
> so that it can be easily re-used by other software.
>
> Proposal
> =
> Hop provides all the tools to build, maintain and deploy data
> orchestration, ETL and data integration solutions. For example, Hop allows
> you to diagram a data flow that propagates changes from a database via
> Apache Kafka to a data warehouse and deploy it as an Apache Beam pipeline.
> The core concepts of Hop are Pipelines and Workflows.
> * Pipelines do the core data manipulation work (read, manipulate, write
>   data). The main items of work in pipelines are transforms. A pipeline
>   consists of two or more (usually many) transforms that each perform a
>   granular piece of work. The transforms in a pipeline run in parallel, and
>   together create a powerful data processing tool.
> * Workflows take care of the orchestration of actions: execute pipelines,
>   run child workflows, environment checks, preparation, problem alerting and
>   so on.
> If these terms sound familiar it’s because they are taken from the Apache
> Beam and Apache Airflow projects.
>
> The main components of the Hop platform are:
> * hop-gui: a visual data orchestration IDE
> * hop-run: a CLI tool to run workflows or pipelines
> * hop-config: a CLI tool to configure Hop and its components
> * hop-server: a light-weight web server to run and monitor workflows and
>   pipelines
> * hop-translator: a tool for translating the various parts of the Hop tools
>   (i18n)
> * hop-web: a thin client version of hop-gui for web browsers and mobile
>   devices
>
> The cornerstone of the Hop platform is extensibility: all major components
> of the platform are designed to be pluggable. This allows any possible
> missing functionality to be created in a short amount of time.
>
> Background
> ===
> The Hop Orchestration Platform has its origins in the Kettle community.
> Kettle got acquired by Pentaho and after Pentaho’s acquisition by Hitachi
> in 2015, the community struck out to solve problems less aligned with
> Hitachi’s interests.
>
> Rationale
> ==
> In the Hop community, we have always aimed to function as a meritocracy,
> where contributions are accepted based on merit, and individuals gain
> status in the community based on their contributions (coding and
> otherwise). We’re proud to have a diverse group of people doing all the
> required things in a project: development, documentation, tutorials,
> architecture, testing, graphics design and much more. Bringing the project
> under the Apache Software Foundation would allow us to continue and grow,
> but also give our users confidence about the governance, IP status, and
> future of the project.
>
> ASF Preparation Phase
> ==
> The very first goal of project Hop is to find a good way to cooperate on
> the development across wide geographical, economic and social spectra. To
> make this possible, real changes were needed to a codebase which is
> essentially 20 years old. Most of these changes have been tackled by now.
> We think it’s fair to say that by now, Hop is a new platform even though it
> shares a common background as it partly started from the Kettle code base.
> Here are a few of the key focus areas we’re trying to safeguard going
> forward:
> * Plugins: lightweight plugins for all major functionality. This makes it
>   possible to extend Hop or reduce Hop in size. It also allows people to
>   implement or change functionality with minimal coding. In other words it
>   makes it easier to contribute.
> * Maintain an open and responsive community where every concern, feedback
>   and contribution is welcome.
> * Maintain a clear focus on data orchestration user requirements, not on
>   “industry trends”
> * Documentation: we set up a version controlled “adoc” system with
[DISCUSS] Hop proposal
Hello Apache,

Our community is eager to propose for Hop to join the Apache Incubator.
The Hop Orchestration Platform aims to help people with complex data and
metadata orchestration problems.

Below is the complete text of the proposal but you can also find it here:
https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal

Any help with respect to the incubation is appreciated including help from
a few more mentors to set us on the right track. On behalf of my community
I'd be happy to answer any questions you might have regarding Hop. Our
thanks go out to Max, Julian and Tom for helping us set up this proposal.

Thanks in advance for your time!

Best regards,

Matt - Hop co-founder
www.project-hop.org

---

Abstract
========
Hop is short for the Hop Orchestration Platform. Written completely in Java
it aims to provide a wide range of data orchestration tools, including a
visual development environment, servers, metadata analysis, auditing
services and so on. As a platform Hop also wants to be a re-usable library
so that it can be easily re-used by other software.

Proposal
========
Hop provides all the tools to build, maintain and deploy data
orchestration, ETL and data integration solutions. For example, Hop allows
you to diagram a data flow that propagates changes from a database via
Apache Kafka to a data warehouse and deploy it as an Apache Beam pipeline.
The core concepts of Hop are Pipelines and Workflows.
* Pipelines do the core data manipulation work (read, manipulate, write
data). The main items of work in pipelines are transforms. A pipeline
consists of two or more (usually many) transforms that each perform a
granular piece of work. The transforms in a pipeline run in parallel, and
together create a powerful data processing tool.
* Workflows take care of the orchestration of actions: execute pipelines,
run child workflows, environment checks, preparation, problem alerting and
so on.
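
To make the "transforms run in parallel" idea concrete, here is a minimal sketch in plain Java. It is not Hop code: the class name `PipelineSketch`, the `run` method and the end-of-data marker are all hypothetical, and the only assumption is that each transform can be modelled as a thread that passes rows to the next one through a queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative only: hypothetical names, not classes from the Hop code base.
public class PipelineSketch {

    // Marker row that tells the next transform no more rows will follow.
    private static final String END = "\u0000END";

    // Runs three transforms (read -> manipulate -> write), one thread each,
    // connected by bounded queues so that all of them work at the same time.
    public static List<String> run(List<String> input) throws InterruptedException {
        BlockingQueue<String> readToUpper = new ArrayBlockingQueue<>(16);
        BlockingQueue<String> upperToWrite = new ArrayBlockingQueue<>(16);
        List<String> output = new ArrayList<>();

        // Transform 1: read the input rows and hand them downstream.
        Thread reader = new Thread(() -> {
            try {
                for (String row : input) {
                    readToUpper.put(row);
                }
                readToUpper.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Transform 2: manipulate each row (here: convert to upper case).
        Thread upper = new Thread(() -> {
            try {
                for (String row = readToUpper.take(); !row.equals(END); row = readToUpper.take()) {
                    upperToWrite.put(row.toUpperCase());
                }
                upperToWrite.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Transform 3: write the rows (here: collect them in a list).
        Thread writer = new Thread(() -> {
            try {
                for (String row = upperToWrite.take(); !row.equals(END); row = upperToWrite.take()) {
                    output.add(row);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        upper.start();
        writer.start();
        reader.join();
        upper.join();
        writer.join();
        return output;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("a", "b", "c"))); // prints [A, B, C]
    }
}
```

In this sketch a workflow would be the surrounding code that decides when to call `run`, checks its result and raises an alert on failure. The bounded queues are a design choice for the example only: they keep a fast transform from running far ahead of a slow one.
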
If these terms sound familiar it’s because they are taken from the Apache
Beam and Apache Airflow projects.

The main components of the Hop platform are:
* hop-gui: a visual data orchestration IDE
* hop-run: a CLI tool to run workflows or pipelines
* hop-config: a CLI tool to configure Hop and its components
* hop-server: a lightweight web server to run and monitor workflows and
pipelines
* hop-translator: a tool for translating the various parts of the Hop tools
(i18n)
* hop-web: a thin client version of hop-gui for web browsers and mobile
devices

The cornerstone of the Hop platform is extensibility: all major components
of the platform are designed to be pluggable. This allows any missing
functionality to be created in a short amount of time.

Background
==========
The Hop Orchestration Platform has its origins in the Kettle community.
Kettle was acquired by Pentaho and, after Pentaho’s acquisition by Hitachi
in 2015, the community struck out to solve problems less aligned with
Hitachi’s interests.

Rationale
=========
In the Hop community, we have always aimed to function as a meritocracy,
where contributions are accepted based on merit, and individuals gain
status in the community based on their contributions (coding and
otherwise). We’re proud to have a diverse group of people doing all the
required things in a project: development, documentation, tutorials,
architecture, testing, graphic design and much more. Bringing the project
under the Apache Software Foundation would allow us to continue and grow,
but also give our users confidence about the governance, IP status, and
future of the project.

ASF Preparation Phase
=====================
The very first goal of project Hop is to find a good way to cooperate on
the development across wide geographical, economic and social spectra. To
make this possible, real changes were needed to a codebase which is
essentially 20 years old. Most of these changes have been tackled by now.
We think it’s fair to say that by now, Hop is a new platform even though it
shares a common background as it partly started from the Kettle code base.
Here are a few of the key focus areas we’re trying to safeguard going
forward:
* Plugins: lightweight plugins for all major functionality. This makes it
possible to extend Hop or reduce Hop in size. It also allows people to
implement or change functionality with minimal coding. In other words it
makes it easier to contribute.
* Maintain an open and responsive community where every concern, feedback
and contribution is welcome.
* Maintain a clear focus on data orchestration user requirements, not on
“industry trends”.
* Documentation: we set up a version-controlled “adoc” system with
automated builds which is open, controlled and reviewed. This is
incredibly important for every Hop user and developer.
* Testing and stability: we want to massively increase stability by
implementing integration tests beyond the standard Java unit testing
because of the dynamic nature of data orchestration work. We still