Re: [DISCUSS] Hop proposal

2020-09-10 Thread Matt Casters
Hi Jean-Baptiste,

Enterprise integration of the kind Camel does is a bit of a Hop "blind spot", so
it would be very interesting indeed to integrate Camel in Hop.  Our
architecture certainly allows it, since Hop is very much a metadata editor.
NiFi is (as far as I can tell) more tied to its own data processing logic.
Hop has one of these 'legacy data engines' as well but, as Max
mentioned, Hop created generic engine plugins to support Apache Beam runners
for Apache Spark, Apache Flink and Google Cloud Dataflow, to mention a few.
As such, it would be within the realm of possibility to consider a NiFi
engine plugin in Hop if there is any interest.  Another cool possibility is
the execution of Hop pipelines inside NiFi (or vice versa) to extend
functionality.  Specifically for Apache Airflow, we planned to write a
workflow engine plugin to support that as well.
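
To make the engine-plugin idea above concrete, here is a minimal Java sketch.
It is not Hop's actual API: the names PipelineMeta, PipelineEngine,
LocalEngine and DescribingEngine, the "trim" and "upper" transforms, and the
"remote-cluster" target are all assumptions made up for illustration. It only
shows the general shape: the pipeline is plain metadata, and an
interchangeable engine decides how that metadata gets executed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Illustration only: every type and name below is hypothetical, not Hop's API.
public class EnginePluginSketch {

    // Engine-neutral pipeline metadata: a name plus an ordered list of transform names.
    static final class PipelineMeta {
        final String name;
        final List<String> transformNames;

        PipelineMeta(String name, List<String> transformNames) {
            this.name = name;
            this.transformNames = transformNames;
        }
    }

    // Assumed plugin contract: an engine decides how the metadata gets executed.
    interface PipelineEngine {
        List<String> execute(PipelineMeta meta, List<String> rows);
    }

    // Runs the pipeline in-process by applying each transform to each row in order.
    static final class LocalEngine implements PipelineEngine {
        private final Map<String, UnaryOperator<String>> transforms = new LinkedHashMap<>();

        LocalEngine() {
            transforms.put("trim", String::trim);
            transforms.put("upper", String::toUpperCase);
        }

        @Override
        public List<String> execute(PipelineMeta meta, List<String> rows) {
            List<String> out = new ArrayList<>();
            for (String row : rows) {
                String value = row;
                for (String transformName : meta.transformNames) {
                    UnaryOperator<String> transform = transforms.get(transformName);
                    if (transform == null) {
                        throw new IllegalArgumentException("Unknown transform: " + transformName);
                    }
                    value = transform.apply(value);
                }
                out.add(value);
            }
            return out;
        }
    }

    // Stand-in for a remote engine: it only describes what it would submit.
    static final class DescribingEngine implements PipelineEngine {
        private final String target;

        DescribingEngine(String target) {
            this.target = target;
        }

        @Override
        public List<String> execute(PipelineMeta meta, List<String> rows) {
            return List.of("submit " + meta.name + " " + meta.transformNames + " to " + target);
        }
    }

    // The same metadata is handed to whichever engine is selected by id.
    static List<String> run(String engineId, List<String> rows) {
        PipelineMeta meta = new PipelineMeta("demo", List.of("trim", "upper"));
        Map<String, PipelineEngine> engines = new LinkedHashMap<>();
        engines.put("local", new LocalEngine());
        engines.put("remote", new DescribingEngine("remote-cluster"));
        PipelineEngine engine = engines.get(engineId);
        if (engine == null) {
            throw new IllegalArgumentException("Unknown engine: " + engineId);
        }
        return engine.execute(meta, rows);
    }

    public static void main(String[] args) {
        System.out.println(run("local", List.of("  hop ")));
        System.out.println(run("remote", List.of("  hop ")));
    }
}
```

In this sketch, supporting another runtime means registering one more
PipelineEngine; the pipeline metadata itself stays untouched.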

Whichever of these possibilities pans out, we're looking forward to
working with anyone who wants to help out with the blending of these
technologies.  If anything, it should be a lot of fun to do these things.

Regards,
Matt


On Thu, Sep 10, 2020 at 1:48 PM Jean-Baptiste Onofre 
wrote:

> Hi,
>
> Interesting proposal, and happy to help if needed.
>
> By the way, did you evaluate the potential relationship with Camel or NiFi
> (and what are the pros/cons, if a comparison is possible)?
>
> Regards
> JB
>
> > On 8 Sep 2020 at 11:56, Matt Casters wrote:
> >
> > Hello Apache,
> >
> > Our community is eager to propose for Hop to join the Apache Incubator.
> > The Hop Orchestration Platform aims to help people with complex data and
> > metadata orchestration problems.
> >
> > Below is the complete text of the proposal but you can also find it here:
> > https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal
> >
> > Any help with respect to the incubation is appreciated including help from
> > a few more mentors to set us on the right track.  On behalf of my community
> > I'd be happy to answer any questions you might have regarding Hop.  Our
> > thanks go out to Max, Julian and Tom for helping us set up this proposal.
> >
> > Thanks in advance for your time!
> >
> > Best regards,
> >
> > Matt - Hop co-founder
> > www.project-hop.org
> > ---
> >
> > Abstract
> > =
> > Hop is short for the Hop Orchestration Platform. Written completely in Java
> > it aims to provide a wide range of data orchestration tools, including a
> > visual development environment, servers, metadata analysis, auditing
> > services and so on. As a platform Hop also wants to be a re-usable library
> > so that it can be easily re-used by other software.
> >
> > Proposal
> > =
> > Hop provides all the tools to build, maintain and deploy data
> > orchestration, ETL and data integration solutions. For example, Hop allows
> > you to diagram a data flow that propagates changes from a database via
> > Apache Kafka to a data warehouse and deploy it as an Apache Beam pipeline.
> > The core concepts of Hop are Pipelines and Workflows.
> > * Pipelines do the core data manipulation work (read, manipulate, write
> > data). The main items of work in pipelines are transforms. A pipeline
> > consists of two or more (usually many) transforms that each perform a
> > granular piece of work. The transforms in a pipeline run in parallel, and
> > together create a powerful data processing tool.
> > * Workflows take care of the orchestration of actions: execute pipelines,
> > run child workflows, environment checks, preparation, problem alerting and
> > so on.
> > If these terms sound familiar it’s because they are taken from the Apache
> > Beam and Apache Airflow projects.
> >
> >
> > The main components of the Hop platform are:
> > * hop-gui, a visual data orchestration IDE
> > * hop-run: a CLI tool to run workflows or pipelines
> > * hop-config: a CLI tool to configure Hop and its components
> > * hop-server: a light-weight web server to run and monitor workflows and
> > pipelines
> > * hop-translator: a tool for translating the various parts of the Hop tools
> > (i18n).
> > * hop-web: a thin client version of hop-gui for web browsers and mobile
> > devices
> >
> >
> > The cornerstone of the Hop platform is extensibility: all major components
> > of the platform are designed to be pluggable. This allows any possible
> > missing functionality to be created in a short amount of time.
> >
> > Background
> > ===
> > The Hop Orchestration Platform has its origins in the Kettle community.
> > Kettle got acquired by Pentaho and after Pentaho’s acquisition by Hitachi
> > in 2015, the community struck out to solve problems less aligned with
> > Hitachi’s interests.
> >
> > Rationale
> > ==
> > In the Hop community, we have always aimed to function as a meritocracy,
> > where contributions are accepted based on merit, and individuals gain
> > status in the community based on their contributions (coding and
> > otherwise). We’re proud to have a diverse 

Re: [DISCUSS] Hop proposal

2020-09-10 Thread Jean-Baptiste Onofre
Hi,

Interesting proposal, and happy to help if needed.

By the way, did you evaluate the potential relationship with Camel or NiFi (and
what are the pros/cons, if a comparison is possible)?

Regards
JB

> On 8 Sep 2020 at 11:56, Matt Casters wrote:
> 
> Hello Apache,
> 
> Our community is eager to propose for Hop to join the Apache Incubator.
> The Hop Orchestration Platform aims to help people with complex data and
> metadata orchestration problems.
> 
> Below is the complete text of the proposal but you can also find it here:
> https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal
> 
> Any help with respect to the incubation is appreciated including help from
> a few more mentors to set us on the right track.  On behalf of my community
> I'd be happy to answer any questions you might have regarding Hop.  Our
> thanks go out to Max, Julian and Tom for helping us set up this proposal.
> 
> Thanks in advance for your time!
> 
> Best regards,
> 
> Matt - Hop co-founder
> www.project-hop.org
> ---
> 
> Abstract
> =
> Hop is short for the Hop Orchestration Platform. Written completely in Java
> it aims to provide a wide range of data orchestration tools, including a
> visual development environment, servers, metadata analysis, auditing
> services and so on. As a platform Hop also wants to be a re-usable library
> so that it can be easily re-used by other software.
> 
> Proposal
> =
> Hop provides all the tools to build, maintain and deploy data
> orchestration, ETL and data integration solutions. For example, Hop allows
> you to diagram a data flow that propagates changes from a database via
> Apache Kafka to a data warehouse and deploy it as an Apache Beam pipeline.
> The core concepts of Hop are Pipelines and Workflows.
> * Pipelines do the core data manipulation work (read, manipulate, write
> data). The main items of work in pipelines are transforms. A pipeline
> consists of two or more (usually many) transforms that each perform a
> granular piece of work. The transforms in a pipeline run in parallel, and
> together create a powerful data processing tool.
> * Workflows take care of the orchestration of actions: execute pipelines,
> run child workflows, environment checks, preparation, problem alerting and
> so on.
> If these terms sound familiar it’s because they are taken from the Apache
> Beam and Apache Airflow projects.
> 
> 
> The main components of the Hop platform are:
> * hop-gui, a visual data orchestration IDE
> * hop-run: a CLI tool to run workflows or pipelines
> * hop-config: a CLI tool to configure Hop and its components
> * hop-server: a light-weight web server to run and monitor workflows and
> pipelines
> * hop-translator: a tool for translating the various parts of the Hop tools
> (i18n).
> * hop-web: a thin client version of hop-gui for web browsers and mobile
> devices
> 
> 
> The cornerstone of the Hop platform is extensibility: all major components
> of the platform are designed to be pluggable. This allows any possible
> missing functionality to be created in a short amount of time.
> 
> Background
> ===
> The Hop Orchestration Platform has its origins in the Kettle community.
> Kettle got acquired by Pentaho and after Pentaho’s acquisition by Hitachi
> in 2015, the community struck out to solve problems less aligned with
> Hitachi’s interests.
> 
> Rationale
> ==
> In the Hop community, we have always aimed to function as a meritocracy,
> where contributions are accepted based on merit, and individuals gain
> status in the community based on their contributions (coding and
> otherwise). We’re proud to have a diverse group of people doing all the
> required things in a project: development, documentation, tutorials,
> architecture, testing, graphics design and much more. Bringing the project
> under the Apache Software Foundation would allow us to continue and grow,
> but also give our users confidence about the governance, IP status, and
> future of the project.
> 
> ASF Preparation Phase
> ==
> The very first goal of project Hop is to find a good way to cooperate on
> the development across wide geographical, economical and social spectra. To
> make this possible real changes were needed to a codebase which is
> essentially 20 years old. Most of these changes have been tackled by now.
> We think it’s fair to say that by now, Hop is a new platform even though it
> shares a common background as it partly started from the Kettle code base.
> Here are a few of the key focus areas we’re trying to safeguard going
> forward:
> * Plugins: lightweight plugins for all major functionality. This makes it
> possible to extend Hop or reduce Hop in size.  It also allows people to
> implement or change functionality with minimal coding.  In other words it
> makes it easier to contribute.
> * Maintain an open and responsive community where every concern, feedback
> and contribution is welcome.
> * Maintain a clear focus 

Re: [DISCUSS] Hop proposal

2020-09-10 Thread fpapon
Hi,

+1

The project seems to be very interesting and we can see that there is
documentation, contribution guide...

I will be more than happy to help as a mentor.

regards,

François
fpa...@apache.org

On 10/09/2020 at 13:05, Julian Feinauer wrote:
> Hey,
>
> thanks for your statement, Max. That's already a great start, as we cannot
> expect fresh podlings to know the Apache Way (at all?); otherwise there would
> be no point in the Incubator.
> But knowing you and your motivation, and reading your statement about the
> team, makes me very confident that this could be a very smooth ride : )
>
> So, best from my side!
>
> Julian
>
> On 10.09.20, 12:40, "Maximilian Michels" wrote:
>
> I've met Matt and other folks from the Hop project more than a year ago 
> through Beam Summit Europe. I can say that they are genuinely passionate 
> about open-source. Initially, they were not familiar with the Apache 
> Way, but throughout the past year, everyone has ramped up their 
> knowledge about the ASF. You will also see that reflected in the proposal.
>
> Hop is a great project in the sense that it adds GUI-based integration 
> to many data processing projects at Apache. This is appealing to me 
> because we are leveraging many of the existing projects such as Spark, 
> Flink, Hadoop, Cassandra, Kafka, etc. The project would be a great 
> addition to the Apache project portfolio.
>
> This is going to be my first project as a Champion and I'm very much 
> looking forward to guiding the project throughout the incubation process.
>
> Please post your questions or let us know if you want to help with 
> mentoring the project.
>
> -Max
>
> On 08.09.20 12:30, Matt Casters wrote:
> > Thank you very much Kevin!
> > 
> > On Tue, Sep 8, 2020 at 12:07 PM Kevin Ratnasekera wrote:
> > 
> >> +1 ( binding ) Interesting project. Please add me as a mentor to the
> >> project.
> >>
> >> On Tue, Sep 8, 2020 at 3:26 PM Matt Casters
> >>  wrote:
> >>
> >>> Hello Apache,
> >>>
> >>> Our community is eager to propose for Hop to join the Apache Incubator.
> >>> The Hop Orchestration Platform aims to help people with complex data and
> >>> metadata orchestration problems.
> >>>
> >>> Below is the complete text of the proposal but you can also find it here:
> >>> https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal
> >>>
> >>> Any help with respect to the incubation is appreciated including help from
> >>> a few more mentors to set us on the right track.  On behalf of my community
> >>> I'd be happy to answer any questions you might have regarding Hop.  Our
> >>> thanks go out to Max, Julian and Tom for helping us set up this proposal.
> >>>
> >>> Thanks in advance for your time!
> >>>
> >>> Best regards,
> >>>
> >>> Matt - Hop co-founder
> >>> www.project-hop.org
> >>> ---
> >>>
> >>> Abstract
> >>> =
> >>> Hop is short for the Hop Orchestration Platform. Written completely in Java
> >>> it aims to provide a wide range of data orchestration tools, including a
> >>> visual development environment, servers, metadata analysis, auditing
> >>> services and so on. As a platform Hop also wants to be a re-usable library
> >>> so that it can be easily re-used by other software.
> >>>
> >>> Proposal
> >>> =
> >>> Hop provides all the tools to build, maintain and deploy data
> >>> orchestration, ETL and data integration solutions. For example, Hop allows
> >>> you to diagram a data flow that propagates changes from a database via
> >>> Apache Kafka to a data warehouse and deploy it as an Apache Beam pipeline.
> >>> The core concepts of Hop are Pipelines and Workflows.
> >>> * Pipelines do the core data manipulation work (read, manipulate, write
> >>> data). The main items of work in pipelines are transforms. A pipeline
> >>> consists of two or more (usually many) transforms that each perform a
> >>> granular piece of work. The transforms in a pipeline run in parallel, and
> >>> together create a powerful data processing tool.
> >>> * Workflows take care of the orchestration of actions: execute pipelines,
> >>> run child workflows, environment checks, preparation, problem alerting and
> >>> so on.
> >>> If these terms sound familiar it’s because they are taken from the Apache
> >>> Beam and Apache Airflow projects.
> >>>
> >>>
> >>> The main components of the Hop platform are:
> >>> * hop-gui, a visual data orchestration IDE
> >>> * hop-run: a CLI tool to run workflows or pipelines
> >>> * hop-config: a CLI tool to configure Hop and its components
> >>> * hop-server: a light-weight web server to run 

Re: [DISCUSS] Hop proposal

2020-09-10 Thread Julian Feinauer
Hey,

thanks for your statement, Max. That's already a great start, as we cannot
expect fresh podlings to know the Apache Way (at all?); otherwise there would
be no point in the Incubator.
But knowing you and your motivation, and reading your statement about the
team, makes me very confident that this could be a very smooth ride : )

So, best from my side!

Julian

On 10.09.20, 12:40, "Maximilian Michels" wrote:

I've met Matt and other folks from the Hop project more than a year ago 
through Beam Summit Europe. I can say that they are genuinely passionate 
about open-source. Initially, they were not familiar with the Apache 
Way, but throughout the past year, everyone has ramped up their 
knowledge about the ASF. You will also see that reflected in the proposal.

Hop is a great project in the sense that it adds GUI-based integration 
to many data processing projects at Apache. This is appealing to me 
because we are leveraging many of the existing projects such as Spark, 
Flink, Hadoop, Cassandra, Kafka, etc. The project would be a great 
addition to the Apache project portfolio.

This is going to be my first project as a Champion and I'm very much 
looking forward to guiding the project throughout the incubation process.

Please post your questions or let us know if you want to help with 
mentoring the project.

-Max

On 08.09.20 12:30, Matt Casters wrote:
> Thank you very much Kevin!
> 
> On Tue, Sep 8, 2020 at 12:07 PM Kevin Ratnasekera wrote:
> 
>> +1 ( binding ) Interesting project. Please add me as a mentor to the
>> project.
>>
>> On Tue, Sep 8, 2020 at 3:26 PM Matt Casters
>>  wrote:
>>
>>> Hello Apache,
>>>
>>> Our community is eager to propose for Hop to join the Apache Incubator.
>>> The Hop Orchestration Platform aims to help people with complex data and
>>> metadata orchestration problems.
>>>
>>> Below is the complete text of the proposal but you can also find it here:
>>> https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal
>>>
>>> Any help with respect to the incubation is appreciated including help from
>>> a few more mentors to set us on the right track.  On behalf of my community
>>> I'd be happy to answer any questions you might have regarding Hop.  Our
>>> thanks go out to Max, Julian and Tom for helping us set up this proposal.
>>>
>>> Thanks in advance for your time!
>>>
>>> Best regards,
>>>
>>> Matt - Hop co-founder
>>> www.project-hop.org
>>> ---
>>>
>>> Abstract
>>> =
>>> Hop is short for the Hop Orchestration Platform. Written completely in Java
>>> it aims to provide a wide range of data orchestration tools, including a
>>> visual development environment, servers, metadata analysis, auditing
>>> services and so on. As a platform Hop also wants to be a re-usable library
>>> so that it can be easily re-used by other software.
>>>
>>> Proposal
>>> =
>>> Hop provides all the tools to build, maintain and deploy data
>>> orchestration, ETL and data integration solutions. For example, Hop allows
>>> you to diagram a data flow that propagates changes from a database via
>>> Apache Kafka to a data warehouse and deploy it as an Apache Beam pipeline.
>>> The core concepts of Hop are Pipelines and Workflows.
>>> * Pipelines do the core data manipulation work (read, manipulate, write
>>> data). The main items of work in pipelines are transforms. A pipeline
>>> consists of two or more (usually many) transforms that each perform a
>>> granular piece of work. The transforms in a pipeline run in parallel, and
>>> together create a powerful data processing tool.
>>> * Workflows take care of the orchestration of actions: execute pipelines,
>>> run child workflows, environment checks, preparation, problem alerting and
>>> so on.
>>> If these terms sound familiar it’s because they are taken from the Apache
>>> Beam and Apache Airflow projects.
>>>
>>>
>>> The main components of the Hop platform are:
>>> * hop-gui, a visual data orchestration IDE
>>> * hop-run: a CLI tool to run workflows or pipelines
>>> * hop-config: a CLI tool to configure Hop and its components
>>> * hop-server: a light-weight web server to run and monitor workflows and
>>> pipelines
>>> * hop-translator: a tool for translating the various parts of the Hop tools
>>> (i18n).
>>> * hop-web: a thin client version of hop-gui for web browsers and mobile
>>> devices
>>>
>>>
>>> The cornerstone of the Hop platform is extensibility: all major components
>>> of the platform are designed to be pluggable. This allows any possible
>>> missing functionality to be created in a 

Re: [DISCUSS] Hop proposal

2020-09-10 Thread Maximilian Michels
I've met Matt and other folks from the Hop project more than a year ago 
through Beam Summit Europe. I can say that they are genuinely passionate 
about open-source. Initially, they were not familiar with the Apache 
Way, but throughout the past year, everyone has ramped up their 
knowledge about the ASF. You will also see that reflected in the proposal.


Hop is a great project in the sense that it adds GUI-based integration 
to many data processing projects at Apache. This is appealing to me 
because we are leveraging many of the existing projects such as Spark, 
Flink, Hadoop, Cassandra, Kafka, etc. The project would be a great 
addition to the Apache project portfolio.


This is going to be my first project as a Champion and I'm very much 
looking forward to guiding the project throughout the incubation process.


Please post your questions or let us know if you want to help with 
mentoring the project.


-Max

On 08.09.20 12:30, Matt Casters wrote:

Thank you very much Kevin!

On Tue, Sep 8, 2020 at 12:07 PM Kevin Ratnasekera 
wrote:


+1 ( binding ) Interesting project. Please add me as a mentor to the
project.

On Tue, Sep 8, 2020 at 3:26 PM Matt Casters
 wrote:


Hello Apache,

Our community is eager to propose for Hop to join the Apache Incubator.
The Hop Orchestration Platform aims to help people with complex data and
metadata orchestration problems.

Below is the complete text of the proposal but you can also find it here:
https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal

Any help with respect to the incubation is appreciated including help from
a few more mentors to set us on the right track.  On behalf of my community
I'd be happy to answer any questions you might have regarding Hop.  Our
thanks go out to Max, Julian and Tom for helping us set up this proposal.

Thanks in advance for your time!

Best regards,

Matt - Hop co-founder
www.project-hop.org
---

Abstract
=
Hop is short for the Hop Orchestration Platform. Written completely in Java
it aims to provide a wide range of data orchestration tools, including a
visual development environment, servers, metadata analysis, auditing
services and so on. As a platform Hop also wants to be a re-usable library
so that it can be easily re-used by other software.

Proposal
=
Hop provides all the tools to build, maintain and deploy data
orchestration, ETL and data integration solutions. For example, Hop allows
you to diagram a data flow that propagates changes from a database via
Apache Kafka to a data warehouse and deploy it as an Apache Beam pipeline.
The core concepts of Hop are Pipelines and Workflows.
* Pipelines do the core data manipulation work (read, manipulate, write
data). The main items of work in pipelines are transforms. A pipeline
consists of two or more (usually many) transforms that each perform a
granular piece of work. The transforms in a pipeline run in parallel, and
together create a powerful data processing tool.
* Workflows take care of the orchestration of actions: execute pipelines,
run child workflows, environment checks, preparation, problem alerting and
so on.
If these terms sound familiar it’s because they are taken from the Apache
Beam and Apache Airflow projects.


The main components of the Hop platform are:
* hop-gui, a visual data orchestration IDE
* hop-run: a CLI tool to run workflows or pipelines
* hop-config: a CLI tool to configure Hop and its components
* hop-server: a light-weight web server to run and monitor workflows and
pipelines
* hop-translator: a tool for translating the various parts of the Hop tools
(i18n).
* hop-web: a thin client version of hop-gui for web browsers and mobile
devices


The cornerstone of the Hop platform is extensibility: all major components
of the platform are designed to be pluggable. This allows any possible
missing functionality to be created in a short amount of time.

Background
===
The Hop Orchestration Platform has its origins in the Kettle community.
Kettle got acquired by Pentaho and after Pentaho’s acquisition by Hitachi
in 2015, the community struck out to solve problems less aligned with
Hitachi’s interests.

Rationale
==
In the Hop community, we have always aimed to function as a meritocracy,
where contributions are accepted based on merit, and individuals gain
status in the community based on their contributions (coding and
otherwise). We’re proud to have a diverse group of people doing all the
required things in a project: development, documentation, tutorials,
architecture, testing, graphics design and much more. Bringing the project
under the Apache Software Foundation would allow us to continue and grow,
but also give our users confidence about the governance, IP status, and
future of the project.

ASF Preparation Phase
==
The very first goal of project Hop is to find a good way to cooperate on
the development across wide geographical, economical and social spectra. To
make

Re: [DISCUSS] Hop proposal

2020-09-08 Thread Matt Casters
Thank you very much Kevin!

On Tue, Sep 8, 2020 at 12:07 PM Kevin Ratnasekera 
wrote:

> +1 ( binding ) Interesting project. Please add me as a mentor to the
> project.
>
> On Tue, Sep 8, 2020 at 3:26 PM Matt Casters
>  wrote:
>
> > Hello Apache,
> >
> > Our community is eager to propose for Hop to join the Apache Incubator.
> > The Hop Orchestration Platform aims to help people with complex data and
> > metadata orchestration problems.
> >
> > Below is the complete text of the proposal but you can also find it here:
> > https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal
> >
> > Any help with respect to the incubation is appreciated including help from
> > a few more mentors to set us on the right track.  On behalf of my community
> > I'd be happy to answer any questions you might have regarding Hop.  Our
> > thanks go out to Max, Julian and Tom for helping us set up this proposal.
> >
> > Thanks in advance for your time!
> >
> > Best regards,
> >
> > Matt - Hop co-founder
> > www.project-hop.org
> > ---
> >
> > Abstract
> > =
> > Hop is short for the Hop Orchestration Platform. Written completely in Java
> > it aims to provide a wide range of data orchestration tools, including a
> > visual development environment, servers, metadata analysis, auditing
> > services and so on. As a platform Hop also wants to be a re-usable library
> > so that it can be easily re-used by other software.
> >
> > Proposal
> > =
> > Hop provides all the tools to build, maintain and deploy data
> > orchestration, ETL and data integration solutions. For example, Hop allows
> > you to diagram a data flow that propagates changes from a database via
> > Apache Kafka to a data warehouse and deploy it as an Apache Beam pipeline.
> > The core concepts of Hop are Pipelines and Workflows.
> > * Pipelines do the core data manipulation work (read, manipulate, write
> > data). The main items of work in pipelines are transforms. A pipeline
> > consists of two or more (usually many) transforms that each perform a
> > granular piece of work. The transforms in a pipeline run in parallel, and
> > together create a powerful data processing tool.
> > * Workflows take care of the orchestration of actions: execute pipelines,
> > run child workflows, environment checks, preparation, problem alerting and
> > so on.
> > If these terms sound familiar it’s because they are taken from the Apache
> > Beam and Apache Airflow projects.
> >
> >
> > The main components of the Hop platform are:
> > * hop-gui, a visual data orchestration IDE
> > * hop-run: a CLI tool to run workflows or pipelines
> > * hop-config: a CLI tool to configure Hop and its components
> > * hop-server: a light-weight web server to run and monitor workflows and
> > pipelines
> > * hop-translator: a tool for translating the various parts of the Hop tools
> > (i18n).
> > * hop-web: a thin client version of hop-gui for web browsers and mobile
> > devices
> >
> >
> > The cornerstone of the Hop platform is extensibility: all major components
> > of the platform are designed to be pluggable. This allows any possible
> > missing functionality to be created in a short amount of time.
> >
> > Background
> > ===
> > The Hop Orchestration Platform has its origins in the Kettle community.
> > Kettle got acquired by Pentaho and after Pentaho’s acquisition by Hitachi
> > in 2015, the community struck out to solve problems less aligned with
> > Hitachi’s interests.
> >
> > Rationale
> > ==
> > In the Hop community, we have always aimed to function as a meritocracy,
> > where contributions are accepted based on merit, and individuals gain
> > status in the community based on their contributions (coding and
> > otherwise). We’re proud to have a diverse group of people doing all the
> > required things in a project: development, documentation, tutorials,
> > architecture, testing, graphics design and much more. Bringing the project
> > under the Apache Software Foundation would allow us to continue and grow,
> > but also give our users confidence about the governance, IP status, and
> > future of the project.
> >
> > ASF Preparation Phase
> > ==
> > The very first goal of project Hop is to find a good way to cooperate on
> > the development across wide geographical, economical and social spectra. To
> > make this possible real changes were needed to a codebase which is
> > essentially 20 years old. Most of these changes have been tackled by now.
> > We think it’s fair to say that by now, Hop is a new platform even though it
> > shares a common background as it partly started from the Kettle code base.
> > Here are a few of the key focus areas we’re trying to safeguard going
> > forward:
> > * Plugins: lightweight plugins for all major functionality. This makes it
> > possible to extend Hop or reduce Hop in size.  It also allows people to
> > implement or change functionality with minimal coding.  In other words 

Re: [DISCUSS] Hop proposal

2020-09-08 Thread Kevin Ratnasekera
+1 ( binding ) Interesting project. Please add me as a mentor to the
project.

On Tue, Sep 8, 2020 at 3:26 PM Matt Casters
 wrote:

> Hello Apache,
>
> Our community is eager to propose for Hop to join the Apache Incubator.
> The Hop Orchestration Platform aims to help people with complex data and
> metadata orchestration problems.
>
> Below is the complete text of the proposal but you can also find it here:
> https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal
>
> Any help with respect to the incubation is appreciated including help from
> a few more mentors to set us on the right track.  On behalf of my community
> I'd be happy to answer any questions you might have regarding Hop.  Our
> thanks go out to Max, Julian and Tom for helping us set up this proposal.
>
> Thanks in advance for your time!
>
> Best regards,
>
> Matt - Hop co-founder
> www.project-hop.org
> ---
>
> Abstract
> =
> Hop is short for the Hop Orchestration Platform. Written completely in Java
> it aims to provide a wide range of data orchestration tools, including a
> visual development environment, servers, metadata analysis, auditing
> services and so on. As a platform Hop also wants to be a re-usable library
> so that it can be easily re-used by other software.
>
> Proposal
> =
> Hop provides all the tools to build, maintain and deploy data
> orchestration, ETL and data integration solutions. For example, Hop allows
> you to diagram a data flow that propagates changes from a database via
> Apache Kafka to a data warehouse and deploy it as an Apache Beam pipeline.
> The core concepts of Hop are Pipelines and Workflows.
> * Pipelines do the core data manipulation work (read, manipulate, write
> data). The main items of work in pipelines are transforms. A pipeline
> consists of two or more (usually many) transforms that each perform a
> granular piece of work. The transforms in a pipeline run in parallel, and
> together create a powerful data processing tool.
> * Workflows take care of the orchestration of actions: execute pipelines,
> run child workflows, environment checks, preparation, problem alerting and
> so on.
> If these terms sound familiar it’s because they are taken from the Apache
> Beam and Apache Airflow projects.
>
>
> The main components of the Hop platform are:
> * hop-gui: a visual data orchestration IDE
> * hop-run: a CLI tool to run workflows or pipelines
> * hop-config: a CLI tool to configure Hop and its components
> * hop-server: a lightweight web server to run and monitor workflows and
> pipelines
> * hop-translator: a tool for translating the various parts of the Hop tools
> (i18n)
> * hop-web: a thin client version of hop-gui for web browsers and mobile
> devices
>
>
> The cornerstone of the Hop platform is extensibility: all major components
> of the platform are designed to be pluggable. This allows any missing
> functionality to be added in a short amount of time.
>
> Background
> ===
> The Hop Orchestration Platform has its origins in the Kettle community.
> Kettle was acquired by Pentaho and, after Pentaho’s acquisition by Hitachi
> in 2015, the community struck out to solve problems less aligned with
> Hitachi’s interests.
>
> Rationale
> ==
> In the Hop community, we have always aimed to function as a meritocracy,
> where contributions are accepted based on merit, and individuals gain
> status in the community based on their contributions (coding and
> otherwise). We’re proud to have a diverse group of people doing all the
> required things in a project: development, documentation, tutorials,
> architecture, testing, graphic design and much more. Bringing the project
> under the Apache Software Foundation would allow us to continue and grow,
> but also give our users confidence about the governance, IP status, and
> future of the project.
>
> ASF Preparation Phase
> ==
> The very first goal of project Hop is to find a good way to cooperate on
> the development across wide geographical, economic and social spectra. To
> make this possible, real changes were needed to a codebase which is
> essentially 20 years old. Most of these changes have been tackled by now.
> We think it’s fair to say that by now, Hop is a new platform even though it
> shares a common background as it partly started from the Kettle codebase.
> Here are a few of the key focus areas we’re trying to safeguard going
> forward:
> * Plugins: lightweight plugins for all major functionality. This makes it
> possible to extend Hop or reduce Hop in size.  It also allows people to
> implement or change functionality with minimal coding.  In other words it
> makes it easier to contribute.
> * Maintain an open and responsive community where every concern, feedback
> and contribution is welcome.
> * Maintain a clear focus on data orchestration user requirements, not on
> “industry trends”
> * Documentation: we set up a version controlled “adoc” system with
> 

[DISCUSS] Hop proposal

2020-09-08 Thread Matt Casters
Hello Apache,

Our community is eager to propose that Hop join the Apache Incubator.
The Hop Orchestration Platform aims to help people with complex data and
metadata orchestration problems.

Below is the complete text of the proposal but you can also find it here:
https://cwiki.apache.org/confluence/display/INCUBATOR/HopProposal

Any help with respect to the incubation is appreciated, including help from
a few more mentors to set us on the right track.  On behalf of my community
I'd be happy to answer any questions you might have regarding Hop.  Our
thanks go out to Max, Julian and Tom for helping us set up this proposal.

Thanks in advance for your time!

Best regards,

Matt - Hop co-founder
www.project-hop.org
---

Abstract
=
Hop is short for the Hop Orchestration Platform. Written completely in Java,
it aims to provide a wide range of data orchestration tools, including a
visual development environment, servers, metadata analysis, auditing
services and so on. As a platform, Hop also aims to be a reusable library
that other software can easily embed.

Proposal
=
Hop provides all the tools to build, maintain and deploy data
orchestration, ETL and data integration solutions. For example, Hop allows
you to diagram a data flow that propagates changes from a database via
Apache Kafka to a data warehouse and deploy it as an Apache Beam pipeline.
The core concepts of Hop are Pipelines and Workflows.
* Pipelines do the core data manipulation work (read, manipulate, write
data). The main items of work in pipelines are transforms. A pipeline
consists of two or more (usually many) transforms that each perform a
granular piece of work. The transforms in a pipeline run in parallel, and
together create a powerful data processing tool.
* Workflows take care of the orchestration of actions: execute pipelines,
run child workflows, environment checks, preparation, problem alerting and
so on.
If these terms sound familiar it’s because they are taken from the Apache
Beam and Apache Airflow projects.
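
As a rough illustration of the pipeline idea described above, the sketch
below pictures each transform as a worker thread that reads rows from an
input queue and hands its results to the next queue. It is plain Python and
not Hop code or the Hop API; names such as run_pipeline, to_upper and
only_adults are invented for this example.

```python
import queue
import threading

_END = object()  # sentinel that marks the end of the row stream


def run_pipeline(rows, transforms):
    """Run every transform in its own thread, linked by FIFO queues."""
    # One queue in front of each transform plus one for the final output.
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]

    def worker(transform, inbox, outbox):
        while True:
            row = inbox.get()
            if row is _END:
                outbox.put(_END)  # pass the end marker downstream
                return
            result = transform(row)
            if result is not None:  # None means the row was filtered out
                outbox.put(result)

    threads = [
        threading.Thread(target=worker, args=(t, queues[i], queues[i + 1]))
        for i, t in enumerate(transforms)
    ]
    for thread in threads:
        thread.start()

    # Feed the source rows, then signal that no more rows will follow.
    for row in rows:
        queues[0].put(row)
    queues[0].put(_END)

    # Collect everything that reaches the end of the pipeline.
    output = []
    while True:
        row = queues[-1].get()
        if row is _END:
            break
        output.append(row)
    for thread in threads:
        thread.join()
    return output


def to_upper(row):
    """Example transform: upper-case the 'name' field."""
    return {**row, "name": row["name"].upper()}


def only_adults(row):
    """Example transform: drop rows where 'age' is below 18."""
    return row if row["age"] >= 18 else None


result = run_pipeline(
    [{"name": "ann", "age": 31}, {"name": "bob", "age": 12}],
    [to_upper, only_adults],
)
print(result)
```

All transforms are active at the same time, so a row can be filtered by the
second transform while the first is still working on later rows. A workflow,
by contrast, would call something like run_pipeline as one action among
several. Error handling is left out to keep the sketch short.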


The main components of the Hop platform are:
* hop-gui: a visual data orchestration IDE
* hop-run: a CLI tool to run workflows or pipelines
* hop-config: a CLI tool to configure Hop and its components
* hop-server: a lightweight web server to run and monitor workflows and
pipelines
* hop-translator: a tool for translating the various parts of the Hop tools
(i18n)
* hop-web: a thin client version of hop-gui for web browsers and mobile
devices


The cornerstone of the Hop platform is extensibility: all major components
of the platform are designed to be pluggable. This allows any missing
functionality to be added in a short amount of time.
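
One way to picture such a pluggable design is a small registry in which
components are looked up by an identifier instead of being hard-wired. The
sketch below is an illustration in Python under that assumption, not Hop's
actual plugin mechanism; TRANSFORM_PLUGINS, transform_plugin,
UppercaseTransform and create_transform are hypothetical names.

```python
# Hypothetical plugin registry; none of these names come from Hop itself.
TRANSFORM_PLUGINS = {}


def transform_plugin(plugin_id):
    """Class decorator that registers a transform class under plugin_id."""
    def register(cls):
        if plugin_id in TRANSFORM_PLUGINS:
            raise ValueError(f"duplicate plugin id: {plugin_id}")
        TRANSFORM_PLUGINS[plugin_id] = cls
        return cls
    return register


@transform_plugin("uppercase")
class UppercaseTransform:
    """Example plugin: upper-cases one configurable field of each row."""

    def __init__(self, field):
        self.field = field

    def process(self, row):
        return {**row, self.field: row[self.field].upper()}


def create_transform(plugin_id, **options):
    """Look up a plugin by id and instantiate it with its options."""
    try:
        plugin_class = TRANSFORM_PLUGINS[plugin_id]
    except KeyError:
        raise ValueError(f"unknown plugin id: {plugin_id}") from None
    return plugin_class(**options)


transform = create_transform("uppercase", field="name")
print(transform.process({"name": "hop"}))
```

Because the core only knows plugin ids, new functionality can be added by
registering another class, without touching the code that builds and runs
pipelines.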

Background
===
The Hop Orchestration Platform has its origins in the Kettle community.
Kettle was acquired by Pentaho and, after Pentaho’s acquisition by Hitachi
in 2015, the community struck out to solve problems less aligned with
Hitachi’s interests.

Rationale
==
In the Hop community, we have always aimed to function as a meritocracy,
where contributions are accepted based on merit, and individuals gain
status in the community based on their contributions (coding and
otherwise). We’re proud to have a diverse group of people doing all the
required things in a project: development, documentation, tutorials,
architecture, testing, graphic design and much more. Bringing the project
under the Apache Software Foundation would allow us to continue and grow,
but also give our users confidence about the governance, IP status, and
future of the project.

ASF Preparation Phase
==
The very first goal of project Hop is to find a good way to cooperate on
the development across wide geographical, economic and social spectra. To
make this possible, real changes were needed to a codebase which is
essentially 20 years old. Most of these changes have been tackled by now.
We think it’s fair to say that by now, Hop is a new platform even though it
shares a common background as it partly started from the Kettle codebase.
Here are a few of the key focus areas we’re trying to safeguard going
forward:
* Plugins: lightweight plugins for all major functionality. This makes it
possible to extend Hop or reduce Hop in size.  It also allows people to
implement or change functionality with minimal coding.  In other words it
makes it easier to contribute.
* Maintain an open and responsive community where every concern, feedback
and contribution is welcome.
* Maintain a clear focus on data orchestration user requirements, not on
“industry trends”
* Documentation: we set up a version-controlled “adoc” system with
automated builds which is open, controlled and reviewed.  This is
incredibly important for every Hop user and developer.
* Testing and stability: we want to massively increase stability by
implementing integration tests beyond the standard Java unit testing
because of the dynamic nature of data orchestration work.  We still