[discuss] Zeppelin support workflow

2019-03-10 Thread Xun Liu
Hello, everyone

Zeppelin has more than 20 interpreters, so data analysts can use it for a
wide variety of data development work, and much of that work is
interdependent.
For example, developing a machine learning algorithm relies on Spark to
preprocess the data.

Zeppelin should have built-in workflow capabilities instead of relying on
external software to schedule its notes, for the following reasons:

1. We have moved from the data processing era to the algorithm era. With its
own workflow, Zeppelin would offer a complete ecosystem covering both data
processing and algorithm operations.
2. Zeppelin's powerful interactive processing capabilities help algorithm
engineers improve their productivity.
Zeppelin should give the algorithm engineer more direct control, instead of
handing the algorithm to other teams (or software) to run the workflow.
3. Zeppelin knows more about the processing status of the data than Azkaban
or Airflow, so a built-in workflow can offer better performance, user
experience and control.

Typical use case
Workflows matter especially in machine learning, because machine learning
tasks generally take a long time to run.
A typical example is as follows:
1) Obtain data from HDFS with Spark;
2) Clean and convert the data with Spark SQL;
3) Extract features from the data with Spark;
4) Write the TensorFlow algorithm with Hadoop Submarine;
5) Distribute the TensorFlow algorithm as a job to YARN or k8s for batch
processing;
6) Publish the trained model and provide online prediction services;
7) Run model prediction with Flink;
8) Receive incremental data through Flink to update the model incrementally.
Therefore, Zeppelin especially needs the ability to orchestrate workflows.
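
A minimal sketch of the eight steps above as a dependency graph, only to
illustrate what "orchestrating a workflow" means here. It is not Zeppelin's
API and not the design in the linked document: the step names are
hypothetical, the dependency edges (in particular that steps 7 and 8 both
depend only on step 6) are an assumption, and `execute` stands in for
whatever would actually run a note. It uses the standard-library `graphlib`
module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical step names for the eight-step example. Each step maps to the
# set of steps it depends on. The edges are an assumed reading of the list.
PIPELINE = {
    "load_hdfs": set(),
    "clean_sparksql": {"load_hdfs"},
    "extract_features": {"clean_sparksql"},
    "write_tf_algorithm": {"extract_features"},
    "train_on_yarn_or_k8s": {"write_tf_algorithm"},
    "publish_model": {"train_on_yarn_or_k8s"},
    "predict_flink": {"publish_model"},
    "incremental_update_flink": {"publish_model"},
}


def run_order(graph):
    """Return one valid execution order; raises CycleError if there is a cycle."""
    return list(TopologicalSorter(graph).static_order())


def run(graph, execute):
    """Run every step after its dependencies.

    A step is skipped when any of its dependencies failed or was skipped.
    `execute` is a caller-supplied callable that runs one step by name.
    """
    status = {}
    for step in run_order(graph):
        if any(status[dep] != "ok" for dep in graph[step]):
            status[step] = "skipped"
            continue
        try:
            execute(step)
            status[step] = "ok"
        except Exception:
            status[step] = "failed"
    return status
```

Calling `run(PIPELINE, execute)` returns a status per step, so a failed
training job marks the publishing and Flink steps as skipped instead of
running them against a missing model.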

I have completed a draft of the Zeppelin workflow system design. Please
review it; you can edit the document directly or leave comments.

JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
gdoc: https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit

:-)

Xun Liu
2019-03-11

Re: [discuss] Zeppelin support workflow

2019-03-11 Thread Jongyoul Lee
Thanks for sharing this discussion.

I'm interested in it and will take a look.




-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [discuss] Zeppelin support workflow

2019-03-16 Thread Felix Cheung
I like it!




Re: [discuss] Zeppelin support workflow

2019-03-18 Thread Mei Long
Very cool! @Xun Liu Would you like to talk about it at our next Apache
Zeppelin community meeting?



Re: [discuss] Zeppelin support workflow

2019-03-18 Thread Xun Liu
Hi, Mei Long

I would be very happy to attend the Zeppelin community meeting.
When is the next meeting? Should I wait for an email notification from the
community?

The Zeppelin workflow ticket is here:
https://issues.apache.org/jira/browse/ZEPPELIN-4018
Everyone is welcome to follow it.
