Re: [discuss] Zeppelin support workflow

Xun Liu Mon, 18 Mar 2019 19:38:58 -0700

Hi, Mei Long

I would be very happy to attend the Zeppelin community meeting.
When is the next meeting? Should I wait for an email notification from the community?


The Zeppelin workflow ticket is here:
https://issues.apache.org/jira/browse/ZEPPELIN-4018
Everyone is welcome to take a look.

> On Mar 19, 2019, at 1:04 AM, Mei Long <[email protected]> wrote:
> 
> Very cool! @Xun Liu Would you like to talk about it at our next Apache
> Zeppelin community meeting?
> 
> On Sat, Mar 16, 2019 at 1:00 PM Felix Cheung <[email protected]>
> wrote:
> 
>> I like it!
>> 
>> ________________________________
>> From: Jongyoul Lee <[email protected]>
>> Sent: Monday, March 11, 2019 9:05:03 PM
>> To: dev
>> Subject: Re: [discuss] Zeppelin support workflow
>> 
>> Thanks for sharing this discussion.
>> 
>> I'm interested in it and will take a look.
>> 
>> On Mon, Mar 11, 2019 at 10:43 AM Xun Liu <[email protected]> wrote:
>> 
>>> Hello, everyone
>>> 
>>> Zeppelin has more than 20 interpreters, so data analysts can use it for
>>> many kinds of data development, and much of that work is interdependent.
>>> For example, developing a machine learning algorithm relies on Spark to
>>> preprocess the data.
>>> 
>>> Zeppelin should have built-in workflow capabilities instead of relying on
>>> external software to schedule its notes, for the following reasons:
>>> 
>>> 1. We have moved from the data processing era to the algorithm era. With
>>> its own workflow, Zeppelin will have a complete ecosystem covering both
>>> data processing and algorithm operations.
>>> 2. Zeppelin's powerful interactive processing capabilities help algorithm
>>> engineers improve their productivity. Zeppelin should give algorithm
>>> engineers more direct control, instead of handing the algorithm to other
>>> teams (or software) to run the workflow.
>>> 3. Zeppelin knows more about the processing status of data than Azkaban
>>> and Airflow do, so a built-in workflow will offer better performance,
>>> user experience, and control.
>>> 
>>> Typical use case
>>> This applies especially to machine learning, because machine learning
>>> tasks generally take a long time to run.
>>> A typical example is as follows:
>>> 1) Obtain data from HDFS through Spark;
>>> 2) Clean and convert the data through Spark SQL;
>>> 3) Extract features from the data through Spark;
>>> 4) Write the TensorFlow algorithm through Hadoop Submarine;
>>> 5) Submit the TensorFlow algorithm as a job to YARN or k8s for batch
>>> processing;
>>> 6) Publish the trained model and provide online prediction services;
>>> 7) Run model prediction through Flink;
>>> 8) Receive incremental data through Flink to update the model
>>> incrementally.
>>> Therefore, Zeppelin especially needs the ability to orchestrate
>>> workflows.
>>> 
>>> I have completed a draft of the Zeppelin workflow system design. Please
>>> review it; you can edit the document directly or add comments.
>>> 
>>> JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>> gdoc:
>>> https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit
>>> 
>>> 
>>> 
>>> :-)
>>> 
>>> Xun Liu
>>> 2019-03-11
>> 
>> 
>> 
>> --
>> 이종열, Jongyoul Lee, 李宗烈
>> http://madeng.net
>>
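
For illustration, the eight-step use case quoted above can be modelled as a dependency graph whose steps run in topological order. This is only a minimal sketch and not the design proposed in ZEPPELIN-4018: the step names, the dependency edges between them, and the run_step callback are all hypothetical, and no Zeppelin API is used.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step names for the eight stages of the use case. The
# dependency edges are assumptions made for illustration only.
DEPENDS_ON = {
    "load_hdfs_spark": set(),
    "clean_sparksql": {"load_hdfs_spark"},
    "extract_features_spark": {"clean_sparksql"},
    "write_tensorflow_algorithm": set(),
    "train_on_yarn_or_k8s": {"extract_features_spark",
                             "write_tensorflow_algorithm"},
    "publish_model": {"train_on_yarn_or_k8s"},
    "predict_flink": {"publish_model"},
    "incremental_update_flink": {"publish_model"},
}


def execution_order(depends_on):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(depends_on).static_order())


def run_workflow(depends_on, run_step):
    """Run each step after its dependencies and stop at the first failure.

    run_step is a caller-supplied callable that returns True on success.
    Returns the list of steps that completed successfully.
    """
    done = []
    for step in execution_order(depends_on):
        if not run_step(step):
            break  # every remaining step is skipped, dependent or not
        done.append(step)
    return done
```

A caller would pass a run_step function that executes the note or paragraph behind each step name. Stopping at the first failure is the simplest policy; a real scheduler might instead keep running steps that do not depend on the failed one.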
