Hi Xun,

Thanks for your work. Could you change the subject of the email? I think that
would draw more attention to your request to review the design.


________________________________
From: Xun Liu <neliu...@163.com>
Sent: Sunday, March 10, 2019 12:03 AM
To: Jongyoul Lee; m...@apache.org; Jeff Zhang; Vasiliy Morkovkin
Cc: dev@zeppelin.apache.org
Subject: Re: Zeppelin in GSOC 2019

Hello, everyone,

I have completed the Zeppelin workflow system design. Please review it; you can
edit the document directly or leave comments.

JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
gdoc: https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit#

:-)

> On March 8, 2019, at 2:10 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
> Hi Liu,
>
> See this link https://community.apache.org/gsoc.html
>
>
> Xun Liu <neliu...@163.com> wrote on Fri, Mar 8, 2019 at 1:58 PM:
>
>> Hi, Jongyoul Lee, Морковкин
>>
>> I looked up information about GSOC. Does the Zeppelin community still need
>> to apply first?
>> I don't know much about GSOC. Besides helping with the project, what other
>> work does the mentor need to do?
>>
>>> On March 8, 2019, at 10:01 AM, Xun Liu <neliu...@163.com> wrote:
>>>
>>> Hi, Морковкин
>>>
>>> I am very happy to be your mentor for GSOC. :-)
>>> I believe that by completing this work, I can also learn a lot.
>>>
>>> Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>
>>>> On March 8, 2019, at 12:08 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>>>>
>>>> Hi! For fun I've sketched a toy prototype of a workflow manager in Scala.
>>>> It makes it easy to impose dependencies on the execution order of tasks.
>>>> Check this out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It
>>>> reproduces the flow shown in the attached picture.
>>>> Xun Liu, it would be great to clarify whether you agree to be a mentor
>>>> specifically within GSOC, or outside it? :)
>>>>
>>>> ----------------------------------------
>>>> Best regards, Basil Morkovkin
>>>>
>>>> On Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>> Thanks Liu for taking over this, I will help review the design.
>>>>
>>>> Xun Liu <neliu...@163.com> wrote on Thu, Mar 7, 2019 at 4:05 PM:
>>>> Hi Vasiliy Morkovkin
>>>>
>>>> Thank you very much for your willingness to implement the workflow
>>>> feature.
>>>> I will work with you with the highest priority.
>>>> I am planning to update the system design documentation for workflow
>>>> first at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
>>>> Please add yourself as a watcher on ZEPPELIN-4018.
>>>> This way you will get notifications for document updates in a
>>>> timely manner.
>>>>
>>>> We can discuss all questions in the ZEPPELIN-4018 JIRA comments.
>>>> If you need to, you can email me at liuxun...@gmail.com , and I will
>>>> reply as quickly as I can.
>>>> Do you think this kind of cooperation is OK?
>>>>
>>>>
>>>> @moon, @Jeff, @Jongyoul Lee, if you are interested, please help us
>>>> improve our system design. Thanks!
>>>>
>>>> :-)
>>>>
>>>>> On March 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>>>>>
>>>>> Thank you for such detailed feedback!
>>>>> I am definitely interested in working on the workflow implementation
>>>>> with you, Xun Liu! Could you become a GSOC mentor for this task?
>>>>> Some front-end work is not a problem at all.
>>>>> I'm ready to work at least 30 hours per week in the summer, while for now
>>>>> I'd like to take on some smaller tasks to get a closer look at the
>>>>> existing codebase and become familiar with your development workflow.
>>>>> Do you have such tasks in mind?
>>>>>
>>>>> On Wed, Mar 6, 2019 at 05:23, Xun Liu <neliu...@163.com> wrote:
>>>>> Hi Vasiliy Morkovkin
>>>>>
>>>>> I have shared my thoughts on workflow at
>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>>
>>>>> Because Zeppelin has more than 20 interpreters,
>>>>> data analysts can use it for a variety of data development work.
>>>>> Much of that work is interdependent. For example,
>>>>> developing machine learning algorithms relies on Spark to preprocess
>>>>> data, and so on.
>>>>>
>>>>> Existing open source workflow software includes Azkaban and Airflow.
>>>>> Azkaban is relatively simple and covers most scenarios, and our company
>>>>> is using it.
>>>>> Airflow looks complicated, and I have not used it.
>>>>> In fact, I have previously implemented workflows for notes and
>>>>> paragraphs in Zeppelin via Azkaban:
>>>>> https://youtu.be/2r6q-2Tq7hk?t=33
>>>>>
>>>>> However, I think Zeppelin should have built-in workflow capabilities
>>>>> instead of relying on external software to schedule notes, for the
>>>>> following reasons:
>>>>> 1. We have moved from the data processing era to the algorithm era.
>>>>> Once Zeppelin has its own workflow, it will form a data loop.
>>>>>
>>>>> 2. Zeppelin's powerful interactive processing capabilities help
>>>>> algorithm engineers improve their productivity.
>>>>> Zeppelin should give algorithm engineers more direct control,
>>>>> instead of handing the algorithm to other teams (or software) to run
>>>>> the workflow.
>>>>>
>>>>> 3. Zeppelin knows more about the processing status of data than
>>>>> Azkaban and Airflow,
>>>>> so a built-in workflow will have better performance, user experience,
>>>>> and control.
>>>>>
>>>>> If you are interested in workflow (ZEPPELIN-4018),
>>>>> I am willing to work with you to complete all the system design and
>>>>> code development work.
>>>>>
>>>>> :-)
>>>>>
>>>>>> On March 6, 2019, at 9:32 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>>>>>>
>>>>>> Hi Basil,
>>>>>>
>>>>>> Thanks for your interest in Zeppelin. Here are my comments on the
>>>>>> tickets you are interested in.
>>>>>>
>>>>>> 1. https://issues.apache.org/jira/browse/ZEPPELIN-3651
>>>>>> This involves two sides of work, frontend and backend.
>>>>>> In the frontend, we should use Arrow JS to handle the table data,
>>>>>> including displaying and processing it (such as aggregation).
>>>>>> In the backend, we should use Arrow for each language and allow them to
>>>>>> exchange data in the same process, and use Arrow IPC to exchange data
>>>>>> across processes.
>>>>>> Overall, this is a pretty large task. If you really want to do it, I
>>>>>> would suggest taking just part of it.
>>>>>>
>>>>>> 2. https://issues.apache.org/jira/browse/ZEPPELIN-3994
>>>>>> Regarding model serving, I don't have a clear picture of this. Others
>>>>>> can comment on it.
>>>>>>
>>>>>> 3. https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>>>> Job scheduling is pretty important for Zeppelin; I would make this
>>>>>> the highest priority among these tickets. Airflow is one
>>>>>> option, but I am open to other solutions. First we need to figure out
>>>>>> how users schedule jobs in Zeppelin, then choose the right framework.
>>>>>> It would also involve some frontend work.
>>>>>>
>>>>>> 4. https://issues.apache.org/jira/browse/ZEPPELIN-3857
>>>>>> Spark 2.4.0 support is already there, but Scala 2.12 is not
>>>>>> supported yet. It won't be a big project for GSOC IMO.
>>>>>>
>>>>>> 5. OLAP.
>>>>>> Regarding OLAP, as long as the OLAP engine provides a JDBC interface,
>>>>>> Zeppelin can support it very well. But we could create a specific
>>>>>> interpreter for an OLAP engine if its native API performs better than
>>>>>> JDBC. Another thing I can think of for improving OLAP is visualization:
>>>>>> although Zeppelin already supports some built-in visualizations, some
>>>>>> are still missing. We could provide more.
>>>>>>
>>>>>> 6. Auto-completions.
>>>>>> We already support IPython[1] in Zeppelin, which provides almost the
>>>>>> same auto-completion as Jupyter. But it lacks access to the Python API
>>>>>> docs. This is also pretty important for Python users IMO. SQL is
>>>>>> another popular language in Zeppelin, but it also doesn't provide a good
>>>>>> code-completion experience; we can do better there as well.
>>>>>>
>>>>>> 7. Notifications.
>>>>>> I think notifications can be integrated into job scheduling. A
>>>>>> notification can be sent when a job fails or succeeds.
>>>>>>
>>>>>>
>>>>>> Let us know which JIRA you are most interested in, and please also
>>>>>> consider how much time you can spend on this. Again, we very much
>>>>>> appreciate your interest in Zeppelin and look forward to your
>>>>>> contribution.
>>>>>>
>>>>>>
>>>>>> [1] http://zeppelin.apache.org/docs/0.8.1/interpreter/python.html#ipython-support
>>>>>>
>>>>>>
>>>>>>
>>>>>> Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote on Wed, Mar 6, 2019 at 7:41 AM:
>>>>>>
>>>>>>> Thank you for your replies! I've checked the existing set of issues and
>>>>>>> found several interesting ones:
>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3651 seems to be a
>>>>>>> very nice way to increase analytical processing performance using the
>>>>>>> Arrow project;
>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3994 deploying models
>>>>>>> independently of ZeppelinServer sounds quite intriguing too, although
>>>>>>> there is much to think about;
>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-4018 at first glance,
>>>>>>> https://airflow.apache.org/ seems to be useful for implementing complex
>>>>>>> execution workflows.
>>>>>>> Those tasks are broad and intriguing, and require complex architectural
>>>>>>> solutions.
>>>>>>> Also, I've probably found a ticket that is suitable for me to get
>>>>>>> involved in the project:
>>>>>>> - https://issues.apache.org/jira/browse/ZEPPELIN-3857 . What do you
>>>>>>> think? Are there any "low-hanging fruits"?
>>>>>>>
>>>>>>> And I have several ideas of my own. Some of them might not be relevant
>>>>>>> due to the vision of the project or other reasons. Just ideas:
>>>>>>> - OLAP. As Zeppelin is a tool aimed at analytics, it seems quite
>>>>>>> logical to add more integrations with existing OLAP solutions like
>>>>>>> Pinot, ClickHouse and Druid. Currently I've found an integration only
>>>>>>> with Kylin;
>>>>>>> - Better autocompletion. Jupyter offers not only a list of already
>>>>>>> initialized variables, but also quick access to documentation. It's
>>>>>>> convenient;
>>>>>>> - Notifications. Some colleagues would appreciate a notification
>>>>>>> service that sends you messages (via mail, Slack bot or something else)
>>>>>>> indicating that your long-running paragraphs have completed.
>>>>>>>
>>>>>>> Feedback is very much appreciated :)
>>>>>>>
>>>>>>> It would be wonderful if someone agreed to give their time and become
>>>>>>> a mentor in the GSOC program!
>>>>>>>
>>>>>>> ----------------------------------------
>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 5, 2019 at 11:48, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I've confirmed that I can add more issues for GSOC. Can you explain
>>>>>>>> what you would like to contribute to? I can add more issues.
>>>>>>>>
>>>>>>>> JL
>>>>>>>>
>>>>>>>> On Tue, Mar 5, 2019 at 1:03 PM Xun Liu <neliu...@163.com> wrote:
>>>>>>>>
>>>>>>>>> Hi, Vasiliy Morkovkin
>>>>>>>>>
>>>>>>>>> Welcome to the zeppelin community! :-)
>>>>>>>>>
>>>>>>>>>> On March 5, 2019, at 11:49 AM, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks for contacting Zeppelin with your interest.
>>>>>>>>>>
>>>>>>>>>> I added FE topics for GSOC because FE is the most urgent issue I
>>>>>>>>>> have thought about. We always encourage contributions to Zeppelin
>>>>>>>>>> on a range of topics, including your own ideas.
>>>>>>>>>>
>>>>>>>>>> Please tell us more about what you have in mind.
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>> JL
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 5, 2019 at 10:41 AM moon soo Lee <m...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Great to see your interest in the project. Thanks!
>>>>>>>>>>> Looks like we need volunteers to mentor and some backend subjects
>>>>>>>>>>> for GSoC 2019.
>>>>>>>>>>> Any ideas?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> moon
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 4, 2019 at 3:05 PM Vasiliy Morkovkin <morkovkin...@phystech.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi everyone, I'm pursuing a bachelor's degree at the Moscow
>>>>>>>>>>>> Institute of Physics and Technology and am eager to contribute to
>>>>>>>>>>>> Zeppelin in the context of GSOC 2019. I've become a real fan of
>>>>>>>>>>>> Zeppelin over the past couple of months, using it at my job. But
>>>>>>>>>>>> I have found only one ticket (a front-end task) with the GSOC 2019
>>>>>>>>>>>> label in your Jira. Perhaps you have ideas for new features or
>>>>>>>>>>>> improvements in Zeppelin but don't have enough hands for them. It
>>>>>>>>>>>> would be wonderful if anyone agreed to mentor these ideas within
>>>>>>>>>>>> GSOC :)
>>>>>>>>>>>> I have been working as a Scala developer (back-end) for 1.5
>>>>>>>>>>>> years. I can also write Java or Python without any problems if
>>>>>>>>>>>> necessary. I'm really fond of databases and high-load systems. I
>>>>>>>>>>>> also have experience with some other great Apache projects like
>>>>>>>>>>>> Cassandra, Kafka and Spark.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards, Basil Morkovkin.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>>>> http://madeng.net
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> 이종열, Jongyoul Lee, 李宗烈
>>>>>>>> http://madeng.net
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards
>>>>>>
>>>>>> Jeff Zhang
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards
>>>>
>>>> Jeff Zhang
>>>
>>
>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
