[ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread jincheng sun
Hi all,

On behalf of the Flink PMC, I'm happy to announce that Dian Fu is now part
of the Apache Flink Project Management Committee (PMC).

Dian Fu has been very active in the PyFlink component, working on various
important features such as the Python UDF and Pandas integration. He keeps
checking and voting on our releases, has successfully produced two releases
(1.9.3 & 1.11.1) as release manager, and is currently working as RM to push
forward the release of Flink 1.12.

Please join me in congratulating Dian Fu on becoming a Flink PMC member!

Best,
Jincheng(on behalf of the Flink PMC)


Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-10 Thread jincheng sun
Thank you for your positive feedback Seth !
Would you please vote in the voting mail thread. Thank you!

Best,
Jincheng


Seth Wiesman wrote on Mon, Aug 10, 2020 at 10:34 PM:

> I think this sounds good. +1
>
> On Wed, Aug 5, 2020 at 8:37 PM jincheng sun 
> wrote:
>
>> Hi David, Thank you for sharing the problems with the current
>> documentation. I agree with you, as I have received the same feedback from
>> Chinese users. I am often contacted by users asking questions such as
>> whether PyFlink supports "Java UDF" and whether PyFlink supports
>> "xxxConnector". The root cause of these problems is that our existing
>> documents are written for Java users (text and API are mixed together).
>> Since Python was newly added in 1.9, much of the documentation is not
>> friendly to Python users. They don't want to look for Python content in
>> unfamiliar Java documents. Just yesterday, there were complaints from
>> Chinese users about where all the document entries for the Python API are.
>> So a centralized entry point and a clear document structure are what
>> Python users urgently demand. The original intention of this FLIP is to do
>> our best to solve these user pain points.
>>
>> Hi Xingbo and Wei, Thank you for sharing PySpark's status on documentation
>> optimization. You're right: PySpark already has a lot of Python user
>> groups, and they have also found that the Python user community is an
>> important audience for multi-language support. Centralizing and unifying
>> the Python documentation content will reduce the learning cost for Python
>> users, and a good document structure and content will also reduce the Q&A
>> burden on the community. It's a once-and-for-all job.
>>
>> Hi Seth, I wonder if your concerns have been resolved through the
>> previous discussion?
>>
>> Anyway, the principle of the FLIP is that the Python documentation should
>> only include Python-specific content, instead of making a copy of the Java
>> content. And it would be great to have you join in the improvement of
>> PyFlink (both PRs and reviewing PRs).
>>
>> Best,
>> Jincheng
>>
>>
>> Wei Zhong wrote on Wed, Aug 5, 2020 at 5:46 PM:
>>
>>> Hi Xingbo,
>>>
>>> Thanks for your information.
>>>
>>> I think PySpark's documentation redesign deserves our attention.
>>> It seems that the Spark community has also begun to take the user
>>> experience of its Python documentation more seriously. We can continue to
>>> follow the discussion and progress of the redesign in the Spark
>>> community. It is so similar to our work that there should be some ideas
>>> worth borrowing for us.
>>>
>>> Best,
>>> Wei
>>>
>>>
>>> On Aug 5, 2020, at 15:02, Xingbo Huang wrote:
>>>
>>> Hi,
>>>
>>> I found that the Spark community is also working on redesigning the
>>> PySpark documentation[1] recently. Maybe we can compare the differences
>>> between our document structure and theirs.
>>>
>>> [1] https://issues.apache.org/jira/browse/SPARK-31851
>>>
>>> http://apache-spark-developers-list.1001551.n3.nabble.com/Need-some-help-and-contributions-in-PySpark-API-documentation-td29972.html
>>>
>>> Best,
>>> Xingbo
>>>
>>> David Anderson wrote on Wed, Aug 5, 2020 at 3:17 AM:
>>>
>>>> I'm delighted to see energy going into improving the documentation.
>>>>
>>>> With the current documentation, I get a lot of questions that I believe
>>>> reflect two fundamental problems with what we currently provide:
>>>>
>>>> (1) We have a lot of contextual information in our heads about how
>>>> Flink works, and we are able to use that knowledge to make reasonable
>>>> inferences about how things (probably) work in cases we aren't so familiar
>>>> with. For example, I get a lot of questions of the form "If I use <some
>>>> feature> will I still have exactly once guarantees?" The answer is always
>>>> yes, but they continue to have doubts because we have failed to clearly
>>>> communicate this fundamental, underlying principle.
>>>>
>>>> This specific example about fault tolerance applies across all of the
>>>> Flink docs, but the general idea can also be applied to the Table/SQL and
>>>> PyFlink docs. The guiding principles underlying these APIs should be
>>>> written down in one easy-to-find place.
>>>>
>>>> (2) The other kind of question I get a lot is "Can I do <X> with <Y>?"

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-05 Thread jincheng sun
Hi David, Thank you for sharing the problems with the current documentation.
I agree with you, as I have received the same feedback from Chinese users. I
am often contacted by users asking questions such as whether PyFlink supports
"Java UDF" and whether PyFlink supports "xxxConnector". The root cause of
these problems is that our existing documents are written for Java users
(text and API are mixed together). Since Python was newly added in 1.9, much
of the documentation is not friendly to Python users. They don't want to look
for Python content in unfamiliar Java documents. Just yesterday, there were
complaints from Chinese users about where all the document entries for the
Python API are. So a centralized entry point and a clear document structure
are what Python users urgently demand. The original intention of this FLIP is
to do our best to solve these user pain points.

Hi Xingbo and Wei, Thank you for sharing PySpark's status on documentation
optimization. You're right: PySpark already has a lot of Python user groups,
and they have also found that the Python user community is an important
audience for multi-language support. Centralizing and unifying the Python
documentation content will reduce the learning cost for Python users, and a
good document structure and content will also reduce the Q&A burden on the
community. It's a once-and-for-all job.

Hi Seth, I wonder if your concerns have been resolved through the previous
discussion?

Anyway, the principle of the FLIP is that the Python documentation should
only include Python-specific content, instead of making a copy of the Java
content. And it would be great to have you join in the improvement of PyFlink
(both PRs and reviewing PRs).

Best,
Jincheng


Wei Zhong wrote on Wed, Aug 5, 2020 at 5:46 PM:

> Hi Xingbo,
>
> Thanks for your information.
>
> I think PySpark's documentation redesign deserves our attention. It
> seems that the Spark community has also begun to take the user experience
> of its Python documentation more seriously. We can continue to follow the
> discussion and progress of the redesign in the Spark community. It is so
> similar to our work that there should be some ideas worth borrowing for us.
>
> Best,
> Wei
>
>
> On Aug 5, 2020, at 15:02, Xingbo Huang wrote:
>
> Hi,
>
> I found that the Spark community is also working on redesigning the PySpark
> documentation[1] recently. Maybe we can compare the differences between our
> document structure and theirs.
>
> [1] https://issues.apache.org/jira/browse/SPARK-31851
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/Need-some-help-and-contributions-in-PySpark-API-documentation-td29972.html
>
> Best,
> Xingbo
>
> David Anderson wrote on Wed, Aug 5, 2020 at 3:17 AM:
>
>> I'm delighted to see energy going into improving the documentation.
>>
>> With the current documentation, I get a lot of questions that I believe
>> reflect two fundamental problems with what we currently provide:
>>
>> (1) We have a lot of contextual information in our heads about how Flink
>> works, and we are able to use that knowledge to make reasonable inferences
>> about how things (probably) work in cases we aren't so familiar with. For
>> example, I get a lot of questions of the form "If I use <some feature> will
>> I still have exactly once guarantees?" The answer is always yes, but they
>> continue to have doubts because we have failed to clearly communicate this
>> fundamental, underlying principle.
>>
>> This specific example about fault tolerance applies across all of the
>> Flink docs, but the general idea can also be applied to the Table/SQL and
>> PyFlink docs. The guiding principles underlying these APIs should be
>> written down in one easy-to-find place.
>>
>> (2) The other kind of question I get a lot is "Can I do <X> with <Y>?"
>> E.g., "Can I use the JDBC table sink from PyFlink?" These questions can be
>> very difficult to answer because it is frequently the case that one has to
>> reason about why a given feature doesn't seem to appear in the
>> documentation. It could be that I'm looking in the wrong place, or it could
>> be that someone forgot to document something, or it could be that it can in
>> fact be done by applying a general mechanism in a specific way that I
>> haven't thought of -- as in this case, where one can use a JDBC sink from
>> Python if one thinks to use DDL.
>>
>> So I think it would be helpful to be explicit about both what is, and
>> what is not, supported in PyFlink. And to have some very clear organizing
>> principles in the documentation so that users can quickly learn where to
>> look for specific facts.
>>
>> Regards,
>> David
>>
>>
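
To make David's point concrete: below is a minimal sketch of reaching the
JDBC sink from PyFlink via DDL, assuming Flink 1.11-era connector syntax;
the schema and connection options are placeholders, not taken from the
thread.

from pyflink.table import EnvironmentSettings, TableEnvironment

# Sketch: the JDBC sink has no dedicated Python wrapper, but a DDL-declared
# table makes it reachable from any Table API / SQL pipeline.
env_settings = (EnvironmentSettings.new_instance()
                .in_streaming_mode().use_blink_planner().build())
t_env = TableEnvironment.create(env_settings)

t_env.execute_sql("""
    CREATE TABLE jdbc_sink (
        id BIGINT,
        name STRING
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:mysql://localhost:3306/mydb',
        'table-name' = 'my_table'
    )
""")

t_env.execute_sql("INSERT INTO jdbc_sink VALUES (1, 'flink')")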

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-04 Thread jincheng sun
> ... are more independent, it will be challenging to respond to questions about
> which features from the other APIs are available from Python.
>
> David
>
> On Mon, Aug 3, 2020 at 8:07 AM jincheng sun 
> wrote:
>
>> It would be great if you could join the contribution to the PyFlink
>> documentation, @Marta!
>> Thanks for all of the positive feedback. I will start a formal vote
>> later...
>>
>> Best,
>> Jincheng
>>
>>
>> Shuiqiang Chen wrote on Mon, Aug 3, 2020 at 9:56 AM:
>>
>> > Hi jincheng,
>> >
>> > Thanks for the discussion. +1 for the FLIP.
>> >
>> > Well-organized documentation will greatly improve the efficiency and
>> > experience of developers.
>> >
>> > Best,
>> > Shuiqiang
>> >
>> > Hequn Cheng wrote on Sat, Aug 1, 2020 at 8:42 AM:
>> >
>> >> Hi Jincheng,
>> >>
>> >> Thanks a lot for raising the discussion. +1 for the FLIP.
>> >>
>> >> I think this will bring big benefits for PyFlink users. Currently,
>> >> the Python Table API documentation is hidden deep under the Table
>> >> API & SQL tab, which makes it hard to discover. Also, the PyFlink
>> >> documentation is mixed with the Java/Scala documentation. It is hard
>> >> for users to get an overview of all the PyFlink documents. As more and
>> >> more functionalities are added to PyFlink, I think it's time for us to
>> >> refactor the document.
>> >>
>> >> Best,
>> >> Hequn
>> >>
>> >>
>> >> On Fri, Jul 31, 2020 at 3:43 PM Marta Paes Moreira <
>> ma...@ververica.com>
>> >> wrote:
>> >>
>> >>> Hi, Jincheng!
>> >>>
>> >>> Thanks for creating this detailed FLIP, it will make a big difference
>> in
>> >>> the experience of Python developers using Flink. I'm interested in
>> >>> contributing to this work, so I'll reach out to you offline!
>> >>>
>> >>> Also, thanks for sharing some information on the adoption of PyFlink,
>> >>> it's
>> >>> great to see that there are already production users.
>> >>>
>> >>> Marta
>> >>>
>> >>> On Fri, Jul 31, 2020 at 5:35 AM Xingbo Huang 
>> wrote:
>> >>>
>> >>> > Hi Jincheng,
>> >>> >
>> >>> > Thanks a lot for bringing up this discussion and the proposal.
>> >>> >
>> >>> > Big +1 for improving the structure of PyFlink doc.
>> >>> >
>> >>> > It will be very helpful to give PyFlink users a unified entrance
>> >>> > for learning the PyFlink documents.
>> >>> >
>> >>> > Best,
>> >>> > Xingbo
>> >>> >
>> >>> > Dian Fu wrote on Fri, Jul 31, 2020 at 11:00 AM:
>> >>> >
>> >>> >> Hi Jincheng,
>> >>> >>
>> >>> >> Thanks a lot for bringing up this discussion and the proposal. +1
>> to
>> >>> >> improve the Python API doc.
>> >>> >>
>> >>> >> I have received a lot of feedback from PyFlink beginners about
>> >>> >> the PyFlink docs, e.g. the materials are too few, the Python docs
>> >>> >> are mixed with the Java docs, and it's not easy to find the docs
>> >>> >> they want.
>> >>> >>
>> >>> >> I think it would greatly improve the user experience if we had one
>> >>> >> place that includes most of the knowledge PyFlink users should know.
>> >>> >>
>> >>> >> Regards,
>> >>> >> Dian
>> >>> >>
>> >>> >> On Jul 31, 2020, at 10:14 AM, jincheng sun wrote:
>> >>> >>
>> >>> >> Hi folks,
>> >>> >>
>> >>> >> Since the release of Flink 1.11, the number of PyFlink users has
>> >>> >> continued to grow. As far as I know, many companies have used
>> >>> >> PyFlink for data analysis, and operation and maintenance monitoring
>> >>> >> businesses have been put into production (such as 聚美优品[1] (Jumei),
>> >>> >> 浙江墨芷[2] (Mozhi), etc.). According to ...

Re: [DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-08-02 Thread jincheng sun
It would be great if you could join the contribution to the PyFlink
documentation, @Marta!
Thanks for all of the positive feedback. I will start a formal vote
later...

Best,
Jincheng


Shuiqiang Chen wrote on Mon, Aug 3, 2020 at 9:56 AM:

> Hi jincheng,
>
> Thanks for the discussion. +1 for the FLIP.
>
> Well-organized documentation will greatly improve the efficiency and
> experience of developers.
>
> Best,
> Shuiqiang
>
> Hequn Cheng wrote on Sat, Aug 1, 2020 at 8:42 AM:
>
>> Hi Jincheng,
>>
>> Thanks a lot for raising the discussion. +1 for the FLIP.
>>
>> I think this will bring big benefits for PyFlink users. Currently,
>> the Python Table API documentation is hidden deep under the Table API &
>> SQL tab, which makes it hard to discover. Also, the PyFlink documentation
>> is mixed with the Java/Scala documentation. It is hard for users to get an
>> overview of all the PyFlink documents. As more and more functionalities
>> are added to PyFlink, I think it's time for us to refactor the document.
>>
>> Best,
>> Hequn
>>
>>
>> On Fri, Jul 31, 2020 at 3:43 PM Marta Paes Moreira 
>> wrote:
>>
>>> Hi, Jincheng!
>>>
>>> Thanks for creating this detailed FLIP, it will make a big difference in
>>> the experience of Python developers using Flink. I'm interested in
>>> contributing to this work, so I'll reach out to you offline!
>>>
>>> Also, thanks for sharing some information on the adoption of PyFlink,
>>> it's
>>> great to see that there are already production users.
>>>
>>> Marta
>>>
>>> On Fri, Jul 31, 2020 at 5:35 AM Xingbo Huang  wrote:
>>>
>>> > Hi Jincheng,
>>> >
>>> > Thanks a lot for bringing up this discussion and the proposal.
>>> >
>>> > Big +1 for improving the structure of PyFlink doc.
>>> >
>>> > It will be very helpful to give PyFlink users a unified entrance
>>> > for learning the PyFlink documents.
>>> >
>>> > Best,
>>> > Xingbo
>>> >
>>> > Dian Fu wrote on Fri, Jul 31, 2020 at 11:00 AM:
>>> >
>>> >> Hi Jincheng,
>>> >>
>>> >> Thanks a lot for bringing up this discussion and the proposal. +1 to
>>> >> improve the Python API doc.
>>> >>
>>> >> I have received a lot of feedback from PyFlink beginners about
>>> >> the PyFlink docs, e.g. the materials are too few, the Python docs are
>>> >> mixed with the Java docs, and it's not easy to find the docs they want.
>>> >>
>>> >> I think it would greatly improve the user experience if we had one
>>> >> place that includes most of the knowledge PyFlink users should know.
>>> >>
>>> >> Regards,
>>> >> Dian
>>> >>
>>> >> On Jul 31, 2020, at 10:14 AM, jincheng sun wrote:
>>> >>
>>> >> Hi folks,
>>> >>
>>> >> Since the release of Flink 1.11, the number of PyFlink users has
>>> >> continued to grow. As far as I know, many companies have used PyFlink
>>> >> for data analysis, and operation and maintenance monitoring businesses
>>> >> have been put into production (such as 聚美优品[1] (Jumei), 浙江墨芷[2]
>>> >> (Mozhi), etc.). According to the feedback we received, the current
>>> >> documentation is not very friendly to PyFlink users. There are two
>>> >> shortcomings:
>>> >>
>>> >> - Python-related content is mixed into the Java/Scala documentation,
>>> >> which makes it difficult to read for users who only focus on PyFlink.
>>> >> - There is already a "Python Table API" section in the Table API
>>> >> document that holds the PyFlink documents, but the number of articles
>>> >> is small and the content is fragmented. It is difficult for beginners
>>> >> to learn from it.
>>> >>
>>> >> In addition, FLIP-130 introduced the Python DataStream API, and many
>>> >> documents will be added for those new APIs. In order to improve the
>>> >> readability and maintainability of the PyFlink documentation, Wei
>>> >> Zhong and I have discussed offline and would like to rework it via
>>> >> this FLIP.
>>> >>
>>> >> We will rework the documentation around the following three objectives:
>>> >>
>>> >> - Add a separate section for the Python API under the "Application
>>> >> Development" section.
>>> >> - Restructure the current Python documentation into a brand-new
>>> >> structure that ensures complete content and is friendly to beginners.
>>> >> - Improve the documents shared by Python/Java/Scala to make them more
>>> >> friendly to Python users without affecting Java/Scala users.
>>> >>
>>> >> More detail can be found in the FLIP-133:
>>> >>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation
>>> >>
>>> >> Best,
>>> >> Jincheng
>>> >>
>>> >> [1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
>>> >> [2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g
>>> >>
>>> >>
>>> >>
>>>
>>


[DISCUSS] FLIP-133: Rework PyFlink Documentation

2020-07-30 Thread jincheng sun
Hi folks,

Since the release of Flink 1.11, the number of PyFlink users has continued to
grow. As far as I know, many companies have used PyFlink for data analysis,
and operation and maintenance monitoring businesses have been put into
production (such as 聚美优品[1] (Jumei), 浙江墨芷[2] (Mozhi), etc.). According to
the feedback we received, the current documentation is not very friendly to
PyFlink users. There are two shortcomings:

- Python-related content is mixed into the Java/Scala documentation, which
makes it difficult to read for users who only focus on PyFlink.
- There is already a "Python Table API" section in the Table API document
that holds the PyFlink documents, but the number of articles is small and the
content is fragmented. It is difficult for beginners to learn from it.

In addition, FLIP-130 introduced the Python DataStream API, and many documents
will be added for those new APIs. In order to improve the readability and
maintainability of the PyFlink documentation, Wei Zhong and I have discussed
offline and would like to rework it via this FLIP.

We will rework the documentation around the following three objectives:

- Add a separate section for the Python API under the "Application
Development" section.
- Restructure the current Python documentation into a brand-new structure
that ensures complete content and is friendly to beginners.
- Improve the documents shared by Python/Java/Scala to make them more
friendly to Python users without affecting Java/Scala users.

More detail can be found in the FLIP-133:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-133%3A+Rework+PyFlink+Documentation

Best,
Jincheng

[1] https://mp.weixin.qq.com/s/zVsBIs1ZEFe4atYUYtZpRg
[2] https://mp.weixin.qq.com/s/R4p_a2TWGpESBWr3pLtM2g


Re: Can the PyFlink Table API route rows to different sink tables based on a field's value?

2020-06-21 Thread jincheng sun
Hi jack,

With the Table API you don't need if/else; logic like the following works:

val t1 = table.filter('x > 2).groupBy(..)
val t2 = table.filter('x <= 2).groupBy(..)
t1.insert_into("sink1")
t2.insert_into("sink2")
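
Since the question was about PyFlink, here is a rough Python equivalent of
the sketch above, assuming the 1.10/1.11-era string-expression API (note
that the string literal needs quotes inside the filter expression):

# t_env is the TableEnvironment from the question.
t = t_env.from_path("source")
# Two filters over the same table; each branch writes to its own sink.
t.filter("logType = 'syslog'").insert_into("sink1")
t.filter("logType = 'alarm'").insert_into("sink2")
t_env.execute("route-by-logType")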


Best,
Jincheng



jack wrote on Fri, Jun 19, 2020 at 10:35 AM:

>
> I tested the following structure:
> table = t_env.from_path("source")
>
> if table.filter("logType=syslog"):
>     table.filter("logType=syslog").insert_into("sink1")
> elif table.filter("logType=alarm"):
>     table.filter("logType=alarm").insert_into("sink2")
>
>
> In my test, it seems that
> table.filter("logType=syslog").insert_into("sink1") takes effect while the
> elif branch below does not. Is the reason that table.filter("logType=syslog")
> (or table.where) already filters the data while evaluating the condition, so
> execution never reaches the branch below?
>
>
>
>
> On 2020-06-19 10:08:25, "jack" wrote:
> >Can the PyFlink Table API decide, based on a field's value, which sink
> >table to insert into?
> >
> >
> >Scenario: PyFlink filters data with a condition and inserts it into a sink.
> >For example, the following two messages differ in logType; the filter
> >interface can be used to filter them into a sink table:
> >{
> >"logType":"syslog",
> >"message":"sla;flkdsjf"
> >}
> >{
> >"logType":"alarm",
> >"message":"sla;flkdsjf"
> >}
> >  t_env.from_path("source")\
> >  .filter("logType=syslog")\
> >  .insert_into("sink1")
> >Is there a way to implement if/else-like logic directly in the code above
> >by checking the logType field:
> >if logType=="syslog":
> >   insert_into(sink1)
> >elif logType=="alarm":
> >   insert_into(sink2)
> >
> >
> >If insert_into, .filter, .select, etc. returned a value, this would be
> >easy; we could keep chaining filters, with code like the following:
> >
> >
> >  t_env.from_path("source")\
> >  .filter("logType=syslog")\
> >  .insert_into("sink1")\
> >  .filter("logType=alarm")\
> >  .insert_into("sink2")
> >Any pointers would be appreciated. Thanks!
> >
> >
> >
> >
> >
>
>


[ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread jincheng sun
Hi all,

On behalf of the Flink PMC, I'm happy to announce that Yu Li is now
part of the Apache Flink Project Management Committee (PMC).

Yu Li has been very active in Flink's state backend component, working on
various improvements, for example the RocksDB memory management for 1.10.
He keeps checking and voting on our releases, and has also successfully
produced two releases (1.10.0 & 1.10.1) as release manager.

Congratulations & Welcome Yu Li!

Best,
Jincheng (on behalf of the Flink PMC)


Re: Querying data with PyFlink

2020-06-15 Thread jincheng sun
Hi Jack,

> PyFlink queries and aggregates data from a source via SQL; instead of
writing to a sink, can the result be used directly, so that I can build a
web API to query the result rather than querying a sink?

I think what you describe, [use the result directly] + [query via a web
API], already contains a "sink" action; it's just that the "sink" is
implemented that way. For your scenario:
1. If you want to push results directly to a web page without persisting
them, you can implement a custom WebSocket sink.
2. If you don't want to push to a web page but rather pull results by
querying, then [use the result directly] needs clarification: how do you
want to consume the result? My understanding is that it has to be persisted,
so the persistence is essentially a sink as well. Flink supports many kinds
of sinks, e.g. databases, file systems, message queues, etc. You can refer
to the official documentation:
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/connect.html

If the reply above doesn't solve your problem, feel free to follow up at any
time.
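
As a minimal sketch of option 2, the result could be persisted with a
filesystem sink that the web service then reads; this assumes Flink
1.11-style DDL, and the table names, path, and schema are placeholders:

from pyflink.table import EnvironmentSettings, TableEnvironment

# Batch mode keeps the aggregated output append-only, which the
# filesystem sink requires; "source" is assumed to be registered already.
env_settings = (EnvironmentSettings.new_instance()
                .in_batch_mode().use_blink_planner().build())
t_env = TableEnvironment.create(env_settings)

t_env.execute_sql("""
    CREATE TABLE results (
        log_type STRING,
        cnt BIGINT
    ) WITH (
        'connector' = 'filesystem',
        'path' = '/tmp/pyflink-results',
        'format' = 'csv'
    )
""")

# The web service reads /tmp/pyflink-results instead of querying Flink.
t_env.execute_sql(
    "INSERT INTO results SELECT logType, COUNT(*) FROM source GROUP BY logType")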

Best,
Jincheng



Jeff Zhang wrote on Tue, Jun 9, 2020 at 5:39 PM:

> You can use Zeppelin's z.show to query job results. Here is an introductory
> tutorial for PyFlink on Zeppelin:
> https://www.bilibili.com/video/BV1Te411W73b?p=20
> You can also join the DingTalk group to discuss: 30022475
>
>
>
> jack wrote on Tue, Jun 9, 2020 at 5:28 PM:
>
>> A question, please:
>> Description: PyFlink queries and aggregates data from a source via SQL;
>> instead of writing the result to a sink, can it be used directly as the
>> result, so that I can build a web API to query it rather than querying a
>> sink?
>>
>> Can Flink support this approach?
>> Thanks!
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: Python UDF from Java

2020-04-30 Thread jincheng sun
Thanks Flavio, and thanks Marta!

That's a good question, as many users want to know about it!

CC'ing the user-zh mailing list :)

Best,
Jincheng
-
Twitter: https://twitter.com/sunjincheng121
-


Flavio Pompermaier wrote on Fri, May 1, 2020 at 7:04 AM:

> Yes, that's awesome! I think this would be a really attractive feature to
> promote the usage of Flink.
>
> Thanks Marta,
> Flavio
>
> On Fri, May 1, 2020 at 12:26 AM Marta Paes Moreira 
> wrote:
>
>> Hi, Flavio.
>>
>> Extending the scope of Python UDFs is described in FLIP-106 [1, 2] and is
>> planned for the upcoming 1.11 release, according to Piotr's last update.
>>
>> Hope this addresses your question!
>>
>> Marta
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-106%3A+Support+Python+UDF+in+SQL+Function+DDL
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-106-Support-Python-UDF-in-SQL-Function-DDL-td38107.html
>>
>> On Thu, Apr 30, 2020 at 11:30 PM Flavio Pompermaier 
>> wrote:
>>
>>> Hi to all,
>>> is it possible to run a Python UDF from a Java job (using Table API or
>>> SQL)?
>>> Is there any reference?
>>>
>>> Best,
>>> Flavio
>>>
>>
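
For reference, the DDL that FLIP-106 proposes would look roughly like the
sketch below; the syntax may still change before 1.11 ships, and both
'my_udfs.py_upper' and the source table are hypothetical. The same statement
could be issued from a Java TableEnvironment; PyFlink is shown here only for
brevity.

from pyflink.table import EnvironmentSettings, TableEnvironment

env_settings = (EnvironmentSettings.new_instance()
                .in_streaming_mode().use_blink_planner().build())
t_env = TableEnvironment.create(env_settings)

# Register a Python UDF purely through SQL DDL (the FLIP-106 proposal),
# so that jobs written in other languages can call it too.
t_env.execute_sql(
    "CREATE TEMPORARY FUNCTION py_upper AS 'my_udfs.py_upper' LANGUAGE PYTHON")
t_env.execute_sql("SELECT py_upper(name) FROM source_table")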
>


Re: [ANNOUNCE] Apache Flink 1.9.3 released

2020-04-25 Thread jincheng sun
Thanks for your great job, Dian!

Best,
Jincheng


Hequn Cheng wrote on Sat, Apr 25, 2020 at 8:30 PM:

> @Dian, thanks a lot for the release and for being the release manager.
> Also thanks to everyone who made this release possible!
>
> Best,
> Hequn
>
> On Sat, Apr 25, 2020 at 7:57 PM Dian Fu  wrote:
>
>> Hi everyone,
>>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink 1.9.3, which is the third bugfix release for the Apache Flink
>> 1.9 series.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this bugfix release:
>> https://flink.apache.org/news/2020/04/24/release-1.9.3.html
>>
>> The full release notes are available in Jira:
>> https://issues.apache.org/jira/projects/FLINK/versions/12346867
>>
>> We would like to thank all contributors of the Apache Flink community who
>> made this release possible!
>> Also great thanks to @Jincheng for helping finalize this release.
>>
>> Regards,
>> Dian
>>
>


Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-23 Thread jincheng sun
Congratulations Jingsong!

Best,
Jincheng


Zhu Zhu wrote on Mon, Feb 24, 2020 at 11:55 AM:

> Congratulations Jingsong!
>
> Thanks,
> Zhu Zhu
>
> Fabian Hueske wrote on Sat, Feb 22, 2020 at 1:30 AM:
>
>> Congrats Jingsong!
>>
>> Cheers, Fabian
>>
>> On Fri, Feb 21, 2020 at 17:49, Rong Rong wrote:
>>
>> > Congratulations Jingsong!!
>> >
>> > Cheers,
>> > Rong
>> >
>> > On Fri, Feb 21, 2020 at 8:45 AM Bowen Li  wrote:
>> >
>> > > Congrats, Jingsong!
>> > >
>> > > On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann 
>> > > wrote:
>> > >
>> > >> Congratulations Jingsong!
>> > >>
>> > >> Cheers,
>> > >> Till
>> > >>
>> > >> On Fri, Feb 21, 2020 at 4:03 PM Yun Gao 
>> wrote:
>> > >>
>> > >>>   Congratulations Jingsong!
>> > >>>
>> > >>>Best,
>> > >>>Yun
>> > >>>
>> > >>> --
>> > >>> From:Jingsong Li 
>> > >>> Send Time:2020 Feb. 21 (Fri.) 21:42
>> > >>> To:Hequn Cheng 
>> > >>> Cc:Yang Wang ; Zhijiang <
>> > >>> wangzhijiang...@aliyun.com>; Zhenghua Gao ;
>> godfrey
>> > >>> he ; dev ; user <
>> > >>> user@flink.apache.org>
>> > >>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>> > >>>
>> > >>> Thanks everyone~
>> > >>>
>> > >>> It's my pleasure to be part of the community. I hope I can make a
>> > >>> better contribution in the future.
>> > >>>
>> > >>> Best,
>> > >>> Jingsong Lee
>> > >>>
>> > >>> On Fri, Feb 21, 2020 at 2:48 PM Hequn Cheng 
>> wrote:
>> > >>> Congratulations Jingsong! Well deserved.
>> > >>>
>> > >>> Best,
>> > >>> Hequn
>> > >>>
>> > >>> On Fri, Feb 21, 2020 at 2:42 PM Yang Wang 
>> > wrote:
>> > >>> Congratulations, Jingsong! Well deserved.
>> > >>>
>> > >>>
>> > >>> Best,
>> > >>> Yang
>> > >>>
>> > >>> Zhijiang wrote on Fri, Feb 21, 2020 at 1:18 PM:
>> > >>> Congrats Jingsong! Welcome on board!
>> > >>>
>> > >>> Best,
>> > >>> Zhijiang
>> > >>>
>> > >>> --
>> > >>> From:Zhenghua Gao 
>> > >>> Send Time:2020 Feb. 21 (Fri.) 12:49
>> > >>> To:godfrey he 
>> > >>> Cc:dev ; user 
>> > >>> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>> > >>>
>> > >>> Congrats Jingsong!
>> > >>>
>> > >>>
>> > >>> *Best Regards,*
>> > >>> *Zhenghua Gao*
>> > >>>
>> > >>>
>> > >>> On Fri, Feb 21, 2020 at 11:59 AM godfrey he 
>> > wrote:
>> > >>> Congrats Jingsong! Well deserved.
>> > >>>
>> > >>> Best,
>> > >>> godfrey
>> > >>>
>> > >>> Jeff Zhang wrote on Fri, Feb 21, 2020 at 11:49 AM:
>> > >>> Congratulations, Jingsong! You deserve it.
>> > >>>
>> > >>> wenlong.lwl wrote on Fri, Feb 21, 2020 at 11:43 AM:
>> > >>> Congrats Jingsong!
>> > >>>
>> > >>> On Fri, 21 Feb 2020 at 11:41, Dian Fu 
>> wrote:
>> > >>>
>> > >>> > Congrats Jingsong!
>> > >>> >
>> > >>> > > On Feb 21, 2020, at 11:39 AM, Jark Wu wrote:
>> > >>> > >
>> > >>> > > Congratulations Jingsong! Well deserved.
>> > >>> > >
>> > >>> > > Best,
>> > >>> > > Jark
>> > >>> > >
>> > >>> > > On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
>> > >>> > >
>> > >>> > >> Congratulations! Jingsong
>> > >>> > >>
>> > >>> > >>
>> > >>> > >> Best,
>> > >>> > >> Dan Zou
>> > >>> > >>
>> > >>> >
>> > >>> >
>> > >>>
>> > >>>
>> > >>> --
>> > >>> Best Regards
>> > >>>
>> > >>> Jeff Zhang
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>> Best, Jingsong Lee
>> > >>>
>> > >>>
>> > >>>
>> >
>>
>


[ANNOUNCE] Apache Flink Python API(PyFlink) 1.9.2 released

2020-02-12 Thread jincheng sun
Hi everyone,

The Apache Flink community is very happy to announce the release of Apache
Flink Python API (PyFlink) 1.9.2, which is the first PyPI release of the
Apache Flink Python API 1.9 series.

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
applications.

The release is available for download at:

https://pypi.org/project/apache-flink/1.9.2/#files

Or install it using the pip command:

pip install apache-flink==1.9.2

We would like to thank all contributors of the Apache Flink community who
helped to verify this release and made this release possible!

Best,
Jincheng


Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-12 Thread jincheng sun
Hi folks,

Thanks everyone for voting. I'm closing the vote now and will post the
result as a separate email.

Best,
Jincheng


Xingbo Huang wrote on Thu, Feb 13, 2020 at 9:28 AM:

> +1 (non-binding)
>
> - Install the PyFlink by `pip install` [SUCCESS]
> - Run word_count.py [SUCCESS]
>
> Thanks,
> Xingbo
>
> Becket Qin wrote on Wed, Feb 12, 2020 at 2:28 PM:
>
>> +1 (binding)
>>
>> - verified signature
>> - Ran word count example successfully.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Wed, Feb 12, 2020 at 1:29 PM Jark Wu  wrote:
>>
>>> +1
>>>
>>> - checked/verified signatures and hashes
>>> - Pip installed the package successfully: pip install
>>> apache-flink-1.9.2.tar.gz
>>> - Run word count example successfully through the documentation [1].
>>>
>>> Best,
>>> Jark
>>>
>>> [1]:
>>>
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.9/tutorials/python_table_api.html
>>>
>>> On Tue, 11 Feb 2020 at 22:00, Hequn Cheng  wrote:
>>>
>>> > +1 (non-binding)
>>> >
>>> > - Check signature and checksum.
>>> > - Install package successfully with Pip under Python 3.7.4.
>>> > - Run wordcount example successfully under Python 3.7.4.
>>> >
>>> > Best, Hequn
>>> >
>>> > On Tue, Feb 11, 2020 at 12:17 PM Dian Fu 
>>> wrote:
>>> >
>>> > > +1 (non-binding)
>>> > >
>>> > > - Verified the signature and checksum
>>> > > - Pip installed the package successfully: pip install
>>> > > apache-flink-1.9.2.tar.gz
>>> > > - Run word count example successfully.
>>> > >
>>> > > Regards,
>>> > > Dian
>>> > >
>>> > > On Feb 11, 2020, at 11:44 AM, jincheng sun wrote:
>>> > >
>>> > >
>>> > > +1 (binding)
>>> > >
>>> > > - Install the PyFlink by `pip install` [SUCCESS]
>>> > > - Run word_count in both command line and IDE [SUCCESS]
>>> > >
>>> > > Best,
>>> > > Jincheng
>>> > >
>>> > >
>>> > >
>>> > > Wei Zhong wrote on Tue, Feb 11, 2020 at 11:17 AM:
>>> > >
>>> > >> Hi,
>>> > >>
>>> > >> Thanks for driving this, Jincheng.
>>> > >>
>>> > >> +1 (non-binding)
>>> > >>
>>> > >> - Verified signatures and checksums.
>>> > >> - Verified README.md and setup.py.
>>> > >> - Run `pip install apache-flink-1.9.2.tar.gz` in Python 2.7.15 and
>>> > Python
>>> > >> 3.7.5 successfully.
>>> > >> - Start local pyflink shell in Python 2.7.15 and Python 3.7.5 via
>>> > >> `pyflink-shell.sh local` and try the examples in the help message,
>>> run
>>> > well
>>> > >> and no exception.
>>> > >> - Try a word count example in IDE with Python 2.7.15 and Python
>>> 3.7.5,
>>> > >> run well and no exception.
>>> > >>
>>> > >> Best,
>>> > >> Wei
>>> > >>
>>> > >>
>>> > >> On Feb 10, 2020, at 19:12, jincheng sun wrote:
>>> > >>
>>> > >> Hi everyone,
>>> > >>
>>> > >> Please review and vote on the release candidate #1 for the PyFlink
>>> > >> version 1.9.2, as follows:
>>> > >>
>>> > >> [ ] +1, Approve the release
>>> > >> [ ] -1, Do not approve the release (please provide specific
>>> comments)
>>> > >>
>>> > >> The complete staging area is available for your review, which
>>> includes:
>>> > >>
>>> > >> * the official Apache binary convenience releases to be deployed to
>>> > >> dist.apache.org [1], which are signed with the key with fingerprint
>>> > >> 8FEA1EE9D0048C0CCC70B7573211B0703B79EA0E [2] and built from source
>>> code
>>> > [3].
>>> > >>
>>> > >> The vote will be open for at least 72 hours. It is adopted by
>>> majority
>>> > >> approval, with at least 3 PMC affirmative votes.
>>> > >>
>>> > >> Thanks,
>>> > >> Jincheng
>>> > >>
>>> > >> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.2-rc1/
>>> > >> [2] https://dist.apache.org/repos/dist/release/flink/KEYS
>>> > >> [3] https://github.com/apache/flink/tree/release-1.9.2
>>> > >>
>>> > >>
>>> > >
>>> >
>>>
>>


Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-10 Thread jincheng sun
+1 (binding)

- Install the PyFlink by `pip install` [SUCCESS]
- Run word_count in both command line and IDE [SUCCESS]

Best,
Jincheng



Wei Zhong wrote on Tue, Feb 11, 2020 at 11:17 AM:

> Hi,
>
> Thanks for driving this, Jincheng.
>
> +1 (non-binding)
>
> - Verified signatures and checksums.
> - Verified README.md and setup.py.
> - Run `pip install apache-flink-1.9.2.tar.gz` in Python 2.7.15 and Python
> 3.7.5 successfully.
> - Start local pyflink shell in Python 2.7.15 and Python 3.7.5 via
> `pyflink-shell.sh local` and try the examples in the help message, run well
> and no exception.
> - Try a word count example in IDE with Python 2.7.15 and Python 3.7.5, run
> well and no exception.
>
> Best,
> Wei
>
>
> On Feb 10, 2020, at 19:12, jincheng sun wrote:
>
> Hi everyone,
>
> Please review and vote on the release candidate #1 for the PyFlink version
> 1.9.2, as follows:
>
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
>
> * the official Apache binary convenience releases to be deployed to
> dist.apache.org [1], which are signed with the key with fingerprint
> 8FEA1EE9D0048C0CCC70B7573211B0703B79EA0E [2] and built from source code [3].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Jincheng
>
> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.2-rc1/
> [2] https://dist.apache.org/repos/dist/release/flink/KEYS
> [3] https://github.com/apache/flink/tree/release-1.9.2
>
>
>


[VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-10 Thread jincheng sun
Hi everyone,

Please review and vote on the release candidate #1 for the PyFlink version
1.9.2, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:

* the official Apache binary convenience releases to be deployed to
dist.apache.org [1], which are signed with the key with fingerprint
8FEA1EE9D0048C0CCC70B7573211B0703B79EA0E [2] and built from source code [3].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Jincheng

[1] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.2-rc1/
[2] https://dist.apache.org/repos/dist/release/flink/KEYS
[3] https://github.com/apache/flink/tree/release-1.9.2


[DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-03 Thread jincheng sun
Hi folks,

I have recently been very happy to receive user inquiries about the use of
the Flink Python API (PyFlink). One of the more common questions is whether
it is possible to install PyFlink without building it from source. The most
convenient and natural way for users is to use `pip install apache-flink`.
We originally planned to support `pip install apache-flink` starting with
Flink 1.10; the reason for that decision was that when Flink 1.9 was released
on August 22, 2019 [1], Flink's PyPI account was not ready yet. Our PyPI
account has been available since October 09, 2019 [2] (only PMC members can
access it). So, for the convenience of users, I propose:

- Publish the latest release version (1.9.2) of Flink 1.9 to PyPI.
- Update Flink 1.9 documentation to add support for `pip install`.

As we all know, the Flink 1.9.2 release was just completed on January 31,
2020 [3]. There are still at least one to two months before the release of
1.9.3, so my proposal is made entirely from the perspective of user
convenience. Although the proposed work is not large, we have not set a
precedent for an independent release of the Flink Python API (PyFlink) in
previous release processes. I hereby initiate this discussion and look
forward to your feedback!

Best,
Jincheng

[1]
https://lists.apache.org/thread.html/4a4d23c449f26b66bc58c71cc1a5c6079c79b5049c6c6744224c5f46%40%3Cdev.flink.apache.org%3E
[2]
https://lists.apache.org/thread.html/8273a5e8834b788d8ae552a5e177b69e04e96c0446bb90979444deee%40%3Cprivate.flink.apache.org%3E
[3]
https://lists.apache.org/thread.html/ra27644a4e111476b6041e8969def4322f47d5e0aae8da3ef30cd2926%40%3Cdev.flink.apache.org%3E


Re: [ANNOUNCE] Apache Flink 1.9.2 released

2020-01-31 Thread jincheng sun
Thanks for being the release manager, and great work, Hequn :)

Also thanks to the community for making this release possible!

BTW: I have added the 1.9.2 release to the report.

Best,
Jincheng

Hequn Cheng wrote on Fri, Jan 31, 2020 at 6:55 PM:

> Hi everyone,
>
> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.9.2, which is the second bugfix release for the Apache Flink 1.9
> series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2020/01/30/release-1.9.2.html
>
> The full release notes are available in Jira:
> https://issues.apache.org/jira/projects/FLINK/versions/12346278
>
> We would like to thank all contributors of the Apache Flink community who
> helped to verify this release and made this release possible!
> Great thanks to @Jincheng for helping finalize this release.
>
> Regards,
> Hequn
>
>


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread jincheng sun
Congrats Dian Fu and welcome on board!

Best,
Jincheng

Shuo Cheng wrote on Thu, Jan 16, 2020 at 6:22 PM:

> Congratulations, Dian Fu!
>
> > Xingbo Wei Zhong wrote on Thu, Jan 16, 2020 at 6:13 PM: >> jincheng sun
> wrote on Thu, Jan 16, 2020 at 5:58 PM:
>


[ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread jincheng sun
Hi everyone,

I'm very happy to announce that Dian accepted the offer of the Flink PMC to
become a committer of the Flink project.

Dian Fu has been contributing to Flink for many years. He has played an
essential role in the PyFlink/CEP/SQL/Table API modules, contributed several
major features, reported and fixed many bugs, spent a lot of time reviewing
pull requests, and also frequently helped out on the user mailing lists and
checked and voted on releases.

Please join me in congratulating Dian on becoming a Flink committer!

Best,
Jincheng(on behalf of the Flink PMC)


Re: [DISCUSS] What parts of the Python API should we focus on next ?

2019-12-22 Thread jincheng sun
Hi Bowen,

Your suggestions are very helpful for expanding the PyFlink ecosystem. I
also mentioned notebook integration above; Jupyter and Zeppelin are both
excellent notebooks. Integrating with Jupyter and Zeppelin also requires
support from people in those communities. Currently, Jeff has made great
efforts in the Zeppelin community for PyFlink. I would greatly appreciate it
if anyone active in the Jupyter community were also willing to help
integrate PyFlink.

Best,
Jincheng


Bowen Li wrote on Fri, Dec 20, 2019 at 12:55 AM:

> - Integrate PyFlink with Jupyter notebook
>    - Description: users should be able to run PyFlink seamlessly in Jupyter
>    - Benefits: Jupyter is the industry-standard notebook for data
> scientists. I’ve talked to a few companies in North America; they think
> Jupyter is the #1 way to empower internal DS with Flink
>
>
> On Wed, Dec 18, 2019 at 19:05 jincheng sun 
> wrote:
>
>> Also CC user-zh.
>>
>> Best,
>> Jincheng
>>
>>
>> jincheng sun wrote on Thu, Dec 19, 2019 at 10:20 AM:
>>
>>> Hi folks,
>>>
>>> As release-1.10 is under feature freeze (the stateless Python UDF is
>>> already supported), it is time for us to plan the features of PyFlink for
>>> the next release.
>>>
>>> To make sure the features supported in PyFlink are the most demanded by
>>> the community, we'd like to get more people involved, i.e., it would be
>>> better if all of the devs and users joined in the discussion of which
>>> kinds of features are more important and urgent.
>>>
>>> We have already listed some features from different aspects, which you
>>> can find below; however, it is not the final plan. We appreciate any
>>> suggestions from the community, on either functionality or performance
>>> improvements, etc. It would be great to have the following information if
>>> you want to suggest adding a feature:
>>>
>>> -
>>> - Feature description: 
>>> - Benefits of the feature: 
>>> - Use cases (optional): 
>>> --
>>>
>>> Features in my mind
>>>
>>> 1. Integration with most popular Python libraries
>>> - fromPandas/toPandas API
>>>Description:
>>>   Support to convert between Table and pandas.DataFrame.
>>>Benefits:
>>>   Users could switch between Flink and Pandas API, for example,
>>> do some analysis using Flink and then perform analysis using the Pandas API
>>> if the result data is small and could fit into the memory, and vice versa.
>>>
>>> - Support Scalar Pandas UDF
>>>Description:
>>>   Support scalar Pandas UDF in Python Table API & SQL. Both the
>>> input and output of the UDF is pandas.Series.
>>>Benefits:
>>>   1) Scalar Pandas UDF performs better than row-at-a-time UDF,
>>> ranging from 3x to over 100x (from pyspark)
>>>   2) Users could use Pandas/Numpy API in the Python UDF
>>> implementation if the input/output data type is pandas.Series
>>>
>>> - Support Pandas UDAF in batch GroupBy aggregation
>>>Description:
>>>Support Pandas UDAF in batch GroupBy aggregation of Python
>>> Table API & SQL. Both the input and output of the UDF is pandas.DataFrame.
>>>Benefits:
>>>   1) Pandas UDAF performs better than row-at-a-time UDAF more
>>> than 10x in certain scenarios
>>>   2) Users could use Pandas/Numpy API in the Python UDAF
>>> implementation if the input/output data type is pandas.DataFrame
>>>
>>> 2. Fully support  all kinds of Python UDF
>>> - Support Python UDAF(stateful) in GroupBy aggregation (NOTE: Please
>>> give us some use case if you want this feature to be contained in the next
>>> release)
>>>   Description:
>>> Support UDAF in GroupBy aggregation.
>>>   Benefits:
>>> Users could define and use Python UDAF and use it in GroupBy
>>> aggregation. Without it, users have to use Java/Scala UDAF.
>>>
>>> - Support Python UDTF
>>>   Description:
>>>Support  Python UDTF in Python Table API & SQL
>>>   Benefits:
>>> Users could define and use Python UDTF in Python Table API &
>>> SQL. Without it, users have to use Java/Scala UDTF.
>>>
>>> 3. Debugging and Monitoring of Python UDF
>>>- Support User-Defined Metrics

Re: [DISCUSS] What parts of the Python API should we focus on next ?

2019-12-18 Thread jincheng sun
Also CC user-zh.

Best,
Jincheng


jincheng sun wrote on Thu, Dec 19, 2019 at 10:20 AM:

> Hi folks,
>
> As release-1.10 is under feature freeze (the stateless Python UDF is
> already supported), it is time for us to plan the features of PyFlink for
> the next release.
>
> To make sure the features supported in PyFlink are the most demanded by the
> community, we'd like to get more people involved, i.e., it would be better
> if all of the devs and users joined in the discussion of which kinds of
> features are more important and urgent.
>
> We have already listed some features from different aspects, which you can
> find below; however, it is not the final plan. We appreciate any
> suggestions from the community, on either functionality or performance
> improvements, etc. It would be great to have the following information if
> you want to suggest adding a feature:
>
> -
> - Feature description: 
> - Benefits of the feature: 
> - Use cases (optional): 
> --
>
> Features in my mind
>
> 1. Integration with most popular Python libraries
> - fromPandas/toPandas API
>Description:
>   Support to convert between Table and pandas.DataFrame.
>Benefits:
>   Users could switch between Flink and Pandas API, for example, do
> some analysis using Flink and then perform analysis using the Pandas API if
> the result data is small and could fit into the memory, and vice versa.
>
> - Support Scalar Pandas UDF
>Description:
>   Support scalar Pandas UDF in Python Table API & SQL. Both the
> input and output of the UDF is pandas.Series.
>Benefits:
>   1) Scalar Pandas UDF performs better than row-at-a-time UDF,
> ranging from 3x to over 100x (from pyspark)
>   2) Users could use Pandas/Numpy API in the Python UDF
> implementation if the input/output data type is pandas.Series
>
> - Support Pandas UDAF in batch GroupBy aggregation
>Description:
>Support Pandas UDAF in batch GroupBy aggregation of Python
> Table API & SQL. Both the input and output of the UDF is pandas.DataFrame.
>Benefits:
>   1) Pandas UDAF performs better than row-at-a-time UDAF more than
> 10x in certain scenarios
>   2) Users could use Pandas/Numpy API in the Python UDAF
> implementation if the input/output data type is pandas.DataFrame
>
> 2. Fully support  all kinds of Python UDF
> - Support Python UDAF(stateful) in GroupBy aggregation (NOTE: Please
> give us some use case if you want this feature to be contained in the next
> release)
>   Description:
> Support UDAF in GroupBy aggregation.
>   Benefits:
> Users could define and use Python UDAF and use it in GroupBy
> aggregation. Without it, users have to use Java/Scala UDAF.
>
> - Support Python UDTF
>   Description:
>Support  Python UDTF in Python Table API & SQL
>   Benefits:
> Users could define and use Python UDTF in Python Table API & SQL.
> Without it, users have to use Java/Scala UDTF.
>
> 3. Debugging and Monitoring of Python UDF
>- Support User-Defined Metrics
>  Description:
>Allow users to define user-defined metrics and global job
> parameters with Python UDFs.
>  Benefits:
>UDF needs metrics to monitor some business or technical indicators,
> which is also a requirement for UDFs.
>
>- Make the log level configurable
>  Description:
>Allow users to config the log level of Python UDF.
>  Benefits:
>Users could configure different log levels when debugging and
> deploying.
>
> 4. Enrich the Python execution environment
>- Docker Mode Support
>  Description:
>  Support running python UDF in docker workers.
>  Benefits:
>  Support a variety of deployments to meet more users' requirements.
>
> 5. Expand the usage scope of Python UDF
>- Support to use Python UDF via SQL client
>  Description:
>  Support to register and use Python UDF via SQL client
>  Benefits:
>  SQL client is a very important interface for SQL users. This
> feature allows SQL users to use Python UDFs via SQL client.
>
>- Integrate Python UDF with Notebooks
>  Description:
>  Such as Zeppelin, etc (Especially Python dependencies)
>
>- Support to register Python UDF into catalog
>   Description:
>   Support to register Python UDF into catalog
>   Benefits:
>   1)Catalog is the centralized place to manage metadata such as
> tables, UDFs, etc. With it, users could register the UDFs once and use it
> anywhere.
> 

[DISCUSS] What parts of the Python API should we focus on next ?

2019-12-18 Thread jincheng sun
Hi folks,

As release-1.10 is under feature freeze (the stateless Python UDF is already
supported), it is time for us to plan the features of PyFlink for the next
release.

To make sure the features supported in PyFlink are the most demanded by the
community, we'd like to get more people involved, i.e., it would be better if
all of the devs and users joined in the discussion of which kinds of features
are more important and urgent.

We have already listed some features from different aspects, which you can
find below; however, it is not the final plan. We appreciate any suggestions
from the community, on either functionality or performance improvements, etc.
It would be great to have the following information if you want to suggest
adding a feature:

-
- Feature description: 
- Benefits of the feature: 
- Use cases (optional): 
--

Features in my mind

1. Integration with most popular Python libraries
- fromPandas/toPandas API
   Description:
  Support to convert between Table and pandas.DataFrame.
   Benefits:
  Users could switch between Flink and Pandas API, for example, do
some analysis using Flink and then perform analysis using the Pandas API if
the result data is small and could fit into the memory, and vice versa.

- Support Scalar Pandas UDF
   Description:
  Support scalar Pandas UDF in Python Table API & SQL. Both the
input and output of the UDF is pandas.Series.
   Benefits:
  1) Scalar Pandas UDF performs better than row-at-a-time UDF,
ranging from 3x to over 100x (from pyspark)
  2) Users could use Pandas/Numpy API in the Python UDF
implementation if the input/output data type is pandas.Series

- Support Pandas UDAF in batch GroupBy aggregation
   Description:
   Support Pandas UDAF in batch GroupBy aggregation of Python Table
API & SQL. Both the input and output of the UDF is pandas.DataFrame.
   Benefits:
  1) Pandas UDAF performs better than row-at-a-time UDAF more than
10x in certain scenarios
  2) Users could use Pandas/Numpy API in the Python UDAF
implementation if the input/output data type is pandas.DataFrame

2. Fully support  all kinds of Python UDF
- Support Python UDAF(stateful) in GroupBy aggregation (NOTE: Please
give us some use case if you want this feature to be contained in the next
release)
  Description:
Support UDAF in GroupBy aggregation.
  Benefits:
Users could define and use Python UDAF and use it in GroupBy
aggregation. Without it, users have to use Java/Scala UDAF.

- Support Python UDTF
  Description:
   Support  Python UDTF in Python Table API & SQL
  Benefits:
Users could define and use Python UDTF in Python Table API & SQL.
Without it, users have to use Java/Scala UDTF.

3. Debugging and Monitoring of Python UDF
   - Support User-Defined Metrics
 Description:
   Allow users to define user-defined metrics and global job parameters
with Python UDFs.
 Benefits:
   UDF needs metrics to monitor some business or technical indicators,
which is also a requirement for UDFs.

   - Make the log level configurable
 Description:
   Allow users to config the log level of Python UDF.
 Benefits:
   Users could configure different log levels when debugging and
deploying.

4. Enrich the Python execution environment
   - Docker Mode Support
 Description:
 Support running python UDF in docker workers.
 Benefits:
     Support a variety of deployments to meet more users' requirements.

5. Expand the usage scope of Python UDF
   - Support to use Python UDF via SQL client
 Description:
 Support to register and use Python UDF via SQL client
 Benefits:
 SQL client is a very important interface for SQL users. This
feature allows SQL users to use Python UDFs via SQL client.

   - Integrate Python UDF with Notebooks
 Description:
 Such as Zeppelin, etc (Especially Python dependencies)

   - Support to register Python UDF into catalog
  Description:
  Support to register Python UDF into catalog
  Benefits:
  1) Catalog is the centralized place to manage metadata such as
tables, UDFs, etc. With it, users could register UDFs once and use them
anywhere.
  2) It's an important part of the SQL functionality. If Python
UDFs cannot be registered and used in a catalog, they cannot be shared
between jobs.

6. Performance Improvements of Python UDF
   - Cython improvements
  Description:
  Cython Improvements in coder & operations
  Benefits:
      Initial tests show that Cython will speed up coder
serialization/deserialization by 3x+.

7. Add Python ML API
   - Add Python ML Pipeline API
 Description:
 Align Python ML Pipeline API with Java/Scala
 Benefits:
   1) Currently, we already have the Pipeline APIs for ML. It would be
good to also

Re: [ANNOUNCE] Apache Flink 1.8.3 released

2019-12-11 Thread jincheng sun
Thanks for being the release manager and the great work Hequn :)
Also thanks to the community making this release possible!

Best,
Jincheng

Jark Wu wrote on Thu, Dec 12, 2019 at 3:23 PM:

> Thanks Hequn for helping out this release and being the release manager.
> Great work!
>
> Best,
> Jark
>
> On Thu, 12 Dec 2019 at 15:02, Jeff Zhang  wrote:
>
> > Great work, Hequn
> >
> > Dian Fu wrote on Thu, Dec 12, 2019 at 2:32 PM:
> >
> >> Thanks Hequn for being the release manager and everyone who contributed
> >> to this release.
> >>
> >> Regards,
> >> Dian
> >>
> >> On Dec 12, 2019, at 2:24 PM, Hequn Cheng wrote:
> >>
> >> Hi,
> >>
> >> The Apache Flink community is very happy to announce the release of
> >> Apache Flink 1.8.3, which is the third bugfix release for the Apache
> Flink
> >> 1.8 series.
> >>
> >> Apache Flink® is an open-source stream processing framework for
> >> distributed, high-performing, always-available, and accurate data
> streaming
> >> applications.
> >>
> >> The release is available for download at:
> >> https://flink.apache.org/downloads.html
> >>
> >> Please check out the release blog post for an overview of the
> >> improvements for this bugfix release:
> >> https://flink.apache.org/news/2019/12/11/release-1.8.3.html
> >>
> >> The full release notes are available in Jira:
> >> https://issues.apache.org/jira/projects/FLINK/versions/12346112
> >>
> >> We would like to thank all contributors of the Apache Flink community
> who
> >> made this release possible!
> >> Great thanks to @Jincheng as a mentor during this release.
> >>
> >> Regards,
> >> Hequn
> >>
> >>
> >>
> >
> > --
> > Best Regards
> >
> > Jeff Zhang
> >
>


Re: [DISCUSS] Drop Kafka 0.8/0.9

2019-12-04 Thread jincheng sun
+1 for dropping it, and thanks for bringing up this discussion, Chesnay!

Best,
Jincheng

Jark Wu wrote on Thu, Dec 5, 2019 at 10:19 AM:

> +1 for dropping, also cc'ed user mailing list.
>
>
> Best,
> Jark
>
> On Thu, 5 Dec 2019 at 03:39, Konstantin Knauf 
> wrote:
>
> > Hi Chesnay,
> >
> > +1 for dropping. I have not heard from any user using 0.8 or 0.9 for a
> long
> > while.
> >
> > Cheers,
> >
> > Konstantin
> >
> > On Wed, Dec 4, 2019 at 1:57 PM Chesnay Schepler 
> > wrote:
> >
> > > Hello,
> > >
> > > What's everyone's take on dropping the Kafka 0.8/0.9 connectors from
> the
> > > Flink codebase?
> > >
> > > We haven't touched either of them for the 1.10 release, and it seems
> > > quite unlikely that we will do so in the future.
> > >
> > > We could finally close a number of test stability tickets that have
> been
> > > lingering for quite a while.
> > >
> > >
> > > Regards,
> > >
> > > Chesnay
> > >
> > >
> >
> > --
> >
> > Konstantin Knauf | Solutions Architect
> >
> > +49 160 91394525
> >
> >
> > Follow us @VervericaData Ververica 
> >
> >
> > --
> >
> > Join Flink Forward  - The Apache Flink
> > Conference
> >
> > Stream Processing | Event Driven | Real Time
> >
> > --
> >
> > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >
> > --
> > Ververica GmbH
> > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> > (Tony) Cheng
> >
>


Re: [PROPOSAL] Contribute Stateful Functions to Apache Flink

2019-10-12 Thread jincheng sun
Hi Stephan,

Big +1 for adding this great feature to Apache Flink.

Regarding where we should place it — in the Flink core repository or in a
separate repository — I prefer putting it into the main repository, and I
look forward to the more detailed discussion of this decision.

Best,
Jincheng


Jingsong Li wrote on Sat, Oct 12, 2019 at 11:32 AM:

> Hi Stephan,
>
> big +1 for this contribution. It provides another user interface that is
> easy to use and popular at this time. Such functions are hard for users to
> write in SQL/Table API, while using DataStream is too complex (we've built
> some StateFun-like jobs using DataStream before). With StateFun, it is very
> easy.
>
> I think it's also a good opportunity to exercise Flink's core
> capabilities. I looked at stateful-functions-flink briefly; it is very
> interesting. I think there are many other things Flink can improve. So I
> think it's better to put it into Flink, and the improvements to it will
> be more natural in the future.
>
> Best,
> Jingsong Lee
>
> On Fri, Oct 11, 2019 at 7:33 PM Dawid Wysakowicz 
> wrote:
>
>> Hi Stephan,
>>
>> I think this is a nice library, but what I like more about it is that it
>> suggests exploring different use-cases. I think it definitely makes sense
>> for the Flink community to explore more lightweight applications that
>> reuses resources. Therefore I definitely think it is a good idea for Flink
>> community to accept this contribution and help maintaining it.
>>
>> Personally I'd prefer to have it in a separate repository. There were a
>> few discussions before where different people were suggesting to extract
>> connectors and other libraries to separate repositories. Moreover I think
>> it could serve as an example for the Flink ecosystem website[1]. This could
>> be the first project in there and give a good impression that the community
>> sees potential in the ecosystem website.
>>
>> Lastly, I'm wondering if this should go through PMC vote according to our
>> bylaws[2]. In the end the suggestion is to adopt an existing code base as
>> is. It also proposes a new programs concept that could result in a shift of
>> priorities for the community in a long run.
>>
>> Best,
>>
>> Dawid
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-Flink-ecosystem-website-td27519.html
>>
>> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws
>> On 11/10/2019 13:12, Till Rohrmann wrote:
>>
>> Hi Stephan,
>>
>> +1 for adding stateful functions to Flink. I believe the new set of
>> applications this feature will unlock will be super interesting for new and
>> existing Flink users alike.
>>
>> One reason for not including it in the main repository would be to avoid
>> being bound to Flink's release cadence. This would allow releasing faster and
>> more often. However, I believe that having it eventually in Flink's main
>> repository would be beneficial in the long run.
>>
>> Cheers,
>> Till
>>
>> On Fri, Oct 11, 2019 at 12:56 PM Trevor Grant 
>> wrote:
>>
>>> +1 non-binding on contribution.
>>>
>>> Separate repo, or feature branch to start maybe? I just feel like in the
>>> beginning this thing is going to have lots of breaking changes that maybe
>>> aren't going to fit well with tests / other "v1+" release code. Just my
>>> .02.
>>>
>>>
>>>
>>> On Fri, Oct 11, 2019 at 4:38 AM Stephan Ewen  wrote:
>>>
 Dear Flink Community!

 Some of you probably heard it already: On Tuesday, at Flink Forward
 Berlin, we announced **Stateful Functions**.

 Stateful Functions is a library on Flink to implement general purpose
 applications. It is built around stateful functions (who would have thunk)
 that can communicate arbitrarily through messages, have consistent
 state, and a small resource footprint. They are a bit like keyed
 ProcessFunctions
 that can send each other messages.
 As simple as this sounds, this means you can now communicate in non-DAG
 patterns, so it allows users to build programs they cannot build with 
 Flink.
 It also has other neat properties, like multiplexing of functions,
 modular composition, and tooling for both container-based deployments and
 as-a-Flink-job deployments.

 You can find out more about it here
   - Website: https://statefun.io/
   - Code: https://github.com/ververica/stateful-functions
   - Talk with motivation:
 https://speakerdeck.com/stephanewen/stateful-functions-building-general-purpose-applications-and-services-on-apache-flink?slide=12


 Now for the main issue: **We would like to contribute this project to
 Apache Flink**

 I believe that this is a great fit for both sides.
 For the Flink community, it would be a way to extend the capabilities
 and use cases of Flink into a completely different type of applications and
 thus grow the community into this new field.
 Many discussions recently about evolving the Flink runtime (

Re: [ANNOUNCE] Apache Flink 1.8.2 released

2019-09-13 Thread jincheng sun
Thanks for being the release manager and the great work Jark :)
Also thanks to the community making this release possible!

Best,
Jincheng

Jark Wu  于2019年9月13日周五 下午10:07写道:

> Hi,
>
> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.8.2, which is the second bugfix release for the Apache Flink 1.8
> series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2019/09/11/release-1.8.2.html
>
> The full release notes are available in Jira:
> https://issues.apache.org/jira/projects/FLINK/versions/12345670
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
> Great thanks to @Jincheng for the kindly help during this release.
>
> Regards,
> Jark
>


[ANNOUNCE] Hequn becomes a Flink committer

2019-08-07 Thread jincheng sun
Hi everyone,

I'm very happy to announce that Hequn accepted the offer of the Flink PMC
to become a committer of the Flink project.

Hequn has been contributing to Flink for many years, mainly working on
SQL/Table API features. He's also frequently helping out on the user
mailing lists and helping to check and vote on releases.

Congratulations Hequn!

Best, Jincheng
(on behalf of the Flink PMC)


Re: [ANNOUNCE] Rong Rong becomes a Flink committer

2019-07-11 Thread jincheng sun
Congratulations Rong, well deserved!

Cheers,
Jincheng

Dian Fu  于2019年7月12日周五 上午9:06写道:

>
> Congrats Rong!
>
>
> 在 2019年7月12日,上午8:47,Chen YuZhao  写道:
>
> congratulations!
>
> Get Outlook for iOS 
>
> --
> *From:* Rong Rong 
> *Sent:* Friday, July 12, 2019 8:09 AM
> *To:* Hao Sun
> *Cc:* Xuefu Z; dev; Flink ML
> *Subject:* Re: [ANNOUNCE] Rong Rong becomes a Flink committer
>
> Thank you all for the warm welcome!
>
> It's my honor to become an Apache Flink committer.
> I will continue to work on this great project and contribute more to the
> community.
>
> Cheers,
> Rong
>
> On Thu, Jul 11, 2019 at 1:05 PM Hao Sun  wrote:
>
>> Congratulations Rong.
>>
>> On Thu, Jul 11, 2019, 11:39 Xuefu Z  wrote:
>>
>>> Congratulations, Rong!
>>>
>>> On Thu, Jul 11, 2019 at 10:59 AM Bowen Li  wrote:
>>>
 Congrats, Rong!


 On Thu, Jul 11, 2019 at 10:48 AM Oytun Tez  wrote:

 > Congratulations Rong!
 >
 > ---
 > Oytun Tez
 >
 > *M O T A W O R D*
 > The World's Fastest Human Translation Platform.
 > oy...@motaword.com — www.motaword.com
 >
 >
 > On Thu, Jul 11, 2019 at 1:44 PM Peter Huang <
 huangzhenqiu0...@gmail.com>
 > wrote:
 >
 >> Congrats Rong!
 >>
 >> On Thu, Jul 11, 2019 at 10:40 AM Becket Qin 
 wrote:
 >>
 >>> Congrats, Rong!
 >>>
 >>> On Fri, Jul 12, 2019 at 1:13 AM Xingcan Cui 
 wrote:
 >>>
  Congrats Rong!
 
  Best,
  Xingcan
 
  On Jul 11, 2019, at 1:08 PM, Shuyi Chen 
 wrote:
 
  Congratulations, Rong!
 
  On Thu, Jul 11, 2019 at 8:26 AM Yu Li  wrote:
 
 > Congratulations Rong!
 >
 > Best Regards,
 > Yu
 >
 >
 > On Thu, 11 Jul 2019 at 22:54, zhijiang <
 wangzhijiang...@aliyun.com>
 > wrote:
 >
 >> Congratulations Rong!
 >>
 >> Best,
 >> Zhijiang
 >>
 >>
 --
 >> From:Kurt Young 
 >> Send Time: Thursday, July 11, 2019 22:54
 >> To:Kostas Kloudas 
 >> Cc:Jark Wu ; Fabian Hueske >>> >;
 >> dev ; user 
 >> Subject:Re: [ANNOUNCE] Rong Rong becomes a Flink committer
 >>
 >> Congratulations Rong!
 >>
 >> Best,
 >> Kurt
 >>
 >>
 >> On Thu, Jul 11, 2019 at 10:53 PM Kostas Kloudas <
 kklou...@gmail.com>
 >> wrote:
 >> Congratulations Rong!
 >>
 >> On Thu, Jul 11, 2019 at 4:40 PM Jark Wu 
 wrote:
 >> Congratulations Rong Rong!
 >> Welcome on board!
 >>
 >> On Thu, 11 Jul 2019 at 22:25, Fabian Hueske 
 >> wrote:
 >> Hi everyone,
 >>
 >> I'm very happy to announce that Rong Rong accepted the offer of
 the
 >> Flink PMC to become a committer of the Flink project.
 >>
 >> Rong has been contributing to Flink for many years, mainly
 working on
 >> SQL and Yarn security features. He's also frequently helping out
 on the
 >> user@f.a.o mailing lists.
 >>
 >> Congratulations Rong!
 >>
 >> Best, Fabian
 >> (on behalf of the Flink PMC)
 >>
 >>
 >>
 

>>>
>>>
>>> --
>>> Xuefu Zhang
>>>
>>> "In Honey We Trust!"
>>>
>>
>


Re: Flink 1.8.1 release tag missing?

2019-07-09 Thread jincheng sun
Thanks Bekir Oguz and Chesnay!

Sorry for that, I forgot to push the tag; I've pushed it to the repo
now: https://github.com/apache/flink/tree/release-1.8.1
Thanks again, and I'm very sorry that my negligence caused confusion for
you.

Thanks,
Jincheng

Bekir Oguz  于2019年7月10日周三 上午12:50写道:

> Hi,
> I would like to build the 1.8.1 version of the flink-connector-kinesis
> module but cannot find the release tag in GitHub repo.
> I see release candidate 1 (release-1.8.1-rc1) tag, but not sure whether
> this consists of all the 40 bug fixes in 1.8.1 or not.
>
> Which hash or tag should I use to release the flink-connector-kinesis
> module?
>
> Regards,
> Bekir Oguz
>
>


Re: [ANNOUNCE] Apache Flink 1.8.1 released

2019-07-03 Thread jincheng sun
I've also tweeted about it from @ApacheFlink
https://twitter.com/ApacheFlink/status/1146407762106040321

Dawid Wysakowicz  于2019年7月3日周三 下午4:24写道:

> Congrats to everyone involved and thank you Jincheng for being the release
> manager!
> On 03/07/2019 08:38, JingsongLee wrote:
>
> Thanks jincheng for your great job.
>
> Best, JingsongLee
>
> --
> From:Congxian Qiu  
> Send Time: Wednesday, July 3, 2019 14:35
> To:d...@flink.apache.org  
> Cc:Dian Fu  ; jincheng sun
>  ; Hequn Cheng
>  ; user
>  ; announce
>  
> Subject:Re: [ANNOUNCE] Apache Flink 1.8.1 released
>
> Thanks for being the release manager and the great job
>
> Best,
> Congxian
>
>
> Jark Wu  于2019年7月3日周三 上午10:23写道:
> Thanks for being the release manager and the great job!
>
> Cheers,
> Jark
>
> On Wed, 3 Jul 2019 at 10:16, Dian Fu  wrote:
>
> > Awesome! Thanks a lot for being the release manager. Great job! @Jincheng
> >
> > Regards,
> > Dian
> >
> > 在 2019年7月3日,上午10:08,jincheng sun  写道:
> >
> > I've also tweeted about it from my twitter:
> > https://twitter.com/sunjincheng121/status/1146236834344648704
> > later would be tweeted it from @ApacheFlink!
> >
> > Best, Jincheng
> >
> > Hequn Cheng  于2019年7月3日周三 上午9:48写道:
> >
> >> Thanks for being the release manager and the great work Jincheng!
> >> Also thanks to Gorden and the community making this release possible!
> >>
> >> Best, Hequn
> >>
> >> On Wed, Jul 3, 2019 at 9:40 AM jincheng sun 
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> The Apache Flink community is very happy to announce the release of
> >>> Apache Flink 1.8.1, which is the first bugfix release for the Apache
> Flink
> >>> 1.8 series.
> >>>
> >>> Apache Flink® is an open-source stream processing framework for
> >>> distributed, high-performing, always-available, and accurate data
> streaming
> >>> applications.
> >>>
> >>> The release is available for download at:
> >>> https://flink.apache.org/downloads.html
> >>>
> >>> Please check out the release blog post for an overview of the
> >>> improvements for this bugfix release:
> >>> https://flink.apache.org/news/2019/07/02/release-1.8.1.html
> >>>
> >>> The full release notes are available in Jira:
> >>>
> >>>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345164
> >>>
> >>> We would like to thank all contributors of the Apache Flink community
> >>> who made this release possible!
> >>>
> >>> Great thanks to @Tzu-Li (Gordon) Tai  's offline
> >>> kind help!
> >>>
> >>> Regards,
> >>> Jincheng
> >>>
> >>
> >
>
>


Re: [ANNOUNCE] Apache Flink 1.8.1 released

2019-07-02 Thread jincheng sun
I've also tweeted about it from my twitter:
https://twitter.com/sunjincheng121/status/1146236834344648704
Later it will also be tweeted from @ApacheFlink!

Best, Jincheng

Hequn Cheng  于2019年7月3日周三 上午9:48写道:

> Thanks for being the release manager and the great work Jincheng!
> Also thanks to Gorden and the community making this release possible!
>
> Best, Hequn
>
> On Wed, Jul 3, 2019 at 9:40 AM jincheng sun 
> wrote:
>
>> Hi,
>>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink 1.8.1, which is the first bugfix release for the Apache Flink
>> 1.8 series.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this bugfix release:
>> https://flink.apache.org/news/2019/07/02/release-1.8.1.html
>>
>> The full release notes are available in Jira:
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345164
>>
>> We would like to thank all contributors of the Apache Flink community who
>> made this release possible!
>>
>> Great thanks to @Tzu-Li (Gordon) Tai  's offline
>> kind help!
>>
>> Regards,
>> Jincheng
>>
>


[ANNOUNCE] Apache Flink 1.8.1 released

2019-07-02 Thread jincheng sun
Hi,

The Apache Flink community is very happy to announce the release of Apache
Flink 1.8.1, which is the first bugfix release for the Apache Flink 1.8
series.

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the
improvements for this bugfix release:
https://flink.apache.org/news/2019/07/02/release-1.8.1.html

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345164

We would like to thank all contributors of the Apache Flink community who
made this release possible!

Great thanks to @Tzu-Li (Gordon) Tai  's offline kind
help!

Regards,
Jincheng


Re: [DISCUSS] Deprecate previous Python APIs

2019-06-14 Thread jincheng sun
+1 for removing, and we can try our best to enrich the new Python API.

Cheers,
Jincheng

Yu Li  于2019年6月14日周五 下午6:42写道:

> +1 on removing plus an explicit NOTE thread, to prevent any neglect due
> to the current title (deprecation).
>
> Best Regards,
> Yu
>
>
> On Fri, 14 Jun 2019 at 18:09, Stephan Ewen  wrote:
>
>> Okay, so we seem to have consensus for at least deprecating them, with a
>> suggestion to even directly remove them.
>>
>> A previous survey also brought no users of that python API to light [1]
>> I am inclined to go with removing.
>> Typically, deprecation is the way to go, but we could make an exception
>> and expedite things here.
>>
>> [1]
>> https://lists.apache.org/thread.html/348366080d6b87bf390efb98e5bf268620ab04a0451f8459e2f466cd@%3Cdev.flink.apache.org%3E
>>
>>
>> On Wed, Jun 12, 2019 at 2:37 PM Chesnay Schepler 
>> wrote:
>>
>>> I would just remove them. As you said, there are very limited as to what
>>> features they support, and haven't been under active development for
>>> several releases.
>>>
>>> Existing users (if there even are any) could continue to use older
>>> version against newer releases. It's is slightly more involved than for
>>> say, flink-ml, as you also have to copy the start-scripts (or figure out
>>> how to use the jars yourself), but it is still feasible and can be
>>> documented in the release notes.
>>>
>>> On 11/06/2019 15:30, Stephan Ewen wrote:
>>> > Hi all!
>>> >
>>> > I would suggest to deprecating the existing python APIs for DataSet and
>>> > DataStream API with the 1.9 release.
>>> >
>>> > Background is that there is a new Python API under development.
>>> > The new Python API is initially against the Table API. Flink 1.9 will
>>> > support Table API programs without UDFs, 1.10 is planned to support
>>> UDFs.
>>> > Future versions would support also the DataStream API.
>>> >
>>> > In the long term, Flink should have one Python API for DataStream and
>>> Table
>>> > APIs. We should not maintain multiple different implementations and
>>> confuse
>>> > users that way.
>>> > Given that the existing Python APIs are a bit limited and not under
>>> active
>>> > development, I would suggest to deprecate them in favor of the new API.
>>> >
>>> > Best,
>>> > Stephan
>>> >
>>>
>>>


Re: [DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread jincheng sun
Big +1 for the proposal.

We will soon complete all the Python API functional development for the 1.9
release; then the development of UDFs will be carried out. After the support
for UDFs is completed, it will be very natural to support the DataStream API.

If all of us agree with this proposal, I believe that for the 1.10 release
it is possible to complete support for both UDFs and the DataStream API, and
we will do our best to make the 1.10 release contain the Python DataStream
API support.

So, great thanks to @Stephan for this proposal!

Best,
Jincheng

Zili Chen  于2019年6月11日周二 下午10:56写道:

> +1
>
> Best,
> tison.
>
>
> zhijiang  于2019年6月11日周二 下午10:52写道:
>
>> It is reasonable as stephan explained. +1 from my side!
>>
>> --
>> From:Jeff Zhang 
>> Send Time: Tuesday, June 11, 2019 22:11
>> To:Stephan Ewen 
>> Cc:user ; dev 
>> Subject:Re: [DISCUSS] Deprecate previous Python APIs
>>
>> +1
>>
>> Stephan Ewen  于2019年6月11日周二 下午9:30写道:
>>
>> > Hi all!
>> >
>> > I would suggest to deprecating the existing python APIs for DataSet and
>> > DataStream API with the 1.9 release.
>> >
>> > Background is that there is a new Python API under development.
>> > The new Python API is initially against the Table API. Flink 1.9 will
>>
>> > support Table API programs without UDFs, 1.10 is planned to support UDFs.
>> > Future versions would support also the DataStream API.
>> >
>> > In the long term, Flink should have one Python API for DataStream and
>>
>> > Table APIs. We should not maintain multiple different implementations and
>> > confuse users that way.
>>
>> > Given that the existing Python APIs are a bit limited and not under active
>> > development, I would suggest to deprecate them in favor of the new API.
>> >
>> > Best,
>> > Stephan
>> >
>> >
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>>
>>


Re: [SURVEY] Usage of flink-ml and [DISCUSS] Delete flink-ml

2019-05-27 Thread jincheng sun
+1 for removing it!

And we also plan to delete `flink-libraries/flink-ml-uber`, right?

Best,
Jincheng

Rong Rong  于2019年5月24日周五 上午1:18写道:

> +1 for the deletion.
>
> Also I think it also might be a good idea to update the roadmap for the
> plan of removal/development since we've reached the consensus on FLIP-39.
>
> Thanks,
> Rong
>
>
> On Wed, May 22, 2019 at 8:26 AM Shaoxuan Wang  wrote:
>
>> Hi Chesnay,
>> Yes, you are right. There is not any active commit planned for the legacy
>> Flink-ml package. It does not matter whether we delete it now or later. I will open a
>> PR and remove it.
>>
>> Shaoxuan
>>
>> On Wed, May 22, 2019 at 7:05 PM Chesnay Schepler 
>> wrote:
>>
>>> I believe we can remove it regardless since users could just use the 1.8
>>> version against future releases.
>>>
>>> Generally speaking, any library/connector that is no longer actively
>>> developed can be removed from the project as existing users can always
>>> rely on previous versions, which should continue to work by virtue of
>>> working against @Stable APIs.
>>>
>>> On 22/05/2019 12:08, Shaoxuan Wang wrote:
>>> > Hi Flink community,
>>> >
>>> > We plan to delete/deprecate the legacy flink-libraries/flink-ml
>>> package in
>>> > Flink1.9, and replace it with the new flink-ml interface proposed in
>>> FLIP39
>>> > (FLINK-12470).
>>> > Before we remove this package, I want to reach out to you and ask if
>>> there
>>> > is any active project still uses this package. Please respond to this
>>> > thread and outline how you use flink-libraries/flink-ml.
>>> > Depending on the replies of activity and adoption
>>> > of flink-libraries/flink-ml, we will decide to either delete this
>>> package
>>> > in Flink1.9 or deprecate it for now & remove it in the next release
>>> after
>>> > 1.9.
>>> >
>>> > Thanks for your attention and help!
>>> >
>>> > Regards,
>>> > Shaoxuan
>>> >
>>>
>>>


Re: [ANNOUNCE] Apache Flink 1.8.0 released

2019-04-10 Thread jincheng sun
Thanks a lot for being our release manager, @Aljoscha Krettek. Great job!
And also a big thanks to the community for making this release possible.

Cheers,
Jincheng

Aljoscha Krettek  于2019年4月10日周三 下午4:31写道:

> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.8.0, which is the next major release.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2019/04/09/release-1.8.0.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Regards,
> Aljoscha


Re: FlinkCEP and SQL?

2019-04-04 Thread jincheng sun
Hi Esa,
CEP is available in Flink SQL. Please see the details here:
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/sql.html#pattern-recognition
Best,
Jincheng

Esa Heikkinen (TAU)  于2019年4月4日周四 下午4:44写道:

> Hi
>
> What is the situation of FlinkCEP and SQL?
>
> Is it already possible to use SQL in CEP?
>
> Are there any example cases where SQL is used in CEP?
>
> BR Esa
>
>


Re: [DISCUSS] Introduction of a Table API Java Expression DSL

2019-03-26 Thread jincheng sun
Thanks for bringing up this DISCUSS Timo!

A Java Expression DSL is pretty useful for Java users. When we have the Java
Expression DSL, the Java API will become much richer and easier to use!
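To make this concrete for readers of the thread, here is a hypothetical
before/after sketch. `orders` is an assumed `Table`, and the `$(...)` entry
point and method names in the second snippet are illustrative assumptions,
not a finalized API:

// Today (string-based): the expression is parsed at runtime, so a typo
// such as "amout" would compile fine and fail only in the expression parser.
Table today = orders
    .filter("amount > 100")
    .select("user, amount.sum as total");

// With a Java expression DSL (hypothetical names): expressions are plain
// Java method calls, checked by the compiler and completable in the IDE.
Table withDsl = orders
    .filter($("amount").isGreater(100))
    .select($("user"), $("amount").sum().as("total"));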

+1 from my side.

Best,
Jincheng


Dawid Wysakowicz  于2019年3月26日周二 下午5:08写道:

> Hi,
>
> I really like the idea of introducing Java Expression DSL. I think this
> will solve many problems e.g. right now it's quite tricky how string
> literals work in scala (sometimes it might go through the ExpressionParser
> and it will end up as an UnresolvedFieldReference), another important
> problem we could solve with this is the need for unique column names in
> tables right now. We could at some point introduce sth like:
>
> Table table = ...
>
> table.field("fieldName")
>
> and etc. A common "entry point" to expressions should simplify a lot.
>
> Therefore I am strongly +1 for introducing this feature.
>
> @Jark I think we could aim to introduce the new Java DSL API in 1.9 and
> once we do that we could deprecate the string approach.
>
> Best,
>
> Dawid
> On 22/03/2019 03:36, Jark Wu wrote:
>
> Hi Timo,
>
> Sounds good to me.
>
> Do you want to deprecate the string-based API in 1.9 or make the decision
> in 1.10 after some feedbacks ?
>
>
> On Thu, 21 Mar 2019 at 21:32, Timo Walther  wrote:
>
>> Thanks for your feedback Rong and Jark.
>>
>> @Jark: Yes, you are right that the string-based API is used quite a lot.
>> On the other side, the potential user base in the future is still bigger
>> than our current user base. Because the Table API will become equally
>> important as the DataStream API, we really need to fix some crucial design
>> decisions before it is too late. I would suggest to introduce the new DSL
>> in 1.9 and remove the Expression parser either in 1.10 or 1.11. From a
>> development point of view, I think we can handle the overhead to maintain
>> 3 APIs until then because 2 APIs will share the same code base + expression
>> parser.
>>
>> Regards,
>> Timo
>>
>> Am 21.03.19 um 05:21 schrieb Jark Wu:
>>
>> Hi Timo,
>>
>> I'm +1 on the proposal. I like the idea to provide a Java DSL which is
>> more friendly than string-based approach in programming.
>>
>> My concern is if/when we can drop the string-based expression parser. If
>> it takes a very long time, we will have to pay more development
>> cost on the three Table APIs. As far as I know, the string-based API is
>> used in many companies.
>> We should also get some feedbacks from users. So I'm CCing this email to
>> user mailing list.
>>
>> Best,
>> Jark
>>
>>
>>
>> On Wed, 20 Mar 2019 at 08:51, Rong Rong  wrote:
>>
>>> Thanks for sharing the initiative of improving Java side Table expression
>>> DSL.
>>>
>>> I agree as in the doc stated that Java DSL was always a "3rd class
>>> citizen"
>>> and we've run into many hand holding scenarios with our Flink developers
>>> trying to get the Stringify syntax working.
>>> Overall I am a +1 on this, it also help reduce the development cost of
>>> the
>>> Table API so that we no longer need to maintain different DSL and
>>> documentations.
>>>
>>> I left a few comments in the doc. and also some features that I think
>>> will
>>> be beneficial to the final outcome. Please kindly take a look @Timo.
>>>
>>> Many thanks,
>>> Rong
>>>
>>> On Mon, Mar 18, 2019 at 7:15 AM Timo Walther  wrote:
>>>
>>> > Hi everyone,
>>> >
>>> > some of you might have already noticed the JIRA issue that I opened
>>> > recently [1] about introducing a proper Java expression DSL for the
>>> > Table API. Instead of using string-based expressions, we should aim for
>>> > a unified, maintainable, programmatic Java DSL.
>>> >
>>> > Some background: The Blink merging efforts and the big refactorings as
>>> > part of FLIP-32 have revealed many shortcomings in the current Table &
>>> > SQL API design. Most of these legacy issues cause problems nowadays in
>>> > making the Table API a first-class API next to the DataStream API. An
>>> > example is the ExpressionParser class[2]. It was implemented in the
>>> > early days of the Table API using Scala parser combinators. During the
>>> > last years, this parser caused many JIRA issues and user confusion on
>>> > the mailing list. Because the exceptions and syntax might not be
>>> > straight forward.
>>> >
>>> > For FLINK-11908, we added a temporary bridge instead of reimplementing
>>> > the parser in Java for FLIP-32. However, this is only a intermediate
>>> > solution until we made a final decision.
>>> >
>>> > I would like to propose a new, parser-free version of the Java Table
>>> API:
>>> >
>>> >
>>> >
>>> https://docs.google.com/document/d/1r3bfR9R6q5Km0wXKcnhfig2XQ4aMiLG5h2MTx960Fg8/edit?usp=sharing
>>> >
>>> > I already implemented an early protoype that shows that such a DSL is
>>> > not much implementation effort and integrates nicely with all existing
>>> > API methods.
>>> >
>>> > What do you think?
>>> >
>>> > Thanks for your feedback,
>>> >
>>> > Timo
>>> >
>>> > [1] https://issues.apache.org/jira/browse/FLIN

Re: Is there window trigger in Table API ?

2019-03-26 Thread jincheng sun
Hi lu yj,

Currently, the Table API does not have triggers, because the behavior of the
windows (unbounded, tumble, slide, session) is well defined. The behavior of
each window is as follows:

   - Unbounded Window - Each distinct set of keys forms a group, and each
event triggers a calculation.

   - Tumble Window - A tumbling window assigns rows to non-overlapping,
continuous windows of fixed length. Each window outputs one calculation
result.

   - Slide Window - A sliding window has a fixed size and slides by a
specified slide interval. If the slide interval is smaller than the window
size, sliding windows overlap. Each window outputs one calculation result.

   - Session Window - Session windows do not have a fixed size; their
bounds are defined by an interval of inactivity, i.e., a session window
closes if no event appears for a defined gap period. Each window outputs
one calculation result.

None of those windows holds all the input data; the calculations are
incremental.

For your case, I think you can use an `Unbounded Window` and group by a
function of the time attribute which returns the day unit, e.g.:

> table.groupBy(dateFormat('time, "%Y%d")).select('a.sum)

or

> table
>   .select('a, dateFormat('time, "%Y%d").cast(Types.STRING) as 'ts)
>   .groupBy('ts)
>   .select('ts, 'a.sum)
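For a Java user, the same idea looks roughly as follows. This is a minimal
sketch using the string-based expressions; the table name "Orders" and the
fields `a` and `rowtime` are assumptions:

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class DailySumExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

        // Assumes a table "Orders" has been registered with a value field
        // `a` and a time attribute `rowtime` (registration omitted here).
        Table orders = tEnv.scan("Orders");

        // Group on a day-level key derived from the time attribute; every
        // incoming event updates the running sum of its day.
        Table dailySum = orders
            .select("a, dateFormat(rowtime, '%Y%d') as ts")
            .groupBy("ts")
            .select("ts, a.sum as total");

        // The aggregation produces an updating table, so convert it to a
        // retract stream to observe the intermediate result per event.
        DataStream<Tuple2<Boolean, Row>> updates =
            tEnv.toRetractStream(dailySum, Row.class);
        updates.print();

        env.execute();
    }
}

Note that `toAppendStream` would not work here, because a non-windowed group
aggregation produces updates; that is also why you see an intermediate
result after every event.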


Hope this helps you!

Best,
Jincheng


lu yj  于2019年3月26日周二 下午4:17写道:

> Hello,
>
> I am using Table API to do some aggregation based on time window.
>
> In DataStream API, there is trigger to control when the aggregation
> function should be invoked. Is there similar thing in Table API?
>
> Because I am using large time window, like a day. I want the intermediate
> result every time a new event is aggregated. Is that possible?  And also,
> does it hold all the input data until the window ends?
>
> Thanks!
>


Re: [VOTE] Release 1.8.0, release candidate #3

2019-03-21 Thread jincheng sun
Thanks for the quick fix, Yu. The PR for FLINK-11972
<https://issues.apache.org/jira/browse/FLINK-11972> has been merged.

Cheers,
Jincheng

Yu Li  于2019年3月21日周四 上午7:23写道:

> -1, observed stably failure on streaming bucketing end-to-end test case in
> two different environments (Linux/MacOS) when running with both shaded
> hadoop-2.8.3 jar file
> <https://repository.apache.org/content/repositories/orgapacheflink-1213/org/apache/flink/flink-shaded-hadoop2-uber/2.8.3-1.8.0/flink-shaded-hadoop2-uber-2.8.3-1.8.0.jar>
> and hadoop-2.8.5 dist
> <http://archive.apache.org/dist/hadoop/core/hadoop-2.8.5/>, while both
> env could pass with hadoop 2.6.5. More details please refer to this
> comment
> <https://issues.apache.org/jira/browse/FLINK-11972?focusedCommentId=16797614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16797614>
> in FLINK-11972.
>
> Best Regards,
> Yu
>
>
> On Thu, 21 Mar 2019 at 04:25, jincheng sun 
> wrote:
>
>> Thanks for the quick fix Aljoscha! The FLINK-11971
>> <https://issues.apache.org/jira/browse/FLINK-11971> has been merged.
>>
>> Cheers,
>> Jincheng
>>
>> Piotr Nowojski  于2019年3月21日周四 上午12:29写道:
>>
>>> -1 from my side due to performance regression found in the master branch
>>> since Jan 29th.
>>>
>>> In 10% JVM forks it was causing huge performance drop in some of the
>>> benchmarks (up to 30-50% reduced throughput), which could mean that one out
>>> of 10 task managers could be affected by it. Today we have merged a fix for
>>> it [1]. First benchmark run was promising [2], but we have to wait until
>>> tomorrow to make sure that the problem was definitely resolved. If that’s
>>> the case, I would recommend including it in 1.8.0, because we really do not
>>> know how big of performance regression this issue can be in the real world
>>> scenarios.
>>>
>>> Regarding the second regression from mid February. We have found the
>>> responsible commit and this one is probably just a false positive. Because
>>> of the nature some of the benchmarks, they are running with low number of
>>> records (300k). The apparent performance regression was caused by higher
>>> initialisation time. When I temporarily increased the number of records to
>>> 2M, the regression was gone. Together with Till and Stefan Richter we
>>> discussed the potential impact of this longer initialisation time (in the
>>> case of said benchmarks initialisation time increased from 70ms to 120ms)
>>> and we think that it’s not a critical issue, that doesn’t have to block the
>>> release. Nevertheless there might some follow up work for this.
>>>
>>> [1] https://github.com/apache/flink/pull/8020
>>> [2] http://codespeed.dak8s.net:8000/timeline/?ben=tumblingWindow&env=2
>>>
>>> Piotr Nowojski
>>>
>>> On 20 Mar 2019, at 10:09, Aljoscha Krettek  wrote:
>>>
>>> Thanks Jincheng! It would be very good to fix those but as you said, I
>>> would say they are not blockers.
>>>
>>> On 20. Mar 2019, at 09:47, Kurt Young  wrote:
>>>
>>> +1 (non-binding)
>>>
>>> Checked items:
>>> - checked checksums and GPG files
>>> - verified that the source archives do not contains any binaries
>>> - checked that all POM files point to the same version
>>> - build from source successfully
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Wed, Mar 20, 2019 at 2:12 PM jincheng sun 
>>> wrote:
>>>
>>>> Hi Aljoscha&All,
>>>>
>>>> When I did the `end-to-end` test for RC3 under Mac OS, I found the
>>>> following two problems:
>>>>
>>>> 1. The verification of the `minikube status` output is not robust
>>>> enough. The strings returned by different versions on different
>>>> platforms differ, which causes the following misjudgment:
>>>> When the `Command: start_kubernetes_if_not_ruunning failed` error
>>>> occurs, the minikube has actually started successfully. The core reason is
>>>> that there is a bug in the `test_kubernetes_embedded_job.sh` script. See
>>>> FLINK-11971 <https://issues.apache.org/jira/browse/FLINK-11971> for
>>>> details.
>>>>
>>>> 2. Since the difference between 1.8.0 and 1.7.x is that 1.8.x does not
>>>> put the `hadoop-shaded` JAR integrated into the dist.  It will cause an
>>>> error when the end-to

Re: [VOTE] Release 1.8.0, release candidate #3

2019-03-20 Thread jincheng sun
Thanks for the quick fix Aljoscha! The FLINK-11971
<https://issues.apache.org/jira/browse/FLINK-11971> has been merged.

Cheers,
Jincheng

Piotr Nowojski  于2019年3月21日周四 上午12:29写道:

> -1 from my side due to performance regression found in the master branch
> since Jan 29th.
>
> In 10% JVM forks it was causing huge performance drop in some of the
> benchmarks (up to 30-50% reduced throughput), which could mean that one out
> of 10 task managers could be affected by it. Today we have merged a fix for
> it [1]. First benchmark run was promising [2], but we have to wait until
> tomorrow to make sure that the problem was definitely resolved. If that’s
> the case, I would recommend including it in 1.8.0, because we really do not
> know how big of performance regression this issue can be in the real world
> scenarios.
>
> Regarding the second regression from mid February. We have found the
> responsible commit and this one is probably just a false positive. Because
> of the nature some of the benchmarks, they are running with low number of
> records (300k). The apparent performance regression was caused by higher
> initialisation time. When I temporarily increased the number of records to
> 2M, the regression was gone. Together with Till and Stefan Richter we
> discussed the potential impact of this longer initialisation time (in the
> case of said benchmarks initialisation time increased from 70ms to 120ms)
> and we think that it’s not a critical issue, that doesn’t have to block the
> release. Nevertheless there might some follow up work for this.
>
> [1] https://github.com/apache/flink/pull/8020
> [2] http://codespeed.dak8s.net:8000/timeline/?ben=tumblingWindow&env=2
>
> Piotr Nowojski
>
> On 20 Mar 2019, at 10:09, Aljoscha Krettek  wrote:
>
> Thanks Jincheng! It would be very good to fix those but as you said, I
> would say they are not blockers.
>
> On 20. Mar 2019, at 09:47, Kurt Young  wrote:
>
> +1 (non-binding)
>
> Checked items:
> - checked checksums and GPG files
> - verified that the source archives do not contains any binaries
> - checked that all POM files point to the same version
> - build from source successfully
>
> Best,
> Kurt
>
>
> On Wed, Mar 20, 2019 at 2:12 PM jincheng sun 
> wrote:
>
>> Hi Aljoscha&All,
>>
>> When I did the `end-to-end` test for RC3 under Mac OS, I found the
>> following two problems:
>>
>> 1. The verification of the `minikube status` output is not robust
>> enough. The strings returned by different versions on different
>> platforms differ, which causes the following misjudgment:
>> When the `Command: start_kubernetes_if_not_ruunning failed` error occurs,
>> the minikube has actually started successfully. The core reason is that
>> there is a bug in the `test_kubernetes_embedded_job.sh` script. See
>> FLINK-11971 <https://issues.apache.org/jira/browse/FLINK-11971> for
>> details.
>>
>> 2. Since the difference between 1.8.0 and 1.7.x is that 1.8.x does not
>> put the `hadoop-shaded` JAR integrated into the dist.  It will cause an
>> error when the end-to-end test cannot be found with `Hadoop` Related
>> classes,  such as: `java.lang.NoClassDefFoundError:
>> Lorg/apache/hadoop/fs/FileSystem`. So we need to improve the end-to-end
>> test script, or explicitly stated in the README, i.e. end-to-end test need
>> to add `flink-shaded-hadoop2-uber-.jar` to the classpath. See
>> FLINK-11972 <https://issues.apache.org/jira/browse/FLINK-11972> for
>> details.
>>
>> I think this is not a blocker for release-1.8.0, but I think it would be
>> better to include those commits in release-1.8 If we still have performance
>> related bugs should be fixed.
>>
>> What do you think?
>>
>> Best,
>> Jincheng
>>
>>
>> Aljoscha Krettek  于2019年3月19日周二 下午7:58写道:
>>
>>> Hi All,
>>>
>>> The release process for Flink 1.8.0 is currently ongoing. Please have a
>>> look at the thread, in case you’re interested in checking your applications
>>> against this next release of Apache Flink and participate in the process.
>>>
>>> Best,
>>> Aljoscha
>>>
>>> Begin forwarded message:
>>>
>>> *From: *Aljoscha Krettek 
>>> *Subject: **[VOTE] Release 1.8.0, release candidate #3*
>>> *Date: *19. March 2019 at 12:52:50 CET
>>> *To: *d...@flink.apache.org
>>> *Reply-To: *d...@flink.apache.org
>>>
>>> Hi everyone,
>>> Please review and vote on the release candidate 3 for Flink 1.8.0, as
>>> follows:
>>> [ ] +1, Approve

Re: [VOTE] Release 1.8.0, release candidate #3

2019-03-19 Thread jincheng sun
Hi Aljoscha & all,

When I did the end-to-end tests for RC3 under macOS, I found the following
two problems:

1. The verification of the `minikube status` output is not robust enough.
The strings returned by different versions on different platforms differ,
which causes the following misjudgment: when the `Command:
start_kubernetes_if_not_ruunning failed` error occurs, minikube has
actually started successfully. The core reason is that there is a bug in
the `test_kubernetes_embedded_job.sh` script. See FLINK-11971 for details.

2. The difference between 1.8.0 and 1.7.x is that 1.8.x no longer bundles
the `hadoop-shaded` JAR into the dist. This causes an error when the
end-to-end tests cannot find Hadoop-related classes, such as
`java.lang.NoClassDefFoundError: Lorg/apache/hadoop/fs/FileSystem`. So we
need to improve the end-to-end test script, or state explicitly in the
README that the end-to-end tests need `flink-shaded-hadoop2-uber-<version>.jar`
added to the classpath. See FLINK-11972 for details.

I think this is not a blocker for release-1.8.0, but it would be better to
include those commits in release-1.8 if we still have performance-related
bugs that should be fixed.

What do you think?

Best,
Jincheng


Aljoscha Krettek  于2019年3月19日周二 下午7:58写道:

> Hi All,
>
> The release process for Flink 1.8.0 is currently ongoing. Please have a
> look at the thread, in case you’re interested in checking your applications
> against this next release of Apache Flink and participate in the process.
>
> Best,
> Aljoscha
>
> Begin forwarded message:
>
> *From: *Aljoscha Krettek 
> *Subject: **[VOTE] Release 1.8.0, release candidate #3*
> *Date: *19. March 2019 at 12:52:50 CET
> *To: *d...@flink.apache.org
> *Reply-To: *d...@flink.apache.org
>
> Hi everyone,
> Please review and vote on the release candidate 3 for Flink 1.8.0, as
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org  [2], which are
> signed with the key with fingerprint
> F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.8.0-rc3" [5],
> * website pull request listing the new release [6]
> * website pull request adding announcement blog post [7].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Aljoscha
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
> <
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344274
> >
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc3/ <
> https://dist.apache.org/repos/dist/dev/flink/flink-1.8.0-rc3/>
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS <
> https://dist.apache.org/repos/dist/release/flink/KEYS>
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1214
> 
> [5]
> https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=b505c0822edd2aed7fa22ed75eca40dca1a9de42
> <
> https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=b505c0822edd2aed7fa22ed75eca40dca1a9de42>
>
> [6] https://github.com/apache/flink-web/pull/180 <
> https://github.com/apache/flink-web/pull/180>
> [7] https://github.com/apache/flink-web/pull/179 <
> https://github.com/apache/flink-web/pull/179>
>
> P.S. The difference to the previous RCs 1 and 2 is very small, you can
> fetch the tags and do a "git log release-1.8.0-rc1..release-1.8.0-rc3” to
> see the difference in commits. Its fixes for the issues that led to the
> cancellation of the previous RCs plus smaller fixes. Most
> verification/testing that was carried out should apply as is to this RC.
> Any functional verification that you did on previous RCs should therefore
> easily carry over to this one.
>
>
>


Re: [ANNOUNCE] Apache Flink 1.7.2 released

2019-02-17 Thread jincheng sun
Thanks a lot for being our release manager, Gordon. Great job!
And also a big thanks to the community for making this release possible.

Cheers,
Jincheng


Tzu-Li (Gordon) Tai  于2019年2月18日周一 上午10:29写道:

> Hi,
>
> The Apache Flink community is very happy to announce the release of
> Apache Flink 1.7.2, which is the second bugfix release for the Apache
> Flink 1.7 series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data
> streaming applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
> 
>
> Please check out the release blog post for an overview of the
> improvements for this bugfix release:
> https://flink.apache.org/news/2019/02/15/release-1.7.2.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344632
>
> We would like to thank all contributors of the Apache Flink community
> who made this release possible!
>
> Regards,
> Gordon
>


Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread jincheng sun
Hi Stephan,

Thanks for the clarification! You are right, we have never initiated a
discussion about supporting the OVER window on DataStream; we can discuss it
in a separate thread. I agree with you that we should add the item after
moving the discussion forward.

+1 for putting the roadmap on the website.
+1 for periodically updating the roadmap; as mentioned by Fabian, we can
update it at every feature release.

Thanks,
Jincheng

Stephan Ewen  于2019年2月14日周四 下午5:44写道:

> Thanks Jincheng and Rong Rong!
>
> I am not deciding a roadmap and making a call on what features should be
> developed or not. I was only collecting broader issues that are already
> happening or have an active FLIP/design discussion plus committer support.
>
> Do we have that for the suggested issues as well? If yes , we can add them
> (can you point me to the issue/mail-thread), if not, let's try and move the
> discussion forward and add them to the roadmap overview then.
>
> Best,
> Stephan
>
>
> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong  wrote:
>
>> Thanks Stephan for the great proposal.
>>
>> This would not only be beneficial for new users but also for contributors
>> to keep track of all upcoming features.
>>
>> I think that better window operator support can also be separately grouped
>> into its own category, as it affects both the future DataStream API and batch
>> stream unification.
>> can we also include:
>> - OVER aggregate for DataStream API separately as @jincheng suggested.
>> - Improving sliding window operator [1]
>>
>> One more additional suggestion, can we also include a more extendable
>> security module [2,3] @shuyi and I are currently working on?
>> This will significantly improve the usability for Flink in corporate
>> environments where proprietary or 3rd-party security integration is needed.
>>
>> Thanks,
>> Rong
>>
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>> [3]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>
>>
>>
>>
>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun 
>> wrote:
>>
>>> Very excited and thank you for launching such a great discussion,
>>> Stephan !
>>>
>>> Here only a little suggestion that in the Batch Streaming Unification
>>> section, do we need to add an item:
>>>
>>> - Same window operators on bounded/unbounded Table API and DataStream
>>> API
>>> (currently OVER window only exists in SQL/TableAPI, DataStream API does
>>> not yet support)
>>>
>>> Best,
>>> Jincheng
>>>
>>> Stephan Ewen  于2019年2月13日周三 下午7:21写道:
>>>
>>>> Hi all!
>>>>
>>>> Recently several contributors, committers, and users asked about making
>>>> it more visible in which way the project is currently going.
>>>>
>>>> Users and developers can track the direction by following the
>>>> discussion threads and JIRA, but due to the mass of discussions and open
>>>> issues, it is very hard to get a good overall picture.
>>>> Especially for new users and contributors, it is very hard to get a
>>>> quick overview of the project direction.
>>>>
>>>> To fix this, I suggest to add a brief roadmap summary to the homepage.
>>>> It is a bit of a commitment to keep that roadmap up to date, but I think
>>>> the benefit for users justifies that.
>>>> The Apache Beam project has added such a roadmap [1]
>>>> <https://beam.apache.org/roadmap/>, which was received very well by
>>>> the community, I would suggest to follow a similar structure here.
>>>>
>>>> If the community is in favor of this, I would volunteer to write a
>>>> first version of such a roadmap. The points I would include are below.
>>>>
>>>> Best,
>>>> Stephan
>>>>
>>>> [1] https://beam.apache.org/roadmap/
>>>>
>>>> 
>>>>
>>>> Disclaimer: Apache Flink is not governed or steered by any one single
>>>> entity, but by its community and Project Management Committee (PMC). This
>>>> is not an authoritative roadmap in the sense of a plan with a specific
>>>> timeline. Instead, we share

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-13 Thread jincheng sun
Very exciting, and thank you for launching such a great discussion, Stephan!

Just one little suggestion: in the Batch Streaming Unification section, do
we need to add an item:

- Same window operators on bounded/unbounded Table API and DataStream API
(currently the OVER window only exists in SQL/Table API; the DataStream API
does not yet support it)

Best,
Jincheng

Stephan Ewen  于2019年2月13日周三 下午7:21写道:

> Hi all!
>
> Recently several contributors, committers, and users asked about making it
> more visible in which way the project is currently going.
>
> Users and developers can track the direction by following the discussion
> threads and JIRA, but due to the mass of discussions and open issues, it is
> very hard to get a good overall picture.
> Especially for new users and contributors, it is very hard to get a quick
> overview of the project direction.
>
> To fix this, I suggest to add a brief roadmap summary to the homepage. It
> is a bit of a commitment to keep that roadmap up to date, but I think the
> benefit for users justifies that.
> The Apache Beam project has added such a roadmap [1]
> , which was received very well by the
> community, I would suggest to follow a similar structure here.
>
> If the community is in favor of this, I would volunteer to write a first
> version of such a roadmap. The points I would include are below.
>
> Best,
> Stephan
>
> [1] https://beam.apache.org/roadmap/
>
> 
>
> Disclaimer: Apache Flink is not governed or steered by any one single
> entity, but by its community and Project Management Committee (PMC). This
> is not an authoritative roadmap in the sense of a plan with a specific
> timeline. Instead, we share our vision for the future and major initiatives
> that are receiving attention and give users and contributors an
> understanding what they can look forward to.
>
> *Future Role of Table API and DataStream API*
>   - Table API becomes first class citizen
>   - Table API becomes primary API for analytics use cases
>   * Declarative, automatic optimizations
>   * No manual control over state and timers
>   - DataStream API becomes primary API for applications and data pipeline
> use cases
>   * Physical, user controls data types, no magic or optimizer
>   * Explicit control over state and time
>
> *Batch Streaming Unification*
>   - Table API unification (environments) (FLIP-32)
>   - New unified source interface (FLIP-27)
>   - Runtime operator unification & code reuse between DataStream / Table
>   - Extending Table API to make it convenient API for all analytical use
> cases (easier mix in of UDFs)
>   - Same join operators on bounded/unbounded Table API and DataStream API
>
> *Faster Batch (Bounded Streams)*
>   - Much of this comes via Blink contribution/merging
>   - Fine-grained Fault Tolerance on bounded data (Table API)
>   - Batch Scheduling on bounded data (Table API)
>   - External Shuffle Services Support on bounded streams
>   - Caching of intermediate results on bounded data (Table API)
>   - Extending DataStream API to explicitly model bounded streams (API
> breaking)
>   - Add fine fault tolerance, scheduling, caching also to DataStream API
>
> *Streaming State Evolution*
>   - Let all built-in serializers support stable evolution
>   - First class support for other evolvable formats (Protobuf, Thrift)
>   - Savepoint input/output format to modify / adjust savepoints
>
> *Simpler Event Time Handling*
>   - Event Time Alignment in Sources
>   - Simpler out-of-the box support in sources
>
> *Checkpointing*
>   - Consistency of Side Effects: suspend / end with savepoint (FLIP-34)
>   - Failed checkpoints explicitly aborted on TaskManagers (not only on
> coordinator)
>
> *Automatic scaling (adjusting parallelism)*
>   - Reactive scaling
>   - Active scaling policies
>
> *Kubernetes Integration*
>   - Active Kubernetes Integration (Flink actively manages containers)
>
> *SQL Ecosystem*
>   - Extended Metadata Stores / Catalog / Schema Registries support
>   - DDL support
>   - Integration with Hive Ecosystem
>
> *Simpler Handling of Dependencies*
>   - Scala in the APIs, but not in the core (hide in separate class loader)
>   - Hadoop-free by default
>
>


Re: [ANNOUNCE] New Flink PMC member Thomas Weise

2019-02-12 Thread jincheng sun
Congrats Thomas!

Cheers,
Jincheng

Fabian Hueske  于2019年2月12日周二 下午5:59写道:

> Hi everyone,
>
> On behalf of the Flink PMC I am happy to announce Thomas Weise as a new
> member of the Apache Flink PMC.
>
> Thomas is a long time contributor and member of our community.
> He is starting and participating in lots of discussions on our mailing
> lists, working on topics that are of joint interest of Flink and Beam, and
> giving talks on Flink at many events.
>
> Please join me in welcoming and congratulating Thomas!
>
> Best,
> Fabian
>


Re: [DISCUSS] Towards a leaner flink-dist

2019-01-24 Thread jincheng sun
Hi Chesnay,

Thank you for the proposal. I like it very much.

+1 for the leaner distribution.

About improving the "Downloads" page, I think we can add the connector
download links in the "Optional components" section which @Timo Walther
mentioned above.


Regards,
Jincheng

Chesnay Schepler  于2019年1月18日周五 下午5:59写道:

> Hello,
>
> the binary distribution that we release by now contains quite a lot of
> optional components, including various filesystems, metric reporters and
> libraries. Most users will only use a fraction of these, and as such
> pretty much only increase the size of flink-dist.
>
> With Flink growing more and more in scope I don't believe it to be
> feasible to ship everything we have with every distribution, and instead
> suggest more of a "pick-what-you-need" model, where flink-dist is rather
> lean and additional components are downloaded separately and added by
> the user.
>
> This would primarily affect the /opt directory, but could also be
> extended to cover flink-dist. For example, the yarn and mesos code could
> be spliced out into separate jars that could be added to lib manually.
>
> Let me know what you think.
>
> Regards,
>
> Chesnay
>
>


Re: [ANNOUNCE] Apache Flink 1.5.6 released

2018-12-26 Thread jincheng sun
Thanks a lot for being our release manager, Thomas.
And thanks a lot to everyone who made this release possible!

Cheers,
Jincheng

Thomas Weise  于2018年12月27日周四 上午4:03写道:

> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.5.6, which is the final bugfix release for the Apache Flink 1.5
> series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2018/12/22/release-1.5.6.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344315
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Regards,
> Thomas
>


Re: [ANNOUNCE] Apache Flink 1.7.1 released

2018-12-24 Thread jincheng sun
Thanks for being the release manager, Chesnay!
And thanks a lot to everyone who made this release possible!

Thanks,
Jincheng

Chesnay Schepler  于2018年12月23日周日 上午3:34写道:

> The Apache Flink community is very happy to announce the release of
> Apache Flink 1.7.1, which is the first bugfix release for the Apache
> Flink 1.7 series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data
> streaming applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the
> improvements for this bugfix release:
> https://flink.apache.org/news/2018/12/21/release-1.7.1.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344412
>
> We would like to thank all contributors of the Apache Flink community
> who made this release possible!
>
> Regards,
> Chesnay
>


Re: [ANNOUNCE] Apache Flink 1.6.3 released

2018-12-24 Thread jincheng sun
Thanks a lot for being our release manager, Gordon.
And thanks a lot to everyone who made this release possible!

Cheers,
Jincheng

Tzu-Li (Gordon) Tai  于2018年12月23日周日 下午9:35写道:

> Hi,
>
> The Apache Flink community is very happy to announce the release of
> Apache Flink 1.6.3, which is the third bugfix release for the Apache
> Flink 1.6 series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data
> streaming applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the
> improvements for this bugfix release:
> https://flink.apache.org/news/2018/12/22/release-1.6.3.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344314
>
> We would like to thank all contributors of the Apache Flink community
> who made this release possible!
>
> Regards,
> Gordon
>
>


Re: delay one of the datastream when performing join operation on event-time and watermark

2018-12-06 Thread jincheng sun
Hi Rakesh Kumar,
I think you can use the interval join, e.g.:

orderStream
    .keyBy(...)
    .intervalJoin(invoiceStream.keyBy(...))
    .between(Time.minutes(-5), Time.minutes(5))
    .process(...)

The semantics of the interval join and a detailed usage description can be found at
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/joining.html#interval-join
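
For completeness, here is a minimal end-to-end sketch. Note that the Order
and Invoice POJOs, the orderId/invoiceId fields, and the output format are
hypothetical stand-ins for your JSON records, and the sketch assumes
event-time timestamps and watermarks have already been assigned to both
streams:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

public class OrderInvoiceJoin {

    // Hypothetical records; in your job these would be parsed from the
    // JSON read off the Kafka topics.
    public static class Order { public String orderId; }
    public static class Invoice { public String orderId; public String invoiceId; }

    public static DataStream<String> join(DataStream<Order> orders,
                                          DataStream<Invoice> invoices) {
        return orders
            .keyBy(order -> order.orderId)
            .intervalJoin(invoices.keyBy(invoice -> invoice.orderId))
            // Match an order with any invoice whose event timestamp lies
            // within 5 minutes of the order's event timestamp.
            .between(Time.minutes(-5), Time.minutes(5))
            .process(new ProcessJoinFunction<Order, Invoice, String>() {
                @Override
                public void processElement(Order order, Invoice invoice,
                                           Context ctx, Collector<String> out) {
                    // Emit the joined pair; you could serialize this back
                    // to JSON and write it to the output Kafka topic.
                    out.collect(order.orderId + " -> " + invoice.invoiceId);
                }
            });
    }
}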

Hope this helps, and any feedback is welcome!

Bests,
Jincheng


Rakesh Kumar wrote on Thu, Dec 6, 2018 at 7:10 PM:

> Hi,
> I have two data sources: one for order data and another for invoice data.
> I am pushing both into Kafka topics in JSON form. I want to delay the
> order data by 5 minutes, because invoice data is generated only after the
> order data. For that I have written a Flink program that reads both
> streams from Kafka, applies watermarks, and delays the order data by 5
> minutes. After applying watermarks, I want to join the two streams on the
> order_id field, which is present in both the order and invoice data, and
> then push the joined result to a different Kafka topic.
>
> But I am not able to join these data streams with the 5-minute delay, and
> I am not able to figure it out.
>
> I am attaching my Flink program and its dependencies below.
>


Re: MergingWindow

2017-12-29 Thread jincheng sun
Hi aitozi,

`MergingWindowSet` is a utility for keeping track of merging windows when
using a `MergingWindowAssigner` in a `WindowOperator`.

In Flink, `MergingWindowAssigner` is only used for session windows; its
implementations are `EventTimeSessionWindows` and
`ProcessingTimeSessionWindows`. As we know, a session window relies on the
time gap between elements to split windows, so when out-of-order elements
arrive, existing windows may have to be merged.


In the `processElement` method of `WindowOperator`, adding a new window
might result in a merge; the merge logic lives in the `addWindow` method of
`MergingWindowSet`, which contains the snippet you mentioned. To understand
that snippet, we must first understand the merging logic, so let me walk
through a merge example. Suppose we have two windows WinA and WinB, and
when WinC is added we find that WinA, WinB and WinC should be merged. At
this point WinC is the new window, while WinA and WinB are already in
`MergingWindowSet.mapping`, a `Map<W, W>` from each window to the window
that keeps that window's state. When we incrementally merge windows
starting from some window, we keep that starting window as the state window
to prevent costly state juggling, as the sketch below illustrates.
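
To make the bookkeeping concrete, here is a small runnable sketch, not the
actual Flink source, that mimics how `mapping` evolves when WinA, WinB and
the new WinC merge (the window names are just illustrative strings here):

import java.util.HashMap;
import java.util.Map;

public class MergingWindowSketch {
    public static void main(String[] args) {
        // mapping: window -> window that keeps that window's state.
        Map<String, String> mapping = new HashMap<>();

        // Before the merge: WinA and WinB each keep their own state window.
        mapping.put("WinA", "WinA");
        mapping.put("WinB", "WinB");

        // WinC arrives and WinA, WinB, WinC merge into "mergeResult".
        // One pre-existing window (say WinA) is kept as the state window;
        // the new window WinC never had state, so nothing is moved for it.
        mapping.remove("WinA");
        mapping.remove("WinB");
        mapping.put("mergeResult", "WinA");

        // WinB's state still has to be folded into WinA's namespace; that
        // is exactly what the mergeFunction.merge(...) call in addWindow
        // is there for.
        System.out.println(mapping); // prints {mergeResult=WinA}
    }
}
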

Now that we know the merging logic, let's look at the snippet you mentioned
above:

> if (!(mergedWindows.contains(mergeResult) && mergedWindows.size() == 1)) {
>     mergeFunction.merge(mergeResult,
>             mergedWindows,
>             this.mapping.get(mergeResult),
>             mergedStateWindows);
> }


This code guarantees that we don't merge the new window itself, since it
never had any state associated with it, i.e. the merge callback is skipped
if we are only merging one pre-existing window into itself without
extending the pre-existing window. For example, when a late element's
candidate window is fully covered by an existing session window, the merge
result is that pre-existing window itself and there is no state to move, so
`mergeFunction.merge(...)` is not invoked.

I am not sure if the explanation is clear, but I hope it helps. :)
And any feedback is welcome... :)

Best, Jincheng

2017-12-27 16:58 GMT+08:00 Ufuk Celebi :

> Please check your email before sending it next time, as sending three
> emails for the same message is a little spammy ;-)
>
> This is internal code that is used to implement session windows as far
> as I can tell. The idea is to not merge the new window as it never had
> any state associated with it. The general idea of merging windows is
> to keep one of the original windows as the state window, i.e. the
> window that is used as namespace to store the window elements.
> Elements from the state windows of merged windows must be merged into
> this one state window.
>
> For more details, this should be directed to the dev mailing list.
>
> – Ufuk
>
> On Tue, Dec 26, 2017 at 4:58 AM, aitozi  wrote:
> > Hi,
> >
> > I can't understand the usage of this snippet of code in
> > MergingWindowSet.java, can anyone explain it for me?
> >
> >
> > if (!(mergedWindows.contains(mergeResult) && mergedWindows.size() == 1)) {
> >     mergeFunction.merge(mergeResult,
> >             mergedWindows,
> >             this.mapping.get(mergeResult),
> >             mergedStateWindows);
> > }
> >
> >
> >
> > --
> > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>