Re: CiBot Update

2019-08-22 Thread Dian Fu
Thanks Chesnay for your great work! A very useful feature! 

Just one minor suggestion: it would be better if we could add this command to
the "Bot commands" section of the flinkbot template.

Regards,
Dian

> On Aug 23, 2019, at 2:06 AM, Ethan Li wrote:
> 
> My question is specifically about implementation of "@flinkbot run travis"
> 
>> On Aug 22, 2019, at 1:06 PM, Ethan Li  wrote:
>> 
>> Hi Chesnay,
>> 
>> This is really nice feature!
>> 
>> Can I ask how is this implemented? Do you have the related Jira/PR/docs that 
>> I can take a look? I’d like to introduce it to another project if 
>> applicable. Thank you very much!
>> 
>> Best,
>> Ethan
>> 
>>> On Aug 22, 2019, at 8:34 AM, Biao Liu wrote:
>>> 
>>> Thanks Chesnay a lot,
>>> 
>>> I love this feature!
>>> 
>>> Thanks,
>>> Biao /'bɪ.aʊ/
>>> 
>>> 
>>> 
>>> On Thu, 22 Aug 2019 at 20:55, Hequn Cheng wrote:
>>> 
 Cool, thanks Chesnay a lot for the improvement!
 
 Best, Hequn
 
 On Thu, Aug 22, 2019 at 5:02 PM Zhu Zhu wrote:
 
> Thanks Chesnay for the CI improvement!
> It is very helpful.
> 
> Thanks,
> Zhu Zhu
> 
> On Thu, Aug 22, 2019 at 4:18 PM, zhijiang wrote:
> 
>> It is really very convenient now. Valuable work, Chesnay!
>> 
>> Best,
>> Zhijiang
>> --
>> From: Till Rohrmann 
>> Send Time: Thu, Aug 22, 2019 10:13
>> To: dev 
>> Subject: Re: CiBot Update
>> 
>> Thanks for the continuous work on the CiBot Chesnay!
>> 
>> Cheers,
>> Till
>> 
>> On Thu, Aug 22, 2019 at 9:47 AM Jark Wu wrote:
>> 
>>> Great work! Thanks Chesnay!
>>> 
>>> 
>>> 
>>> On Thu, 22 Aug 2019 at 15:42, Xintong Song wrote:
>>> 
 The re-triggering travis feature is so convenient. Thanks Chesnay~!
 
 Thank you~
 
 Xintong Song
 
 
 
 On Thu, Aug 22, 2019 at 9:26 AM Stephan Ewen wrote:
 
> Nice, thanks!
> 
> On Thu, Aug 22, 2019 at 3:59 AM Zili Chen wrote:
> 
>> Thanks for your announcement. Nice work!
>> 
>> Best,
>> tison.
>> 
>> 
>> On Thu, Aug 22, 2019 at 8:14 AM, vino yang wrote:
>> 
>>> +1 for "@flinkbot run travis", it is very convenient.
>>> 
>>> On Wed, Aug 21, 2019 at 9:12 PM, Chesnay Schepler wrote:
>>> 
 Hi everyone,
 
 this is an update on recent changes to the CI bot.
 
 The bot now cancels builds if a new commit was added to a PR, and
 cancels all builds if the PR was closed.
 (This was implemented a while ago; I'm just mentioning it again for
 discoverability.)
 
 Additionally, starting today you can now re-trigger a Travis run by
 writing a comment "@flinkbot run travis"; this means you no longer have
 to commit an empty commit or do other shenanigans to get another build
 running.
 Note that this will /not/ work if the PR was re-opened, until at least
 one new build was triggered by a push.
 



Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-08-22 Thread Dian Fu
Hi Jincheng,

+1 to create the FLIP and start the VOTE on this feature. I'm willing to help
create the FLIP if you don't mind. As I haven't created a FLIP before, it
would be great if you could help me with it. :)

Regards,
Dian

> On Aug 22, 2019, at 11:41 PM, jincheng sun wrote:
> 
> Hi all,
> 
> Thanks a lot for your feedback. If there are no more suggestions and
> comments, I think it's better to initiate a vote to create a FLIP for
> Apache Flink Python UDFs.
> What do you think?
> 
> Best, Jincheng
> 
> On Thu, Aug 15, 2019 at 12:54 AM, jincheng sun wrote:
> 
>> Hi Thomas,
>> 
>> Thanks for your confirmation and the very important reminder about bundle
>> processing.
>> 
>> I have added a description of how to perform bundle processing from the
>> perspective of checkpoints and watermarks. Feel free to leave comments if
>> anything is not described clearly.
>> 
>> Best,
>> Jincheng
>> 
>> 
>> On Wed, Aug 14, 2019 at 10:08 AM, Dian Fu wrote:
>> 
>>> Hi Thomas,
>>> 
>>> Thanks a lot for the suggestions.
>>> 
>>> Regarding bundle processing, there is a section "Checkpoint" [1] in the
>>> design doc which talks about how to handle the checkpoint.
>>> However, I think you are right that we should talk more about it, such as
>>> what's bundle processing, how it affects the checkpoint and watermark, how
>>> to handle the checkpoint and watermark, etc.
>>> 
>>> [1]
>>> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
>>> 
>>> Regards,
>>> Dian
>>> 
>>>> On Aug 14, 2019, at 1:01 AM, Thomas Weise wrote:
>>>> 
>>>> Hi Jincheng,
>>>> 
>>>> Thanks for putting this together. The proposal is very detailed,
>>> thorough
>>>> and for me as a Beam Flink runner contributor easy to understand :)
>>>> 
>>>> One thing that you should probably detail more is the bundle
>>> processing. It
>>>> is critically important for performance that multiple elements are
>>>> processed in a bundle. The default bundle size in the Flink runner is
>>> 1s or
>>>> 1000 elements, whichever comes first. And for streaming, you can find
>>> the
>>>> logic necessary to align the bundle processing with watermarks and
>>>> checkpointing here:
>>>> 
>>> https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
>>>> 
>>>> Thomas
>>>> 
>>>> 
>>>> On Tue, Aug 13, 2019 at 7:05 AM jincheng sun 
>>>> wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> The Python Table API(without Python UDF support) has already been
>>> supported
>>>>> and will be available in the coming release 1.9.
>>>>> As Python UDF is very important for Python users, we'd like to start
>>> the
>>>>> discussion about the Python UDF support in the Python Table API.
>>>>> Aljoscha Krettek, Dian Fu and I have discussed offline and have
>>> drafted a
>>>>> design doc[1]. It includes the following items:
>>>>> 
>>>>> - The user-defined function interfaces.
>>>>> - The user-defined function execution architecture.
>>>>> 
>>>>> As mentioned by many guys in the previous discussion thread[2], a
>>>>> portability framework was introduced in Apache Beam in recent
>>> releases. It
>>>>> provides well-defined, language-neutral data structures and protocols
>>> for
>>>>> language-neutral user-defined function execution. This design is based
>>> on
>>>>> Beam's portability framework. We will introduce how to make use of
>>> Beam's
>>>>> portability framework for user-defined function execution: data
>>>>> transmission, state access, checkpoint, metrics, logging, etc.
>>>>> 
>>>>> Considering that the design relies on Beam's portability framework for
>>>>> Python user-defined function execution and not all the contributors in
>>>>> Flink community are familiar with Beam's portability framework, we have
>>>>> done a prototype[3] for proof of concept and also ease of
>>> understanding of
>>>>> the design.
>>>>> 
>>>>> Welcome any feedback.
>>>>> 
>>>>> Best,
>>>>> Jincheng
>>>>> 
>>>>> [1]
>>>>> 
>>>>> 
>>> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit?usp=sharing
>>>>> [2]
>>>>> 
>>>>> 
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html
>>>>> [3] https://github.com/dianfu/flink/commits/udf_poc
>>>>> 
>>> 
>>> 



Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-22 Thread Dian Fu
Great news! Thanks Gordon and Kurt for pushing this forward and everybody who 
contributed to this release.

Regards,
Dian

> On Aug 23, 2019, at 9:41 AM, Guowei Ma wrote:
> 
> Congratulations!!
> Best,
> Guowei
> 
> 
> On Fri, Aug 23, 2019 at 9:32 AM, Congxian Qiu wrote:
> Congratulations, and thanks to everyone who made this release possible.
> Best,
> Congxian
> 
> 
> On Fri, Aug 23, 2019 at 8:13 AM, Kurt Young wrote:
> Great to hear! Thanks Gordon for driving the release, and it's been a great
> pleasure to work with you as release managers for the last couple of weeks.
> And thanks to everyone who contributed to this version; you're making Flink an
> even better project!
> 
> Best,
> Kurt 
> 
> On Fri, Aug 23, 2019 at 2:17 AM, Yun Tang wrote:
> Glad to hear this; I really appreciate Gordon and Kurt's drive on this
> release, and thanks to everyone who contributed to this release.
> 
> Best
> Yun Tang
> From: Becket Qin 
> Sent: Friday, August 23, 2019 0:19
> To: 不常用邮箱 
> Cc: Yang Wang ; user 
> Subject: Re: [ANNOUNCE] Apache Flink 1.9.0 released
>  
> Cheers!! Thanks Gordon and Kurt for driving the release!
> 
> On Thu, Aug 22, 2019 at 5:36 PM 不常用邮箱 wrote:
> Good news!
> 
> Best.
> -- 
> Louis
> Email: xu_soft39211...@163.com 
> 
>> On Aug 22, 2019, at 22:10, Yang Wang wrote:
>> 
>> Glad to hear that.
>> Thanks Gordon, Kurt and everyone who made contributions to this great
>> version.
>> 
>> 
>> Best,
>> Yang
>> 
>> 
>> On Thu, Aug 22, 2019 at 9:33 PM, Biao Liu wrote:
>> Great news!
>> 
>> Thank you Gordon & Kurt for being the release managers!
>> Thanks to all contributors who worked on this release!
>> 
>> Thanks,
>> Biao /'bɪ.aʊ/
>> 
>> 
>> 
>> On Thu, 22 Aug 2019 at 21:14, Paul Lam wrote:
>> Well done! Thanks to everyone who contributed to the release!
>> 
>> Best,
>> Paul Lam
>> 
>> On Thu, Aug 22, 2019 at 9:03 PM, Yu Li wrote:
>> Thanks for the update Gordon, and congratulations!
>> 
>> Great thanks to all for making this release possible, especially to our 
>> release managers!
>> 
>> Best Regards,
>> Yu
>> 
>> 
>> On Thu, 22 Aug 2019 at 14:55, Xintong Song wrote:
>> Congratulations!
>> Thanks Gordon and Kurt for being the release managers, and thanks all the 
>> contributors.
>> 
>> Thank you~
>> Xintong Song
>> 
>> 
>> On Thu, Aug 22, 2019 at 2:39 PM Yun Gao wrote:
>> Congratulations!
>> 
>> Many thanks to Gordon and Kurt for managing the release, and many thanks to
>> everyone for the contributions!
>> 
>> Best,
>> Yun
>> 
>> 
>> 
>> --
>> From: Zhu Zhu 
>> Send Time: 2019 Aug. 22 (Thu.) 20:18
>> To: Eliza 
>> Cc: user 
>> Subject: Re: [ANNOUNCE] Apache Flink 1.9.0 released
>> 
>> Thanks Gordon for the update.
>> Congratulations that we have Flink 1.9.0 released!
>> Thanks to all the contributors.
>> 
>> Thanks,
>> Zhu Zhu
>> 
>> 
>> On Thu, Aug 22, 2019 at 8:10 PM, Eliza wrote:
>> 
>> 
>> On 2019/8/22 Thu 8:03 PM, Tzu-Li (Gordon) Tai wrote:
>> > The Apache Flink community is very happy to announce the release of 
>> > Apache Flink 1.9.0, which is the latest major release.
>> 
>> Congratulations and thanks~
>> 
>> regards.
>> 
> 
> -- 
> Best,
> Kurt



Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-08-25 Thread Dian Fu
Hi Jincheng,

Thanks for the kind tips and the offer of help. I definitely need it! Could
you grant me write permission for Confluence? My ID: Dian Fu

Thanks,
Dian

> On Aug 26, 2019, at 9:53 AM, jincheng sun wrote:
> 
> Thanks for your feedback Hequn & Dian.
> 
> Dian, I am glad to see that you want to help create the FLIP!
> Everyone has a first time, and I am very willing to help you complete
> your first FLIP creation. Here are some tips:
> 
> - First, I'll give your account write permission for Confluence.
> - Before creating the FLIP, please have a look at the FLIP Template [1]. (It's
> better to get to know more about FLIPs by reading [2].)
> - Create the Flink Python UDF related JIRAs after completing the VOTE of the
> FLIP. (I think you can also bring up the VOTE thread, if you want!)
> 
> If you encounter any problems during this period, feel free to tell me and we
> can solve them together. :)
> 
> Best,
> Jincheng
> 
> 
> 
> 
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> 
> 
> On Fri, Aug 23, 2019 at 11:54 AM, Hequn Cheng wrote:
> 
>> +1 for starting the vote.
>> 
>> Thanks Jincheng a lot for the discussion.
>> 
>> Best, Hequn
>> 
>> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu  wrote:
>> 
>>> Hi Jincheng,
>>> 
>>> +1 to create the FLIP and start the VOTE on this feature. I'm willing to help
>>> create the FLIP if you don't mind. As I haven't created a FLIP before,
>>> it would be great if you could help me with it. :)
>>> 
>>> Regards,
>>> Dian
>>> 
>>>> On Aug 22, 2019, at 11:41 PM, jincheng sun wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> Thanks a lot for your feedback. If there are no more suggestions and
>>>> comments, I think it's better to initiate a vote to create a FLIP for
>>>> Apache Flink Python UDFs.
>>>> What do you think?
>>>> 
>>>> Best, Jincheng
>>>> 
>>>> On Thu, Aug 15, 2019 at 12:54 AM, jincheng sun wrote:
>>>> 
>>>>> Hi Thomas,
>>>>> 
>>>>> Thanks for your confirmation and the very important reminder about
>>> bundle
>>>>> processing.
>>>>> 
>>>>> I have added a description of how to perform bundle processing
>>> from
>>>>> the perspective of checkpoints and watermarks. Feel free to leave
>>> comments if
>>>>> anything is not described clearly.
>>>>> 
>>>>> Best,
>>>>> Jincheng
>>>>> 
>>>>> 
>>>>> On Wed, Aug 14, 2019 at 10:08 AM, Dian Fu wrote:
>>>>> 
>>>>>> Hi Thomas,
>>>>>> 
>>>>>> Thanks a lot for the suggestions.
>>>>>> 
>>>>>> Regarding bundle processing, there is a section "Checkpoint" [1] in
>>> the
>>>>>> design doc which talks about how to handle the checkpoint.
>>>>>> However, I think you are right that we should talk more about it,
>> such
>>> as
>>>>>> what's bundle processing, how it affects the checkpoint and
>> watermark,
>>> how
>>>>>> to handle the checkpoint and watermark, etc.
>>>>>> 
>>>>>> [1]
>>>>>> 
>>> 
>> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
>>>>>> 
>>>>>> Regards,
>>>>>> Dian
>>>>>> 
>>>>>>> On Aug 14, 2019, at 1:01 AM, Thomas Weise wrote:
>>>>>>> 
>>>>>>> Hi Jincheng,
>>>>>>> 
>>>>>>> Thanks for putting this together. The proposal is very detailed,
>>>>>> thorough
>>>>>>> and for me as a Beam Flink runner contributor easy to understand :)
>>>>>>> 
>>>>>>> One thing that you should probably detail more is the bundle
>>>>>> processing. It
>>>>>>> is critically important for performance that multiple elements are
>>>>>>> processed in a bundle. The default bundle size in the Flink runner
>> is
>>>>

Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-08-26 Thread Dian Fu
Hi Jincheng,

Thanks! It works.

Thanks,
Dian

> On Aug 27, 2019, at 10:55 AM, jincheng sun wrote:
> 
> Hi Dian, can you check if you have edit access? :)
> 
> 
> On Mon, Aug 26, 2019 at 10:52 AM, Dian Fu wrote:
> 
>> Hi Jincheng,
>> 
>> Thanks for the kind tips and the offer of help. I definitely need it!
>> Could you grant me write permission for Confluence? My ID: Dian Fu
>> 
>> Thanks,
>> Dian
>> 
>>> On Aug 26, 2019, at 9:53 AM, jincheng sun wrote:
>>> 
>>> Thanks for your feedback Hequn & Dian.
>>> 
>>> Dian, I am glad to see that you want to help create the FLIP!
>>> Everyone has a first time, and I am very willing to help you complete
>>> your first FLIP creation. Here are some tips:
>>> 
>>> - First, I'll give your account write permission for Confluence.
>>> - Before creating the FLIP, please have a look at the FLIP Template [1].
>> (It's
>>> better to get to know more about FLIPs by reading [2].)
>>> - Create the Flink Python UDF related JIRAs after completing the VOTE of the
>>> FLIP. (I think you can also bring up the VOTE thread, if you want!)
>>> 
>>> If you encounter any problems during this period, feel free to tell me and
>> we
>>> can solve them together. :)
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> 
>>> 
>>> 
>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
>>> [2]
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>>> 
>>> 
>>> On Fri, Aug 23, 2019 at 11:54 AM, Hequn Cheng wrote:
>>> 
>>>> +1 for starting the vote.
>>>> 
>>>> Thanks Jincheng a lot for the discussion.
>>>> 
>>>> Best, Hequn
>>>> 
>>>> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu  wrote:
>>>> 
>>>>> Hi Jincheng,
>>>>> 
>>>>> +1 to create the FLIP and start the VOTE on this feature. I'm willing to
>> help
>>>>> create the FLIP if you don't mind. As I haven't created a FLIP
>> before,
>>>>> it would be great if you could help me with it. :)
>>>>> 
>>>>> Regards,
>>>>> Dian
>>>>> 
>>>>>> On Aug 22, 2019, at 11:41 PM, jincheng sun wrote:
>>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> Thanks a lot for your feedback. If there are no more suggestions and
>>>>>> comments, I think it's better to initiate a vote to create a FLIP for
>>>>>> Apache Flink Python UDFs.
>>>>>> What do you think?
>>>>>> 
>>>>>> Best, Jincheng
>>>>>> 
>>>>>> On Thu, Aug 15, 2019 at 12:54 AM, jincheng sun wrote:
>>>>>> 
>>>>>>> Hi Thomas,
>>>>>>> 
>>>>>>> Thanks for your confirmation and the very important reminder about
>>>>> bundle
>>>>>>> processing.
>>>>>>> 
>>>>>>> I have added a description of how to perform bundle processing
>>>>> from
>>>>>>> the perspective of checkpoints and watermarks. Feel free to leave
>>>>> comments if
>>>>>>> anything is not described clearly.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Jincheng
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Aug 14, 2019 at 10:08 AM, Dian Fu wrote:
>>>>>>> 
>>>>>>>> Hi Thomas,
>>>>>>>> 
>>>>>>>> Thanks a lot for the suggestions.
>>>>>>>> 
>>>>>>>> Regarding bundle processing, there is a section "Checkpoint" [1]
>> in
>>>>> the
>>>>>>>> design doc which talks about how to handle the checkpoint.
>>>>>>>> However, I think you are right that we should talk more about it,
>>>> such
>>>>> as
>>>>>>>> what's bundle processing, how it affects the checkpoint and
>>>> watermark,
>>>>> how
>>>>>>>> to handle the checkpoint and watermark, etc.
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> 
>>>>> 
>>>> 
>> https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h

[VOTE] FLIP-58: Flink Python User-Defined Function for Table API

2019-08-27 Thread Dian Fu
Hi all,

I'd like to start a voting thread for FLIP-58 [1] since we have reached an
agreement on the design in the discussion thread [2].

This vote will be open for at least 72 hours. Unless there is an objection, I 
will try to close it by Sept 2, 2019 00:00 UTC if we have received sufficient 
votes.

PS: This doesn't mean that we cannot further improve the design. We can still 
discuss the implementation details case by case in the JIRA as long as it 
doesn't affect the overall design.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Function+for+Table+API
 

[2] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673.html
 


Thanks,
Dian

Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-08-27 Thread Dian Fu
Hi all,

I have started a voting thread [1]. Thanks a lot for your help in creating
the FLIP, @Jincheng.


Hi Bowen,

Thanks a lot for your comments. I have replied to you in the design doc. As the
comments don't seem to affect the overall design, I'll not cancel the vote for
now and we can continue the discussion in the design doc.

[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-58-Flink-Python-User-Defined-Function-for-Table-API-td32295.html

Regards,
Dian

> On Aug 28, 2019, at 11:05 AM, Bowen Li wrote:
> 
> Hi Jincheng and Dian,
> 
> Sorry for being late to the party. I took a glance at the proposal, LGTM in
> general, and I left only a couple of comments.
> 
> Thanks,
> Bowen
> 
> 
> On Mon, Aug 26, 2019 at 8:05 PM Dian Fu  wrote:
> 
>> Hi Jincheng,
>> 
>> Thanks! It works.
>> 
>> Thanks,
>> Dian
>> 
>>> On Aug 27, 2019, at 10:55 AM, jincheng sun wrote:
>>> 
>>> Hi Dian, can you check if you have edit access? :)
>>> 
>>> 
>>> On Mon, Aug 26, 2019 at 10:52 AM, Dian Fu wrote:
>>> 
>>>> Hi Jincheng,
>>>> 
>>>> Thanks for the kind tips and the offer of help. I definitely need it!
>>>> Could you grant me write permission for Confluence? My ID: Dian Fu
>>>> 
>>>> Thanks,
>>>> Dian
>>>> 
>>>>> On Aug 26, 2019, at 9:53 AM, jincheng sun wrote:
>>>>> 
>>>>> Thanks for your feedback Hequn & Dian.
>>>>> 
>>>>> Dian, I am glad to see that you want to help create the FLIP!
>>>>> Everyone has a first time, and I am very willing to help you
>> complete
>>>>> your first FLIP creation. Here are some tips:
>>>>> 
>>>>> - First, I'll give your account write permission for Confluence.
>>>>> - Before creating the FLIP, please have a look at the FLIP Template [1].
>>>> (It's
>>>>> better to get to know more about FLIPs by reading [2].)
>>>>> - Create the Flink Python UDF related JIRAs after completing the VOTE of
>>>>> the FLIP. (I think you can also bring up the VOTE thread, if you want!)
>>>>> 
>>>>> If you encounter any problems during this period, feel free to tell me
>>>> and we
>>>>> can solve them together. :)
>>>>> 
>>>>> Best,
>>>>> Jincheng
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
>>>>> [2]
>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>>>>> 
>>>>> 
>>>>> On Fri, Aug 23, 2019 at 11:54 AM, Hequn Cheng wrote:
>>>>> 
>>>>>> +1 for starting the vote.
>>>>>> 
>>>>>> Thanks Jincheng a lot for the discussion.
>>>>>> 
>>>>>> Best, Hequn
>>>>>> 
>>>>>> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu 
>> wrote:
>>>>>> 
>>>>>>> Hi Jincheng,
>>>>>>> 
>>>>>>> +1 to start the FLIP create and VOTE on this feature. I'm willing to
>>>> help
>>>>>>> on the FLIP create if you don't mind. As I haven't created a FLIP
>>>> before,
>>>>>>> it will be great if you could help on this. :)
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Dian
>>>>>>> 
>>>>>>>> On Aug 22, 2019, at 11:41 PM, jincheng sun wrote:
>>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> Thanks a lot for your feedback. If there are no more suggestions and
>>>>>>>> comments, I think it's better to initiate a vote to create a FLIP
>> for
>>>>>>>> Apache Flink Python UDFs.
>>>>>>>> What do you think?
>>>>>>>> 
>>>>>>>> Best, Jincheng
>>>>>>>> 
>>>>>>>> On Thu, Aug 15, 2019 at 12:54 AM, jincheng sun wrote:
>>>>>>>> 
>>>>>>>>> Hi Thomas,
>>>>>>>>> 
>>>>>>>>> Thanks for your confir

Re: [DISCUSS] Releasing Flink 1.8.2

2019-08-30 Thread Dian Fu
Hi Jincheng,

Thanks a lot for bringing up this discussion. +1 for this release.

Regards,
Dian

> On Aug 30, 2019, at 6:31 PM, Maximilian Michels wrote:
> 
> Hi Jincheng,
> 
> +1 I would be for a 1.8.2 release such that we can fix the problems with the 
> nested closure cleaner which currently block 1.8.1 users with Beam: 
> https://issues.apache.org/jira/browse/FLINK-13367
> 
> Thanks,
> Max
> 
> On 30.08.19 11:25, jincheng sun wrote:
>> Hi Flink devs,
>> It has been nearly 2 months since 1.8.1 was released. So, what do you think
>> about releasing Flink 1.8.2 soon?
>> We already have some blocker and critical fixes in the release-1.8 branch:
>> [Blocker]
>> - FLINK-13159 java.lang.ClassNotFoundException when restore job
>> - FLINK-10368 'Kerberized YARN on Docker test' unstable
>> - FLINK-12578 Use secure URLs for Maven repositories
>> [Critical]
>> - FLINK-12736 ResourceManager may release TM with allocated slots
>> - FLINK-12889 Job keeps in FAILING state
>> - FLINK-13484 ConnectedComponents end-to-end test instable with
>> NoResourceAvailableException
>> - FLINK-13508 CommonTestUtils#waitUntilCondition() may attempt to sleep
>> with negative time
>> - FLINK-13806 Metric Fetcher floods the JM log with errors when TM is lost
>> Furthermore, I think the following blocker issue should be merged
>> before the 1.8.2 release.
>> - FLINK-13897: OSS FS NOTICE file is placed in wrong directory
>> It would also be great if we could include the fix for the Elasticsearch 6.x
>> connector thread leak (FLINK-13689) in the 1.8.2 release, which is identified
>> as major.
>> Please let me know what you think.
>> Cheers,
>> Jincheng



Re: State of FLIPs

2019-08-30 Thread Dian Fu
Hi Chesnay,

Thanks a lot for the reminder. FLIP-38 has been released in 1.9 and I have
updated the status on the wiki page.

Regards,
Dian

On Fri, Aug 30, 2019 at 9:38 PM Becket Qin  wrote:

> Hi Chesnay,
>
> You are right. FLIP-36 actually has not passed the vote yet. In fact some
> of the key designs may have to change due to the later code changes. I'll
> update the wiki and start a new vote.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Aug 30, 2019 at 8:44 PM Chesnay Schepler 
> wrote:
>
> > The following FLIPs are marked as "Under discussion" in the wiki
> > (https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals),
> > but actually seem to be in progress (i.e., have open pull requests) and
> > some even have code merged to master:
> >
> >- FLIP-36 (Interactive Programming)
> >- FLIP-38 (Python Table API)
> >- FLIP-44 (Support Local Aggregation)
> >- FLIP-50 (Spill-able Heap Keyed State Backend)
> >
> > I would like to find out what the _actual_ state is, and then discuss how
> > we handle these FLIPs from now on (e.g., retcon history and mark them as
> > accepted, freeze further development until a vote, ...).
> >
> > I've cc'd all people who create the wiki pages for said FLIPs.
> >
> >
> >
>


Re: [DISCUSS] FLIP-66: Support time attribute in SQL DDL

2019-09-05 Thread Dian Fu
Hi Jark,

Thanks for bringing up this discussion and the detailed design doc. This is 
definitely a critical feature for streaming SQL jobs. I have left a few 
comments in the design doc.
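
For reference, the rowtime part of the proposal would look roughly like the
sketch below from pyflink (a preview of the proposed syntax only, not something
that runs on Flink 1.9; the final WATERMARK grammar, the table schema and the
connector properties are illustrative assumptions):

    # Sketch only: the WATERMARK clause follows the FLIP-66 proposal and is
    # not supported in Flink 1.9; table and connector details are hypothetical.
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.table import StreamTableEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    t_env = StreamTableEnvironment.create(env)

    t_env.sql_update("""
        CREATE TABLE orders (
            order_id BIGINT,
            price DECIMAL(10, 2),
            order_time TIMESTAMP(3),
            -- declares order_time as a rowtime attribute with a watermark
            -- that allows 5 seconds of out-of-orderness
            WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
        ) WITH (
            'connector.type' = 'kafka'
        )
    """)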

Thanks,
Dian

> On Sep 6, 2019, at 11:48 AM, Forward Xu wrote:
> 
> Thanks Jark for this topic, This will be very useful.
> 
> 
> Best,
> 
> ForwardXu
> 
> On Fri, Sep 6, 2019 at 11:26 AM, Danny Chan wrote:
> 
>> Thanks Jark for bringing up this topic; this is definitely an important
>> feature for SQL, especially for DDL users.
>> 
>> I will spend some time reviewing this design doc. Thanks a lot.
>> 
>> Best,
>> Danny Chan
>> On Sep 6, 2019 at 11:19 AM +0800, Jark Wu wrote:
>>> Hi everyone,
>>> 
>>> I would like to start discussion about how to support time attribute in
>> SQL
>>> DDL.
>>> In Flink 1.9, we already introduced a basic SQL DDL to create a table.
>>> However, it doesn't support defining time attributes. This means users
>>> can't apply window operations on the tables created by DDL, which is a bad
>>> experience.
>>> 
>>> In FLIP-66, we propose a syntax for watermark to define rowtime attribute
>>> and propose to use computed column syntax to define proctime attribute.
>>> But computed column is another big topic and should deserve a separate
>>> FLIP.
>>> If we have a consensus on the computed column approach, we will start
>>> computed column FLIP soon.
>>> 
>>> FLIP-66:
>>> 
>> https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#
>>> 
>>> Thanks for any feedback!
>>> 
>>> Best,
>>> Jark
>> 



Re: [DISCUSS] Support notifyOnMaster for notifyCheckpointComplete

2019-09-05 Thread Dian Fu
Hi Jingsong,

Thanks for bringing up this discussion. You can try to look at the
GlobalAggregateManager to see if it can meet your requirements. It can be
obtained via StreamingRuntimeContext#getGlobalAggregateManager().

Regards,
Dian

> On Sep 6, 2019, at 1:39 PM, shimin yang wrote:
> 
> Hi Jingsong,
> 
> Big fan of this idea. We faced the same problem and resolved it by adding a
> distributed lock. It would be nice to have this feature in the JobMaster,
> which can replace the lock.
> 
> Best,
> Shimin
> 
> On Fri, Sep 6, 2019 at 12:20 PM, JingsongLee wrote:
> 
>> Hi devs:
>> 
>> I am trying to implement a streaming file sink for tables [1], like
>> StreamingFileSink.
>> If the underlying format is a HiveFormat, or a format that updates visibility
>> through a metaStore, I have to update the metaStore in
>> notifyCheckpointComplete, but this operation occurs on the task side,
>> which leads to distributed access to the metaStore and can become a
>> bottleneck.
>> 
>> So I'm curious if we can support notifyOnMaster for
>> notifyCheckpointComplete like FinalizeOnMaster.
>> 
>> What do you think?
>> 
>> [1]
>> https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing
>> 
>> Best,
>> Jingsong Lee



Re: [DISCUSS] Support notifyOnMaster for notifyCheckpointComplete

2019-09-06 Thread Dian Fu
Hi Shimin,

It can be guaranteed to be an atomic operation. This is ensured by the RPC 
framework. You could take a look at RpcEndpoint for more details.

Regards,
Dian

> On Sep 6, 2019, at 2:35 PM, shimin yang wrote:
> 
> Hi Fu,
> 
> Thank you for the reminder. I think it would work in my case as long as it's
> an atomic operation.
> 
> On Fri, Sep 6, 2019 at 2:22 PM, Dian Fu wrote:
> 
>> Hi Jingsong,
>> 
>> Thanks for bringing up this discussion. You can try to look at the
>> GlobalAggregateManager to see if it can meet your requirements. It can be
>> got via StreamingRuntimeContext#getGlobalAggregateManager().
>> 
>> Regards,
>> Dian
>> 
>>> On Sep 6, 2019, at 1:39 PM, shimin yang wrote:
>>> 
>>> Hi Jingsong,
>>> 
>>> Big fan of this idea. We faced the same problem and resolved by adding a
>>> distributed lock. It would be nice to have this feature in JobMaster,
>> which
>>> can replace the lock.
>>> 
>>> Best,
>>> Shimin
>>> 
>>> On Fri, Sep 6, 2019 at 12:20 PM, JingsongLee wrote:
>>> 
>>>> Hi devs:
>>>> 
>>>> I try to implement streaming file sink for table[1] like
>> StreamingFileSink.
>>>> If the underlying format is a HiveFormat, or a format that updates visibility
>>>> through a metaStore, I have to update the metaStore in the
>>>> notifyCheckpointComplete, but this operation occurs on the task side,
>>>> which will lead to distributed access to the metaStore, which will
>>>> lead to a bottleneck.
>>>> 
>>>> So I'm curious if we can support notifyOnMaster for
>>>> notifyCheckpointComplete like FinalizeOnMaster.
>>>> 
>>>> What do you think?
>>>> 
>>>> [1]
>>>> 
>> https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing
>>>> 
>>>> Best,
>>>> Jingsong Lee
>> 
>> 



Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-09-06 Thread Dian Fu
;>>>> [1]
>>>>>> 
>> https://flink.apache.org/contributing/code-style-and-quality-java.html
>>>>>>>>> 
>>>>>>>>> On 02.09.19 15:35, jincheng sun wrote:
>>>>>>>>>> Hi Timo,
>>>>>>>>>> 
>>>>>>>>>> Great thanks for your feedback. I would like to share my thoughts
>>>> with
>>>>>>>>> you
>>>>>>>>>> inline. :)
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Jincheng
>>>>>>>>>> 
>>>>>>>>>> Timo Walther  于2019年9月2日周一 下午5:04写道:
>>>>>>>>>> 
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> the FLIP looks awesome. However, I would like to discuss the
>>>> changes
>>>>>> to
>>>>>>>>>>> the user-facing parts again. Some feedback:
>>>>>>>>>>> 
>>>>>>>>>>> 1. DataViews: With the current non-annotation design for
>> DataViews,
>>>>>> we
>>>>>>>>>>> cannot perform eager state declaration, right? At which point
>>>> during
>>>>>>>>>>> execution do we know which state is required by the function? We
>>>>>> need to
>>>>>>>>>>> instantiate the function first, right?
>>>>>>>>>>> 
>>>>>>>>>>>> We will analyze the Python AggregateFunction and extract the
>>>>>> DataViews
>>>>>>>>>> used in the Python AggregateFunction. This can be done
>>>>>>>>>> by instantiating a Python AggregateFunction, creating an accumulator
>>>>>>>>>> by calling the method create_accumulator and then analyzing the
>>>>>>>>>> created accumulator. This is actually similar to the way the Java
>>>>>>>>>> AggregateFunction codegen logic works. The extracted DataViews
>>>>>>>>>> can then be used to construct the StateDescriptors in the operator,
>>>>>>>>>> i.e., we should hold the state spec and the state descriptor id in
>>>>>>>>>> the Java operator, and the Python worker can access the state by
>>>>>>>>>> specifying the corresponding state descriptor id.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 2. Serializability of functions: How do we ensure serializability
>>>> of
>>>>>>>>>>> functions for catalog persistence? In the Scala/Java API, we
>> would
>>>>>> like
>>>>>>>>>>> to register classes instead of instances soon. This is the only
>> way
>>>>>> to
>>>>>>>>>>> store a function properly in a catalog or we need some
>>>>>>>>>>> serialization/deserialization logic in the function interfaces to
>>>>>>>>>>> convert an instance to string properties.
>>>>>>>>>>> 
>>>>>>>>>>>> The Python function will be serialized with CloudPickle anyway
>>>>>>>>>> in the Python API, as we need to transfer it to the Python worker,
>>>>>>>>>> which can then deserialize it for execution. The serialized Python
>>>>>>>>>> function can be stored in the catalog.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 3. TableEnvironment: What is the signature of
>>>>>> `register_function(self,
>>>>>>>>>>> name, function)`? Does it accept both a class and function? Like
>>>>>> `class
>>>>>>>>>>> Sum` and `def split()`? Could you add some examples for
>> registering
>>>>>> both
>>>>>>>>>

Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-09-06 Thread Dian Fu
Hi all,

I have updated the FLIP, removed the content related to UDAFs and also changed
the title of the FLIP to "Flink Python User-Defined Stateless Function for
Table". Does it make sense to you?

Regards,
Dian

> On Sep 6, 2019, at 6:09 PM, Dian Fu wrote:
> 
> Hi all,
> 
> Thanks a lot for the discussion here. It makes sense to limit the scope of 
> this FLIP to only ScalarFunction. I'll update the FLIP and remove the content 
> relating to UDAF.
> 
> Thanks,
> Dian
> 
>> On Sep 6, 2019, at 6:02 PM, jincheng sun wrote:
>> 
>> Hi,
>> 
>> Sure, to ensure the 1.10 release of Flink, let's split the FLIPs and have
>> FLIP-58 cover only the stateless part.
>> 
>> Cheers,
>> Jincheng
>> 
>> On Fri, Sep 6, 2019 at 5:53 PM, Aljoscha Krettek wrote:
>> 
>>> Hi,
>>> 
>>> Regarding stateful functions and MapView/DataView/ListView: I think it’s
>>> best to keep that for a later FLIP and focus on a more basic version.
>>> Supporting stateful functions, especially with MapView can potentially be
>>> very slow so we have to see what we can do there.
>>> 
>>> For the method names, I don’t know. If FLIP-64 passes they have to be
>>> changed. So we could use the final names right away, but I’m also fine with
>>> using the old method names for now.
>>> 
>>> Best,
>>> Aljoscha
>>> 
>>>> On 5. Sep 2019, at 12:40, jincheng sun  wrote:
>>>> 
>>>> Hi Aljoscha,
>>>> 
>>>> Thanks for your comments!
>>>> 
>>>> Regarding the FLIP scope, it seems that we have agreed on the design
>>> of
>>>> the stateless function support.
>>>> What do you think about starting the development of the stateless
>>> function
>>>> support first and continuing the discussion of stateful function support?
>>>> Or you think we should split the current FLIP into two FLIPs and discuss
>>>> the stateful function support in another thread?
>>>> 
>>>> Currently, the Python DataView/MapView/ListView interface design follows
>>>> the Java/Scala naming conventions.
>>>> Of course, we can continue to discuss whether there are better solutions,
>>>> e.g. using annotations.
>>>> 
>>>> Regarding the magic logic to support DataView/MapView/ListView, it
>>> will
>>>> be done by the framework and is transparent for users.
>>>> Per my understanding, the magic logic is unavoidable no matter what the
>>>> interfaces will be.
>>>> 
>>>> Regarding the catalog support of Python functions:
>>>> 1) If it's stored in memory as a temporary object, just as you said,
>>>> users can call TableEnvironment.register_function (which will change to
>>>> register_temporary_function in FLIP-64).
>>>> 2) If it's persisted in external storage, users can call
>>>> Catalog.create_function. There will be no API change per my
>>> understanding.
>>>> 
>>>> What do you think?
>>>> Best,
>>>> Jincheng
>>>> 
>>>> On Thu, Sep 5, 2019 at 5:32 PM, Aljoscha Krettek wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> Another thing to consider is the Scope of the FLIP. Currently, we try to
>>>>> support (stateful) AggregateFunctions. I have some concerns about
>>> whether
>>>>> or not DataView/MapView/ListView is a good interface because it requires
>>>>> quite some magic from the runners to make it work, such as messing with
>>> the
>>>>> TypeInformation and injecting objects at runtime. If the FLIP aims for
>>> the
>>>>> minimum of ScalarFunctions and the whole execution harness, that should
>>> be
>>>>> easier to agree on.
>>>>> 
>>>>> Another point is the naming of the new methods. I think Timo hinted at
>>> the
>>>>> fact that we have to consider catalog support for functions. There is
>>>>> ongoing work about differentiating between temporary objects and objects
>>>>> that are stored in a catalog (FLIP-64 [1]). With this in mind, the
>>> method
>>>>> for registering functions should be called register_temporary_function()
>>>>> and so on. Unless we want to already think about mixing Python and Java
>>>>> functions in the catalog, which is outside the scope of this FLIP, I
>>> think.
>>>>> 
>>>>> Best,
>>>>> Aljoscha
>>

Re: Checkpointing clarification

2019-09-06 Thread Dian Fu
When a WindowOperator receives all the barriers from its upstream, it will
forward the barrier to the downstream operators and perform the checkpoint
asynchronously.
It doesn't have to wait for the window to trigger before sending out the
barrier.
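
In other words, the checkpoint interval and the window size are configured
independently. A rough pyflink sketch of the toy example follows (assuming a
registered table "clicks" with fields user, url and a rowtime attribute named
rowtime, and assuming the checkpoint settings are exposed on the pyflink
StreamExecutionEnvironment as in recent releases; names are made up for
illustration):

    # Checkpoints are triggered every 10 minutes even though the window only
    # fires every 5 hours; barriers flow through the window operator without
    # waiting for the window to fire.
    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.table import StreamTableEnvironment
    from pyflink.table.window import Tumble

    env = StreamExecutionEnvironment.get_execution_environment()
    env.enable_checkpointing(10 * 60 * 1000)  # checkpoint every 10 minutes
    t_env = StreamTableEnvironment.create(env)

    result = (t_env.scan("clicks")
              .window(Tumble.over("5.hours").on("rowtime").alias("w"))
              .group_by("w, user")
              .select("user, w.end, url.count"))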

Regards,
Dian

> On Sep 6, 2019, at 8:02 PM, Dominik Wosiński wrote:
> 
> Hello,
> I have a slight doubt about checkpointing in Flink and wanted to clarify my
> understanding. Flink uses barriers internally to keep track of the records
> that were processed. The documentation [1] describes it as if the checkpoint
> only happens when the barriers are transferred to the sink. So let's
> consider a toy example of a `TumblingEventTimeWindow` set to 5 hours and
> `CheckpointInterval` set to 10 minutes. So, if the documentation is
> correct, the checkpoint should occur only when the window is processed and
> gets to the sink (which can take several hours), which is not true as far as I
> know. I am surely wrong somewhere; could someone explain where the error in
> my logic is?
> 
> 
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/internals/stream_checkpointing.html



Re: [ANNOUNCE] Kostas Kloudas joins the Flink PMC

2019-09-06 Thread Dian Fu
Congratulations Kostas!

Regards,
Dian

> On Sep 6, 2019, at 8:58 PM, Wesley Peng wrote:
> 
> On 2019/9/6 8:55 下午, Fabian Hueske wrote:
>> I'm very happy to announce that Kostas Kloudas is joining the Flink PMC.
>> Kostas is contributing to Flink for many years and puts lots of effort in 
>> helping our users and growing the Flink community.
>> Please join me in congratulating Kostas!
> 
> Congratulations Kostas!
> 
> regards.



Re: [DISCUSS] Features for Apache Flink 1.10

2019-09-06 Thread Dian Fu
Hi Gary,

Thanks for kicking off the release schedule of 1.10. +1 for you and Yu Li as 
the release manager.

The feature freeze/release time sounds reasonable.

Thanks,
Dian

> On Sep 7, 2019, at 11:30 AM, Jark Wu wrote:
> 
> Thanks Gary for kicking off the discussion for 1.10 release.
> 
> +1 for Gary and Yu as release managers. Thank you for your effort.
> 
> Best,
> Jark
> 
> 
>> On Sep 7, 2019, at 00:52, zhijiang wrote:
>> 
>> Hi Gary,
>> 
>> Thanks for kicking off the features for the next release 1.10. I am very
>> supportive of you and Yu Li being the release managers.
>> 
>> Let me mention another two improvements which we want to have covered in
>> Flink 1.10; I already confirmed them with Piotr and we reached an agreement
>> before.
>> 
>> 1. Serialize and copy data only once for broadcast partitions [1]: It would
>> greatly improve throughput performance in broadcast mode and was actually
>> proposed in Flink 1.8. Most of the work was already done before and only
>> the last critical JIRA/PR is left. It will not take much effort to make it
>> ready.
>> 
>> 2. Let Netty use Flink's buffers directly in credit-based mode [2]: It
>> could avoid the memory copy from the Netty stack to Flink's managed network
>> buffers. The obvious benefit is greatly decreasing the direct memory overhead
>> in large-scale jobs. I have also heard of some user cases that encountered
>> direct OOM caused by the Netty memory overhead. Actually, this improvement
>> was proposed by Nico in Flink 1.7 but there was no time to focus on it then.
>> Yun Gao already submitted a PR half a year ago but it has not been reviewed
>> yet. I could help review the design and PR code to make it ready.
>> 
>> And you could make these two items as lowest priority if possible.
>> 
>> [1] https://issues.apache.org/jira/browse/FLINK-10745
>> [2] https://issues.apache.org/jira/browse/FLINK-10742
>> 
>> Best,
>> Zhijiang
>> --
>> From: Gary Yao 
>> Send Time: Fri, Sep 6, 2019 17:06
>> To: dev 
>> Cc: carp84 
>> Subject: [DISCUSS] Features for Apache Flink 1.10
>> 
>> Hi community,
>> 
>> Since Apache Flink 1.9.0 has been released more than 2 weeks ago, I want to
>> start kicking off the discussion about what we want to achieve for the 1.10
>> release.
>> 
>> Based on discussions with various people as well as observations from
>> mailing
>> list threads, Yu Li and I have compiled a list of features that we deem
>> important to be included in the next release. Note that the features
>> presented
>> here are not meant to be exhaustive. As always, I am sure that there will be
>> other contributions that will make it into the next release. This email
>> thread
>> is merely to kick off a discussion, and to give users and contributors an
>> understanding where the focus of the next release lies. If there is anything
>> we have missed that somebody is working on, please reply to this thread.
>> 
>> 
>> ** Proposed features and focus
>> 
>> Following the contribution of Blink to Apache Flink, the community released
>> a
>> preview of the Blink SQL Query Processor, which offers better SQL coverage
>> and
>> improved performance for batch queries, in Flink 1.9.0. However, the
>> integration of the Blink query processor is not fully completed yet as there
>> are still pending tasks, such as implementing full TPC-DS support. With the
>> next Flink release, we aim at finishing the Blink integration.
>> 
>> Furthermore, there are several ongoing work threads addressing long-standing
>> issues reported by users, such as improving checkpointing under
>> backpressure,
>> and limiting RocksDBs native memory usage, which can be especially
>> problematic
>> in containerized Flink deployments.
>> 
>> Notable features surrounding Flink’s ecosystem that are planned for the next
>> release include active Kubernetes support (i.e., enabling Flink’s
>> ResourceManager to launch new pods), improved Hive integration, Java 11
>> support, and new algorithms for the Flink ML library.
>> 
>> Below I have included the list of features that we compiled ordered by
>> priority – some of which already have ongoing mailing list threads, JIRAs,
>> or
>> FLIPs.
>> 
>> - Improving Flink’s build system & CI [1] [2]
>> - Support Java 11 [3]
>> - Table API improvements
>>   - Configuration Evolution [4] [5]
>>   - Finish type system: Expression Re-design [6] and UDF refactor
>>   - Streaming DDL: Time attribute (watermark) and Changelog support
>>   - Full SQL partition support for both batch & streaming [7]
>>   - New Java Expression DSL [8]
>>   - SQL CLI with DDL and DML support
>> - Hive compatibility completion (DDL/UDF) to support full Hive integration
>>   - Partition/Function/View support
>> - Remaining Blink planner/runtime merge
>>   - Support all TPC-DS queries [9]
>> - Finer grained resource management
>>   - Unified TaskExecutor Memory Configuration [10]
>>   - Fine Grained Operator Resource Management [11]
>>   - Dynamic Slots Allocation [12]
>> - Finish scheduler re-architecture [13]
>> 

Re: [VOTE] Release 1.8.2, release candidate #1

2019-09-09 Thread Dian Fu
+1 (non-binding)

- built from source successfully (mvn clean install -DskipTests)
- checked gpg signature and hashes of the source release and binary release 
packages
- All artifacts have been deployed to the maven central repository
- no new dependencies were added since 1.8.1
- run a couple of tests in IDE success

Regards,
Dian

> On Sep 9, 2019, at 2:28 PM, jincheng sun wrote:
> 
> +1 (binding)
> 
> - checked signatures [SUCCESS]
> - built from source without tests [SUCCESS]
> - ran some tests in IDE [SUCCESS]
> - start local cluster and submit word count example [SUCCESS]
> - announcement PR for website looks good! (I have left a few comments)
> 
> Best,
> Jincheng
> 
> On Fri, Sep 6, 2019 at 8:47 PM, Jark Wu wrote:
> 
>> Hi everyone,
>> 
>> Please review and vote on the release candidate #1 for the version 1.8.2,
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>> 
>> 
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release and binary convenience releases to be
>> deployed to dist.apache.org [2], which are signed with the key with
>> fingerprint E2C45417BED5C104154F341085BACB5AEFAE3202 [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "release-1.8.2-rc1" [5],
>> * website pull request listing the new release and adding announcement blog
>> post [6].
>> 
>> The vote will be open for at least 72 hours.
>> Please cast your votes before *Sep. 11th 2019, 13:00 UTC*.
>> 
>> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>> 
>> Thanks,
>> Jark
>> 
>> [1]
>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345670
>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.2-rc1/
>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> [4] https://repository.apache.org/content/repositories/orgapacheflink-1262
>> [5]
>> 
>> https://github.com/apache/flink/commit/6322618bb0f1b7942d86cb1b2b7bc55290d9e330
>> [6] https://github.com/apache/flink-web/pull/262
>> 



Re: [VOTE] FLIP-58: Flink Python User-Defined Function for Table API

2019-09-09 Thread Dian Fu
Thanks Jincheng a lot for the reminder and thanks all for the voting. I'm
closing the vote now.
So far, the vote has received:
  - 5 binding +1 votes (Jincheng, Hequn, Jark, Shaoxuan, Becket)
  - 5 non-binding +1 votes (Wei, Xingbo, Terry, Yu, Jeff)
  - No 0/-1 votes

There are more than 3 binding +1 votes, no -1 votes, and the voting time has 
passed. According to the new bylaws, I'm glad to announce that FLIP-58 is 
approved. I'll update the FLIP wiki page accordingly.

Thanks,
Dian


> On Sep 9, 2019, at 10:38 AM, jincheng sun wrote:
> 
> Hi all,
> 
> This VOTE looks like everyone agrees with the current FLIP.
> 
> Hi Timo & Aljoscha, do you have any other comments after the ML discussion?
> [1]
> 
> Hi Dian, could you announce the VOTE result and create a JIRA for the FLIP
> later today, if there is no other feedback?
> 
> Cheers,
> Jincheng
> 
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673i20.html#a32669
> 
> Becket Qin  于2019年9月2日周一 下午1:32写道:
> 
>> +1
>> 
>> It is extremely useful for ML users.
>> 
>> On Mon, Sep 2, 2019 at 9:46 AM Shaoxuan Wang  wrote:
>> 
>>> +1 (binding)
>>> 
>>> This will be a great feature for Flink users, especially for the data
>>> science and AI engineers.
>>> 
>>> Regards,
>>> Shaoxuan
>>> 
>>> 
>>>> On Fri, Aug 30, 2019 at 11:08 AM, Yu Li wrote:
>>> 
>>>> +1, very looking forward this feature in flink 1.10
>>>> 
>>>> 
>>>> Yu Li  于2019年8月30日周五 上午11:08写道:
>>>> 
>>>>> +1 (non-binding)
>>>>> 
>>>>> Thanks for driving this!
>>>>> 
>>>>> Best Regards,
>>>>> Yu
>>>>> 
>>>>> 
>>>>> On Fri, 30 Aug 2019 at 11:01, Terry Wang  wrote:
>>>>> 
>>>>>> +1. That would be very helpful.
>>>>>> Best,
>>>>>> Terry Wang
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Aug 30, 2019, at 10:18 AM, Jark Wu wrote:
>>>>>>> 
>>>>>>> +1
>>>>>>> 
>>>>>>> Thanks for the great work!
>>>>>>> 
>>>>>>> On Fri, 30 Aug 2019 at 10:04, Xingbo Huang 
>>>> wrote:
>>>>>>> 
>>>>>>>> Hi Dian,
>>>>>>>> 
>>>>>>>> +1,
>>>>>>>> Thanks a lot for driving this.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Xingbo
>>>>>>>>> On Aug 30, 2019, at 9:39 AM, Wei Zhong wrote:
>>>>>>>>> 
>>>>>>>>> Hi Dian,
>>>>>>>>> 
>>>>>>>>> +1 non-binding
>>>>>>>>> Thanks for driving this!
>>>>>>>>> 
>>>>>>>>> Best, Wei
>>>>>>>>> 
>>>>>>>>>> 在 2019年8月29日,09:25,Hequn Cheng  写道:
>>>>>>>>>> 
>>>>>>>>>> Hi Dian,
>>>>>>>>>> 
>>>>>>>>>> +1
>>>>>>>>>> Thanks a lot for driving this.
>>>>>>>>>> 
>>>>>>>>>> Best, Hequn
>>>>>>>>>> 
>>>>>>>>>> On Wed, Aug 28, 2019 at 2:01 PM jincheng sun <
>>>>>> sunjincheng...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Dian,
>>>>>>>>>>> 
>>>>>>>>>>> +1, Thanks for your great job!
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Jincheng
>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Aug 28, 2019 at 11:04 AM, Dian Fu wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> 
>>>>>>>>>>>> I'd like to start a voting thread for FLIP-58 [1] since that
>>> we
>>>>> have
>>>>>>>>>>>> reached an agreement on the design in the discussion thread
>>> [2],
>>>>>>>>>>>> 
>>>>>>>>>>>> This vote will be open for at least 72 hours. Unless there
>> is
>>> an
>>>>>>>>>>>> objection, I will try to close it by Sept 2, 2019 00:00 UTC
>> if
>>>> we
>>>>>> have
>>>>>>>>>>>> received sufficient votes.
>>>>>>>>>>>> 
>>>>>>>>>>>> PS: This doesn't mean that we cannot further improve the
>>> design.
>>>>> We
>>>>>>>> can
>>>>>>>>>>>> still discuss the implementation details case by case in the
>>>> JIRA
>>>>> as
>>>>>>>> long
>>>>>>>>>>>> as it doesn't affect the overall design.
>>>>>>>>>>>> 
>>>>>>>>>>>> [1]
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Function+for+Table+API
>>>>>>>>>>>> <
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58:+Flink+Python+User-Defined+Function+for+Table+API
>>>>>>>>>>>>> 
>>>>>>>>>>>> [2]
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673.html
>>>>>>>>>>>> <
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673.html
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Dian
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best Regards
>>>> 
>>>> Jeff Zhang
>>>> 
>>> 
>> 



Re: [DISCUSS] Support notifyOnMaster for notifyCheckpointComplete

2019-09-09 Thread Dian Fu
Hi Jingsong,

Good point!

1. If it doesn't matter which task performs the finalize work, then I think
task-0, as suggested by Jark, is a very good solution.
2. If it requires the last finished task to perform the finalize work, then we
have to consider other solutions.
   WRT the fault tolerance of StreamingRuntimeContext#getGlobalAggregateManager,
AFAIK, there is no built-in support.
1) Regarding TM failover, I think it's not a problem. We can use an
accumulator, e.g. finish_count, which is increased by 1 when a sub-task is
finished (i.e., its close() method is called).
   When finish_count == RuntimeContext.getNumberOfParallelSubtasks() for some
sub-task, then we know that it's the last finished sub-task. This holds true
even in case of TM failover.
2) Regarding JM failover, I have no idea how to work around it so far.
Maybe @Jamie Grier, who is the author of this feature, could share more
thoughts. Not sure if there is already a solution/plan to support JM failover,
or whether this feature is simply not designed for this kind of use case.

Regards,
Dian

> On Sep 9, 2019, at 3:08 PM, shimin yang wrote:
> 
> Hi Jingsong,
> 
> Although it would be nice if the accumulators in the GlobalAggregateManager
> were fault-tolerant, we could still take advantage of managed state to
> guarantee the semantics and use the accumulators to implement a distributed
> barrier or lock to solve the distributed access problem.
> 
> Best,
> Shimin
> 
> JingsongLee  于2019年9月9日周一 下午1:33写道:
> 
>> Thanks Jark and Dian:
>> 1. Jark's approach: do the work in task-0. A simple way.
>> 2. Dian's approach: use StreamingRuntimeContext#getGlobalAggregateManager.
>> It can do more operations. But are these accumulators fault-tolerant?
>> 
>> Best,
>> Jingsong Lee
>> 
>> 
>> --
>> From: shimin yang 
>> Send Time: Fri, Sep 6, 2019 15:21
>> To: dev 
>> Subject: Re: [DISCUSS] Support notifyOnMaster for notifyCheckpointComplete
>> 
>> Hi Fu,
>> 
>> That'll be nice.
>> 
>> Thanks.
>> 
>> Best,
>> Shimin
>> 
On Fri, Sep 6, 2019 at 3:17 PM, Dian Fu wrote:
>> 
>>> Hi Shimin,
>>> 
>>> It can be guaranteed to be an atomic operation. This is ensured by the
>> RPC
>>> framework. You could take a look at RpcEndpoint for more details.
>>> 
>>> Regards,
>>> Dian
>>> 
>>>> On Sep 6, 2019, at 2:35 PM, shimin yang wrote:
>>>> 
>>>> Hi Fu,
>>>> 
>>>> Thank you for the remind. I think it would work in my case as long as
>>> it's
>>>> an atomic operation.
>>>> 
>>>> On Fri, Sep 6, 2019 at 2:22 PM, Dian Fu wrote:
>>>> 
>>>>> Hi Jingsong,
>>>>> 
>>>>> Thanks for bringing up this discussion. You can try to look at the
>>>>> GlobalAggregateManager to see if it can meet your requirements. It can
>>> be
>>>>> got via StreamingRuntimeContext#getGlobalAggregateManager().
>>>>> 
>>>>> Regards,
>>>>> Dian
>>>>> 
>>>>>> On Sep 6, 2019, at 1:39 PM, shimin yang wrote:
>>>>>> 
>>>>>> Hi Jingsong,
>>>>>> 
>>>>>> Big fan of this idea. We faced the same problem and resolved by
>> adding
>>> a
>>>>>> distributed lock. It would be nice to have this feature in JobMaster,
>>>>> which
>>>>>> can replace the lock.
>>>>>> 
>>>>>> Best,
>>>>>> Shimin
>>>>>> 
>>>>>> On Fri, Sep 6, 2019 at 12:20 PM, JingsongLee wrote:
>>>>>> 
>>>>>>> Hi devs:
>>>>>>> 
>>>>>>> I try to implement streaming file sink for table[1] like
>>>>> StreamingFileSink.
>>>>>>> If the underlying is a HiveFormat, or a format that updates
>> visibility
>>>>>>> through a metaStore, I have to update the metaStore in the
>>>>>>> notifyCheckpointComplete, but this operation occurs on the task
>> side,
>>>>>>> which will lead to distributed access to the metaStore, which will
>>>>>>> lead to bottleneck.
>>>>>>> 
>>>>>>> So I'm curious if we can support notifyOnMaster for
>>>>>>> notifyCheckpointComplete like FinalizeOnMaster.
>>>>>>> 
>>>>>>> What do you think?
>>>>>>> 
>>>>>>> [1]
>>>>>>> 
>>>>> 
>>> 
>> https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing
>>>>>>> 
>>>>>>> Best,
>>>>>>> Jingsong Lee
>>>>> 
>>>>> 
>>> 
>>> 
>> 



Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread Dian Fu
Congratulations!

> On Sep 11, 2019, at 5:26 PM, Jeff Zhang wrote:
> 
> Congratulations Zili Chen! 
> 
On Wed, Sep 11, 2019 at 5:25 PM, Wesley Peng wrote:
> Hi
> 
> on 2019/9/11 17:22, Till Rohrmann wrote:
> > I'm very happy to announce that Zili Chen (some of you might also know 
> > him as Tison Kun) accepted the offer of the Flink PMC to become a 
> > committer of the Flink project.
> 
> Congratulations Zili Chen.
> 
> regards.
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang



Re: Flink stream python

2019-09-13 Thread Dian Fu
Hi Milan,

I hope Till has answered your questions. WRT whether it's supported to use
libraries from Java code in Python, the answer is yes.

Regarding using Java connectors in Python, there are two ways:
1) If there is a Java TableFactory [1] defined for the connector, you could try
to use pyflink.table.descriptors.CustomConnectorDescriptor [2][3] (see the
sketch after this list).
2) If there is no Java TableFactory defined, you need to write a Python
connector which is a simple wrapper of the Java connector. You can refer to the
CSV connector for an example [4]. Built-in ways may be provided in the future
to eliminate the need for wrapping. However, you can try this way for now.

Besides, no matter which of the above cases applies, you must make sure that
the Java library is put on the classpath. Otherwise, the Java library cannot be
found. This is the cause of the error you encountered: "ImportError: No module
named connectors". If you submit the Python job via the CLI, you could use the
"-j" option. You can refer to [5] for more details.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sourceSinks.html#define-a-tablefactory
[2] https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.table.html#pyflink.table.descriptors.CustomConnectorDescriptor
[3] https://github.com/apache/flink/blob/master/flink-python/pyflink/table/tests/test_descriptor.py#L360
[4] https://github.com/apache/flink/blob/master/flink-python/pyflink/table/sources.py#L35
[5] https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html

Regards,
Dian

> 在 2019年9月13日,下午4:37,Till Rohrmann  写道:
> 
> Hi Milan,
> 
> I can only give you some high level answers because I'm not actively
> involved in the development of Flink's Python support. But I've cc'ed
> Jincheng who is driving this effort and can give you more detailed answers.
> 
> At the moment, Flink's Python API can be seen as a thin wrapper around
> Flink's Table API. So what the Python program does it to construct a Flink
> Table API program by calling Flink's Java API using Py4J. At the moment the
> development concentrated on the Table API and hence the DataStream part is
> not yet fully feature complete. For the table API there are a couple of
> connectors supported [1].
> 
> Since Flink's Python API does not actually ship Python code yet, it does
> not work to implement a source in pure Python. I'm not sure whether this
> will be ever supported tbh but the community is actively working on
> supporting Python user code functions which are executed on the Flink
> cluster. The long term plan is to make the Python API feature complete wrt
> the other available APIs. I'm also confident that we will constantly
> improve on the documentation of this new API. Just keep in mind that this
> feature has just been released as a preview feature with Flink 1.9 and the
> community is heavily working on improving it.
> 
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.table.html#module-pyflink.table.descriptors
> 
> Cheers,
> Till
> 
> On Fri, Sep 13, 2019 at 10:09 AM Milan Simaković <
> milan.simako...@ibis-instruments.com 
> > wrote:
> 
>> Hello everyone,
>> 
>> I would appreciate if someone could please clarify me several things
>> regarding Flink.
>> 
>> 
>> 
>> Flink 1.10
>> 
>> 
>> 
>> I’m trying to develop PYTHON stream application using pyflink-stream. I
>> managed successfully to run wordcount.py (in attachment). Now, I would like
>> to go one step further and to use other sources such are Twitter API, Kafka
>> topics or socket. However, unfortunately I did not succeed.
>> 
>>   1. Didn’t manage to find any connectors on python docs
>>   
>> https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.datastream.html
>>   2. I tried to import
>>   org.apache.flink.streaming.connectors.twitter.TwitterSource, but got an
>>   error “ImportError: No module named connectors”
>>   3. I developed by myself twitter connector (pure python) but realized
>>   that it doesn’t have sense to use it because it cannot be distributed in a
>>   way it would be with flink.
>> 
>> 
>> 
>> Could you please help me with uncertainties that I have to Flink? I would
>> mean very

Re: [ANNOUNCE] Apache Flink 1.8.2 released

2019-09-14 Thread Dian Fu
Thanks Jark for being the release manager and the great work! Also thanks to 
everyone who has contributed to this release.

Regards,
Dian

> 在 2019年9月14日,上午3:24,Till Rohrmann  写道:
> 
> Thanks Jark for being our release manager and thanks to everyone who has 
> contributed.
> 
> Cheers,
> Till
> 
> On Fri, Sep 13, 2019 at 4:12 PM jincheng sun  > wrote:
> Thanks for being the release manager and the great work Jark :)
> Also thanks to the community making this release possible!
> 
> Best,
> Jincheng
> 
> Jark Wu mailto:imj...@gmail.com>> 于2019年9月13日周五 下午10:07写道:
> Hi,
>  
> The Apache Flink community is very happy to announce the release of Apache 
> Flink 1.8.2, which is the second bugfix release for the Apache Flink 1.8 
> series.
>  
> Apache Flink® is an open-source stream processing framework for distributed, 
> high-performing, always-available, and accurate data streaming applications.
>  
> The release is available for download at:
> https://flink.apache.org/downloads.html 
>  
> Please check out the release blog post for an overview of the improvements 
> for this bugfix release:
> https://flink.apache.org/news/2019/09/11/release-1.8.2.html 
>  
> The full release notes are available in Jira:
> https://issues.apache.org/jira/projects/FLINK/versions/12345670 
>  
> We would like to thank all contributors of the Apache Flink community who 
> made this release possible!
> Great thanks to @Jincheng for the kindly help during this release.
>  
> Regards,
> Jark



Re: How to implement grouping set in stream

2019-09-19 Thread Dian Fu
AFAIK, grouping sets are already supported for streaming in the blink planner. 
You could check FLINK-12192 for details.
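
For reference, a minimal sketch of such a query from the Python Table API 
(t_env and the 'orders' table with columns region, product and amount are 
assumptions for illustration; the blink planner must be enabled):

    # aggregate at several grouping granularities in a single query
    result = t_env.sql_query(
        "SELECT region, product, SUM(amount) AS total "
        "FROM orders "
        "GROUP BY GROUPING SETS ((region, product), (region), ())")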

Regards,
Dian

> 在 2019年9月10日,下午6:51,刘建刚  写道:
> 
>   I want to implement grouping sets in streaming. I am new to Flink SQL. I 
> want to find an example that teaches me how to define a custom rule and 
> implement the corresponding operator. Can anyone give me any suggestions?



Re: use of org.apache.flink.yarn.cli.FlinkYarnSessionCli in Flink Sql client

2019-09-22 Thread Dian Fu
Hi Dipanjan,

I think you are right that submitting a job to a cluster via the SQL client is 
already supported. This was supported in [1]. Besides, I think it is not 
configured in the YAML; it's specified in the CLI options when you start up the 
SQL client CLI.

[1] https://issues.apache.org/jira/browse/FLINK-8852 

> 在 2019年9月23日,上午11:00,Terry Wang  写道:
> 
> Hi Dipanjan:
> 
> I just looked through the Flink SQL client code and got the same conclusion 
> as you.
> Look forward to receiving other comments.
> 
> Best,
> Terry Wang
> 
> 
> 
>> 在 2019年9月22日,下午11:53,Dipanjan Mazumder  写道:
>> 
>> Hi,
>> Thanks again for responding to my earlier queries.
>> I was again going through the Flink SQL client code and came across the 
>> default custom command line. A few days back I came to know that the Flink 
>> SQL client is not supported in a full-fledged cluster with different 
>> resource managers like YARN, but this 
>> "org.apache.flink.yarn.cli.FlinkYarnSessionCli" class seems to be used by 
>> the SQL client to establish a session with a YARN-managed cluster.
>> 
>> Am I wrong in thinking this, or is there some other use for this class? 
>> Please kindly help on the same.
>> Regards,
>> Dipanjan
> 



Re: [DISCUSS] Releasing Flink 1.9.1

2019-09-23 Thread Dian Fu
+1 for the 1.9.1 release and Jark being the RM. Thanks Jark for kicking off 
this release and for volunteering. 

Regards,
Dian

> 在 2019年9月24日,上午10:45,Kurt Young  写道:
> 
> +1 for the 1.9.1 release and for Jark being the RM.
> Thanks Jark for the volunteering.
> 
> Best,
> Kurt
> 
> 
> On Mon, Sep 23, 2019 at 9:17 PM Till Rohrmann  wrote:
> 
>> +1 for the 1.9.1 release and for Jark being the RM. I'll help with the
>> review of FLINK-14010.
>> 
>> Cheers,
>> Till
>> 
>> On Mon, Sep 23, 2019 at 8:32 AM Debasish Ghosh 
>> wrote:
>> 
>>> I hope https://issues.apache.org/jira/browse/FLINK-12501 will also be
>> part
>>> of 1.9.1 ..
>>> 
>>> regards.
>>> 
>>> On Mon, Sep 23, 2019 at 11:39 AM Jeff Zhang  wrote:
>>> 
 FLINK-13708 is also very critical IMO. This would cause invalid flink
>> job
 (doubled output)
 
 https://issues.apache.org/jira/browse/FLINK-13708
 
 Jark Wu  于2019年9月23日周一 下午2:03写道:
 
> Hi everyone,
> 
> It has already been a month since we released Flink 1.9.0.
> We already have many important bug fixes from which our users can
>>> benefit
> in the release-1.9 branch (83 resolved issues).
> Therefore, I propose to create the next bug fix release for Flink
>> 1.9.
> 
> Most notable fixes are:
> 
> - [FLINK-13526] When switching to a non existing catalog or database
>> in
 the
> SQL Client the client crashes.
> - [FLINK-13568] It is not possible to create a table with a "STRING"
>>> data
> type via the SQL DDL.
> - [FLINK-13941] Prevent data-loss by not cleaning up small part files
 from
> S3.
> - [FLINK-13490][jdbc] If one column value is null when reading JDBC,
>>> the
> following values will all be null.
> - [FLINK-14107][kinesis] When using event time alignment with the
 Kinsesis
> Consumer the consumer might deadlock in one corner case.
> 
> Furthermore, I would like the following critical issues to be merged
 before
> 1.9.1 release:
> 
> - [FLINK-14118] Reduce the unnecessary flushing when there is no data
> available for flush which can save 20% ~ 40% CPU. (reviewing)
> - [FLINK-13386] Fix A couple of issues with the new dashboard have
 already
> been filed. (PR is created, need review)
> - [FLINK-14010][yarn] The Flink YARN cluster can get into an
>>> inconsistent
> state in some cases, where
> leaderhship for JobManager, ResourceManager and Dispatcher components
>>> is
> split between two master processes. (PR is created, need review)
> 
> I would volunteer as release manager and kick off the release process
 once
> blocker issues has been merged. What do you think?
> 
> If there is any other blocker issues need to be fixed in 1.9.1,
>> please
 let
> me know.
> 
> Cheers,
> Jark
> 
 
 
 --
 Best Regards
 
 Jeff Zhang
 
>>> 
>>> 
>>> --
>>> Debasish Ghosh
>>> http://manning.com/ghosh2
>>> http://manning.com/ghosh
>>> 
>>> Twttr: @debasishg
>>> Blog: http://debasishg.blogspot.com
>>> Code: http://github.com/debasishg
>>> 
>> 



Re: [DISCUSS] Flink Python UDF Environment and Dependency Management

2019-09-26 Thread Dian Fu
Hi Wei,

Thanks a lot for bringing up this discussion. Python dependency management is 
very important for Python users. I have left a few comments on the design doc.

Thanks,
Dian

> 在 2019年9月26日,下午12:23,jincheng sun  写道:
> 
> Thanks for bring up the discussion, Wei.
> Overall the design doc looks good. I have left a few comments.
> 
> BTW: Dependency Management is very important for Python UDFs, welcome
> anyone left your suggestions!
> 
> Best,
> Jincheng
> 
> Wei Zhong  于2019年9月26日周四 上午11:59写道:
> 
>> Hi everyone,
>> 
>> In FLIP-58 [1] we have a plan to support Python UDF. As a critical part of
>> python UDF, the environment and dependency management of users' python code
>> has not been fully discussed.
>> 
>> I'd like to start a discussion on "Flink Python UDF Environment and
>> Dependency Management". Here is the design doc I drafted:
>> 
>> 
>> https://docs.google.com/document/d/1vq5J3TSyhscQXbpRhz-Yd3KCX62PBJeC_a_h3amUvJ4/edit?usp=sharing
>> 
>> Please take a look, and feedbacks are welcome.
>> 
>> Thanks,
>> Wei
>> 
>> [1]:
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>> 
>> 
>> 



Re: 订阅邮件

2019-09-26 Thread Dian Fu
To subscribe to the mailing lists, you need to send an email to each of the 
following addresses separately: dev-subscr...@flink.apache.org, 
user-subscr...@flink.apache.org and user-zh-subscr...@flink.apache.org.

> 在 2019年9月26日,上午9:58,杨利君  写道:
> 
> 订阅flink社区邮件



Re: [DISCUSS] Releasing Flink 1.9.1

2019-09-29 Thread Dian Fu
t,
> > > >>>> > >> Jincheng
> > > >>>> > >>
> > > >>>> > >> Jark Wu  于2019年9月25日周三 下午8:04写道:
> > > >>>> > >>
> > > >>>> > >> > Hi all,
> > > >>>> > >> >
> > > >>>> > >> > I am here to update the progress of the issue that needs to
> > be
> > > >>>> > tracked:
> > > >>>> > >> >
> > > >>>> > >> > - FLINK-14010 (merged)
> > > >>>> > >> > - FLINK-14118 (under discussion whether we should back port
> > it
> > > >>>> to 1.9)
> > > >>>> > >> > - FLINK-13386 (Daryl reviewed and Dawid will verify the
> > > >>>> functionality
> > > >>>> > >> > again)
> > > >>>> > >> > - FLINK-13708 (under reviewing)
> > > >>>> > >> > - FLINK-14145 (merged to master and need to merge it to 1.9
> > if
> > > >>>> we all
> > > >>>> > >> > agree)
> > > >>>> > >> >
> > > >>>> > >> > Great thanks to all of you for helping fix and reviewing!
> > > >>>> > >> > Ideally, we can kick off the release vote for the first RC
> > > early
> > > >>>> next
> > > >>>> > >> week.
> > > >>>> > >> >
> > > >>>> > >> > Best,
> > > >>>> > >> > Jark
> > > >>>> > >> >
> > > >>>> > >> > On Wed, 25 Sep 2019 at 01:25, Till Rohrmann <
> > > >>>> trohrm...@apache.org>
> > > >>>> > >> wrote:
> > > >>>> > >> >
> > > >>>> > >> > > FLINK-14010 has been merged.
> > > >>>> > >> > >
> > > >>>> > >> > > Cheers,
> > > >>>> > >> > > Till
> > > >>>> > >> > >
> > > >>>> > >> > > On Tue, Sep 24, 2019 at 11:14 AM Gyula Fóra <
> > > >>>> gyula.f...@gmail.com>
> > > >>>> > >> > wrote:
> > > >>>> > >> > >
> > > >>>> > >> > > > +1 for 1.9.1 soon :)
> > > >>>> > >> > > >
> > > >>>> > >> > > > I would also like to include a fix to:
> > > >>>> > >> > > > FLINK-14145 - getLatestCheckpoint(true) returns wrong
> > > >>>> checkpoint
> > > >>>> > >> > > >
> > > >>>> > >> > > > It is already merged to master and just need to merge
> it
> > to
> > > >>>> 1.9 if
> > > >>>> > >> we
> > > >>>> > >> > all
> > > >>>> > >> > > > agree (https://github.com/apache/flink/pull/9756)
> > > >>>> > >> > > >
> > > >>>> > >> > > > Cheers,
> > > >>>> > >> > > > Gyula
> > > >>>> > >> > > >
> > > >>>> > >> > > > On Tue, Sep 24, 2019 at 8:23 AM Terry Wang <
> > > >>>> zjuwa...@gmail.com>
> > > >>>> > >> wrote:
> > > >>>> > >> > > >
> > > >>>> > >> > > > > +1 for the 1.9.1 release and for Jark being the RM.
> > > >>>> > >> > > > > Thanks Jark for driving on this.
> > > >>>> > >> > > > >
> > > >>>> > >> > > > > Best,
> > > >>>> > >> > > > > Terry Wang
> > > >>>> > >> > > > >
> > > >>>> > >> > > > >
> > > >>>> > >> > > > >
> > > >>>> > >> > > > > > 在 2019年9月24日,下午2:19,Jark Wu  写道:
> > > >>>> > >> > > > > >

[DISCUSS] Drop Python 2 support for 1.10

2019-10-08 Thread Dian Fu
Hi everyone,

I would like to propose to drop Python 2 support (currently Python 2.7, 3.5, 
3.6, 3.7 are all supported in Flink) as it's coming to an end on Jan 1, 2020 
[1]. A lot of projects [2][3][4] have already stated or are planning to drop 
Python 2 support.

The benefits of dropping Python 2 support are:
1. Maintaining Python 2/3 compatibility is a burden and it makes the code 
complicated as Python 2 and Python 3 are not compatible.
2. There are many features which are only available in Python 3.x, such as Type 
Hints [5] (see the small example after this list). We can only make use of this 
kind of feature after dropping Python 2 support.
3. Flink-python depends on third-party projects, such as Apache Beam (and may 
add more dependencies such as pandas, etc. in the near future); it's not 
possible to upgrade them to the latest version once they drop Python 2 support.
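
As a small illustration of point 2, the following is only valid syntax on 
Python 3 (a minimal standalone example, not Flink code):

    from typing import List

    def word_count(lines: List[str]) -> int:
        """Annotated signatures like this are a Python 3 only feature."""
        return sum(len(line.split()) for line in lines)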

Here are the options we have:
1. Drop Python 2 support in 1.10:
As flink-python is a new module added in 1.9.0, dropping Python 2 support at 
this early stage seems a good choice for us.
2. Deprecate Python 2 in 1.10 and drop its support in 1.11:
As 1.10 is planned to be released around the beginning of 2020, this is also 
aligned with the official end of Python 2 support.

Personally I prefer option 1 as flink-python is a new module and there are not 
many historical reasons to consider.

Looking forward to your feedback!

Regards,
Dian

[1] https://pythonclock.org/
[2] https://python3statement.org/
[3] https://spark.apache.org/news/plan-for-dropping-python-2-support.html
[4] https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
[5] https://stackoverflow.com/questions/32557920/what-are-type-hints-in-python-3-5


Re: [DISCUSS] Drop Python 2 support for 1.10

2019-10-09 Thread Dian Fu
Thanks everyone for your replies.

So far all the replies tend towards option 1 (dropping Python 2 support in 
1.10) and I will continue to listen for any other opinions. 

@Jincheng @Hequn, you are right, things become more complicated if dropping 
Python 2 support is performed after Python UDFs have been supported. Users will 
have to migrate their Python UDFs if they have used features which are only 
supported in Python 2.

Thanks @Yu for your suggestion. It makes a lot of sense to me and I will do 
that. Also CC'ing the @user and @user-zh MLs in case any users are concerned 
about this.

Thanks,
Dian

> 在 2019年10月9日,下午1:14,Yu Li  写道:
> 
> Thanks for bringing this up Dian.
> 
> Since python 2.7 support was added in 1.9.0 and would be EOL near the
> planned release time for 1.10, I could see a good reason to take option 1.
> 
> Please remember to add an explicit release note and would be better to send
> a notification in user ML about the plan to drop it, just in case some
> 1.9.0 users are already using python 2.7 in their product env.
> 
> Best Regards,
> Yu
> 
> 
> On Wed, 9 Oct 2019 at 11:13, Jeff Zhang  wrote:
> 
>> +1
>> 
>> Hequn Cheng  于2019年10月9日周三 上午11:07写道:
>> 
>>> Hi Dian,
>>> 
>>> +1 to drop Python 2 directly.
>>> 
>>> Just as @jincheng said, things would be more complicated if we are going
>> to
>>> support python UDFs.
>>> The python UDFs will introduce a lot of python dependencies which will
>> also
>>> drop the support of Python 2, such as beam, pandas, pyarrow, etc.
>>> Given this and Python 2 will reach EOL on Jan 1 2020. I think we can drop
>>> Python 2 in Flink as well.
>>> 
>>> As for the two options, I think we can drop it directly in 1.10. The
>>> flink-python is introduced just from 1.9, I think it's safe to drop it
>> for
>>> now.
>>> And we can also benefit from it when we add support for python UDFs.
>>> 
>>> Best, Hequn
>>> 
>>> 
>>> On Wed, Oct 9, 2019 at 8:40 AM jincheng sun 
>>> wrote:
>>> 
>>>> Hi Dian,
>>>> 
>>>> Thanks for bringing this discussion!
>>>> 
>>>> In Flink 1.9 we only add Python Table API mapping to Java Table
>>> API(without
>>>> Python UDFs), there no special requirements for Python version, so we
>> add
>>>> python 2,7 support. But for Flink 1.10, we add the Python UDFs support,
>>>> i.e., user will add more python code in Flink job and more requirements
>>> for
>>>> the features of the Python language.So I think It's better to follow
>> the
>>>> rhythm of Python official.
>>>> 
>>>> Option 2 is the most conservative and correct approach, but for the
>>> current
>>>> situation, we cooperate with the Beam community and use Beam's
>>> portability
>>>> framework for UDFs support, so we prefer to adopt the Option 1.
>>>> 
>>>> Best,
>>>> Jincheng
>>>> 
>>>> 
>>>> 
>>>> Dian Fu  于2019年10月8日周二 下午10:34写道:
>>>> 
>>>>> Hi everyone,
>>>>> 
>>>>> I would like to propose to drop Python 2 support(Currently Python
>> 2.7,
>>>>> 3.5, 3.6, 3.7 are all supported in Flink) as it's coming to an end at
>>> Jan
>>>>> 1, 2020 [1]. A lot of projects [2][3][4] has already stated or are
>>>> planning
>>>>> to drop Python 2 support.
>>>>> 
>>>>> The benefits of dropping Python 2 support are:
>>>>> 1. Maintaining Python 2/3 compatibility is a burden and it makes the
>>> code
>>>>> complicate as Python 2 and Python 3 is not compatible.
>>>>> 2. There are many features which are only available in Python 3.x
>> such
>>> as
>>>>> Type Hints[5]. We can only make use of this kind of features after
>>>> dropping
>>>>> the Python 2 support.
>>>>> 3. Flink-python depends on third-part projects, such as Apache Beam
>>> (may
>>>>> add more dependencies such as pandas, etc in the near future), it's
>> not
>>>>> possible to upgrade them to the latest version once they drop the
>>> Python
>>>> 2
>>>>> support.
>>>>> 
>>>>> Here are the options we have:
>>>>> 1. Drop Python 2 support in 1.10:
>>>>> As flink-python module is a new module added since 1

Re: [DISCUSS] Flink Python UDF Environment and Dependency Management

2019-10-11 Thread Dian Fu
Hi Wei,

Thanks for the great work! It seems that an agreement has been reached on the 
design. Should we start a VOTE on this design? I'm also wondering if a FLIP is 
warranted as it introduces user-facing API. If so, we should create a FLIP 
before the VOTE.

Thanks,
Dian

> 在 2019年10月9日,上午11:23,Wei Zhong  写道:
> 
> Hi Jincheng, Dian and Jeff,
> 
> Thank you for your replies and comments in the Google doc! I think we have come 
> to an agreement on the design doc with only minor changes as follows:
> - Using the API "set_python_executable" instead of "set_environment_variable" 
> to set the python executable file path.
> - Making the argument "requirements_cached_dir" of API 
> "set_python_requirements" optional to support only upload a requirement.txt 
> file.
> 
> I'm also glad to hear any other opinions!
> 
> Thanks,
> Wei
> 
> 
>> 在 2019年9月26日,15:23,Dian Fu  写道:
>> 
>> Hi Wei,
>> 
>> Thanks a lot for bringing up this discussion. Python dependency management 
>> is very important for Python users. I have left a few comments on the design 
>> doc.
>> 
>> Thanks,
>> Dian
>> 
>>> 在 2019年9月26日,下午12:23,jincheng sun  写道:
>>> 
>>> Thanks for bring up the discussion, Wei.
>>> Overall the design doc looks good. I have left a few comments.
>>> 
>>> BTW: Dependency Management is very important for Python UDFs, welcome
>>> anyone left your suggestions!
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> Wei Zhong  于2019年9月26日周四 上午11:59写道:
>>> 
>>>> Hi everyone,
>>>> 
>>>> In FLIP-58 [1] we have a plan to support Python UDF. As a critical part of
>>>> python UDF, the environment and dependency management of users' python code
>>>> has not been fully discussed.
>>>> 
>>>> I'd like to start a discussion on "Flink Python UDF Environment and
>>>> Dependency Management". Here is the design doc I drafted:
>>>> 
>>>> 
>>>> https://docs.google.com/document/d/1vq5J3TSyhscQXbpRhz-Yd3KCX62PBJeC_a_h3amUvJ4/edit?usp=sharing
>>>> 
>>>> Please take a look, and feedbacks are welcome.
>>>> 
>>>> Thanks,
>>>> Wei
>>>> 
>>>> [1]:
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>>>> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-58:+Flink+Python+User-Defined+Stateless+Function+for+Table>
>>>> 
>>>> 
>> 
> 



[VOTE] Drop Python 2 support for 1.10

2019-10-13 Thread Dian Fu
Hi all,

I would like to start the vote for "Drop Python 2 support for 1.10", which was 
discussed and reached consensus in the discussion thread [1].

The vote will be open for at least 72 hours. Unless there is an objection, I 
will try to close it by Oct 17, 2019 18:00 UTC if we have received sufficient 
votes.

Regards,
Dian

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-Python-2-support-for-1-10-td33824.html


Re: [DISCUSS] Drop Python 2 support for 1.10

2019-10-13 Thread Dian Fu
Thanks Jincheng for the reminder and everyone for joining the discussion. I 
have started the vote thread in [1]. 

Thanks,
Dian

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Drop-Python-2-support-for-1-10-tt33962.html
> 在 2019年10月12日,下午5:44,jincheng sun  写道:
> 
> Hi Dian,
> 
> I think it's better to bring up the VOTE for this proposal. Then push this
> forward.:)
> 
> Thanks,
> Jincheng
> 
> Timo Walther  于2019年10月10日周四 下午8:07写道:
> 
>> I also heard from other companies that upgrading to Python 3 is in
>> progress for data teams.
>> 
>> +1 for simplifying the code base with option 1).
>> 
>> Thanks,
>> Timo
>> 
>> On 08.10.19 16:34, Dian Fu wrote:
>>> Hi everyone,
>>> 
>>> I would like to propose to drop Python 2 support(Currently Python 2.7,
>> 3.5, 3.6, 3.7 are all supported in Flink) as it's coming to an end at Jan
>> 1, 2020 [1]. A lot of projects [2][3][4] has already stated or are planning
>> to drop Python 2 support.
>>> 
>>> The benefits of dropping Python 2 support are:
>>> 1. Maintaining Python 2/3 compatibility is a burden and it makes the
>> code complicate as Python 2 and Python 3 is not compatible.
>>> 2. There are many features which are only available in Python 3.x such
>> as Type Hints[5]. We can only make use of this kind of features after
>> dropping the Python 2 support.
>>> 3. Flink-python depends on third-part projects, such as Apache Beam (may
>> add more dependencies such as pandas, etc in the near future), it's not
>> possible to upgrade them to the latest version once they drop the Python 2
>> support.
>>> 
>>> Here are the options we have:
>>> 1. Drop Python 2 support in 1.10:
>>> As flink-python module is a new module added since 1.9.0 and so dropping
>> Python 2 support at the early stage seems a good choice for us.
>>> 2. Deprecate Python 2 in 1.10 and drop its support in 1.11:
>>> As 1.10 is planned to be released around the beginning of 2020. This is
>> also aligned with the official Python 2 support.
>>> 
>>> Personally I prefer option 1 as flink-python is new module and there is
>> no much history reasons to consider.
>>> 
>>> Looking forward to your feedback!
>>> 
>>> Regards,
>>> Dian
>>> 
>>> [1] https://pythonclock.org/ <https://pythonclock.org/>
>>> [2] https://python3statement.org/ <https://python3statement.org/>
>>> [3]
>> https://spark.apache.org/news/plan-for-dropping-python-2-support.html <
>> https://spark.apache.org/news/plan-for-dropping-python-2-support.html>
>>> [4]
>> https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
>> <
>> https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E
>>> 
>>> [5]
>> https://stackoverflow.com/questions/32557920/what-are-type-hints-in-python-3-5
>> <
>> https://stackoverflow.com/questions/32557920/what-are-type-hints-in-python-3-5
>>> 
>> 
>> 
>> 



Re: [VOTE] FLIP-78: Flink Python UDF Environment and Dependency Management

2019-10-13 Thread Dian Fu
Hi Wei,

+1 (non-binding). Thanks for driving this. 

Thanks,
Dian

> 在 2019年10月14日,下午1:40,jincheng sun  写道:
> 
> +1
> 
> Wei Zhong  于2019年10月12日周六 下午8:41写道:
> 
>> Hi all,
>> 
>> I would like to start the vote for FLIP-78[1] which is discussed and
>> reached consensus in the discussion thread[2].
>> 
>> The vote will be open for at least 72 hours. I'll try to close it by
>> 2019-10-16 18:00 UTC, unless there is an objection or not enough votes.
>> 
>> Thanks,
>> Wei
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
>> <
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78:+Flink+Python+UDF+Environment+and+Dependency+Management
>>> 
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-UDF-Environment-and-Dependency-Management-td33514.html
>> <
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-UDF-Environment-and-Dependency-Management-td33514.html
>>> 
>> 
>> 
>> 



Re: [PROPOSAL] Contribute Stateful Functions to Apache Flink

2019-10-14 Thread Dian Fu
Hi Stephan,

Big +1 for adding stateful functions to Apache Flink! The use cases unlocked 
with this feature are very interesting and promising.

Regarding whether to place it in the Flink core repository, personally I prefer 
to put it in the main repository. This feature introduces a new set of APIs and 
it will support a new set of applications. It enriches the API stack of Apache 
Flink. This is somewhat similar to the Table API & SQL, State Processor API, 
CEP library, etc. If the applications supported by this feature are important 
enough for Flink, it's more appropriate to put it directly into the main 
repository.

Regards,
Dian

> 在 2019年10月13日,上午10:47,Hequn Cheng  写道:
> 
> Hi Stephan,
> 
> Big +1 for adding this to Apache Flink! 
> 
> As for the problem of whether this should be added to the Flink main 
> repository, from my side, I prefer to put it in the main repository. Not only 
> Stateful Functions shares very close relations with the current Flink, but 
> also other libs or modules in Flink can make use of it the other way round in 
> the future. At that time the Flink API stack would also be changed a bit and 
> this would be cool.
> 
> Best, Hequn
> 
> On Sat, Oct 12, 2019 at 9:16 PM Biao Liu  > wrote:
> Hi Stephan,
> 
> +1 for having Stateful Functions in Flink.
> 
> Before discussing which repository it should belong, I was wondering if we 
> have reached an agreement of "splitting flink repository" as Piotr mentioned 
> or not. It seems that it's just no more further discussion. 
> It's OK for me to add it to core repository. After all almost everything is 
> in core repository now. But if we decide to split the core repository 
> someday, I tend to create a separate repository for Stateful Functions. It 
> might be good time to take the first step of splitting.
> 
> Thanks,
> Biao /'bɪ.aʊ/
> 
> 
> 
> On Sat, 12 Oct 2019 at 19:31, Yu Li  > wrote:
> Hi Stephan,
> 
> Big +1 for adding stateful functions to Flink. I believe a lot of user would 
> be interested to try this out and I could imagine how this could contribute 
> to reduce the TCO for business requiring both streaming processing and 
> stateful functions.
> 
> And my 2 cents is to put it into flink core repository since I could see a 
> tight connection between this library and flink state.
> 
> Best Regards,
> Yu
> 
> 
> On Sat, 12 Oct 2019 at 17:31, jincheng sun  > wrote:
> Hi Stephan,
> 
> big +1 for adding this great feature to Apache Flink.
> 
> Regarding where we should place it, put it into Flink core repository or 
> create a separate repository? I prefer put it into main repository and 
> looking forward the more detail discussion for this decision.
> 
> Best,
> Jincheng
> 
> 
> Jingsong Li mailto:jingsongl...@gmail.com>> 
> 于2019年10月12日周六 上午11:32写道:
> Hi Stephan,
> 
> big +1 for this contribution. It provides another user interface that is easy 
> to use and popular at this time. these functions, It's hard for users to 
> write in SQL/TableApi, while using DataStream is too complex. (We've done 
> some stateFun kind jobs using DataStream before). With statefun, it is very 
> easy.
> 
> I think it's also a good opportunity to exercise Flink's core capabilities. I 
> looked at stateful-functions-flink briefly, it is very interesting. I think 
> there are many other things Flink can improve. So I think it's a better thing 
> to put it into Flink, and the improvement for it will be more natural in the 
> future.
> 
> Best,
> Jingsong Lee
> 
> On Fri, Oct 11, 2019 at 7:33 PM Dawid Wysakowicz  > wrote:
> Hi Stephan,
> 
> I think this is a nice library, but what I like more about it is that it 
> suggests exploring different use-cases. I think it definitely makes sense for 
> the Flink community to explore more lightweight applications that reuses 
> resources. Therefore I definitely think it is a good idea for Flink community 
> to accept this contribution and help maintaining it.
> 
> Personally I'd prefer to have it in a separate repository. There were a few 
> discussions before where different people were suggesting to extract 
> connectors and other libraries to separate repositories. Moreover I think it 
> could serve as an example for the Flink ecosystem website[1]. This could be 
> the first project in there and give a good impression that the community sees 
> potential in the ecosystem website.
> Lastly, I'm wondering if this should go through PMC vote according to our 
> bylaws[2]. In the end the suggestion is to adopt an existing code base as is. 
> It also proposes a new programs concept that could result in a shift of 
> priorities for the community in a long run.
> Best,
> 
> Dawid
> [1] 
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Create-a-Flink-ecosystem-website-td27519.html
>  
> 

Re: [VOTE] Drop Python 2 support for 1.10

2019-10-18 Thread Dian Fu
Hi all,

Thank you all for the votes.

So far we have got 4 +1 votes, 3 binding (Jincheng, Hequn, Bowen), 1 
non-binding (Vino), and there are no -1 votes.

Therefore, I'm glad to announce that the proposal "Drop Python 2 support for 
1.10" has passed.

Thanks,
Dian

> 在 2019年10月16日,下午3:07,vino yang  写道:
> 
> +1
> 
> Bowen Li  于2019年10月16日周三 上午5:12写道:
> 
>> +1
>> 
>> On Sun, Oct 13, 2019 at 10:54 PM Hequn Cheng  wrote:
>> 
>>> +1
>>> 
>>> Thanks a lot for driving this, Dian!
>>> 
>>> On Mon, Oct 14, 2019 at 1:46 PM jincheng sun 
>>> wrote:
>>> 
>>>> +1
>>>> 
>>>> Dian Fu  于2019年10月14日周一 下午1:21写道:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I would like to start the vote for "Drop Python 2 support for 1.10",
>>>> which
>>>>> is discussed and reached a consensus in the discussion thread[1].
>>>>> 
>>>>> The vote will be open for at least 72 hours. Unless there is an
>>>> objection,
>>>>> I will try to close it by Oct 17, 2019 18:00 UTC if we have received
>>>>> sufficient votes.
>>>>> 
>>>>> Regards,
>>>>> Dian
>>>>> 
>>>>> [1]
>>>>> 
>>>> 
>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-Python-2-support-for-1-10-td33824.html
>>>>> <
>>>>> 
>>>> 
>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Drop-Python-2-support-for-1-10-td33824.html
>>>>>> 
>>>> 
>>> 
>> 



Re: [VOTE] Accept Stateful Functions into Apache Flink

2019-10-21 Thread Dian Fu
+1 (non-binding)

Regards,
Dian

> 在 2019年10月22日,上午9:10,Kurt Young  写道:
> 
> +1 (binding)
> 
> Best,
> Kurt
> 
> 
> On Tue, Oct 22, 2019 at 12:56 AM Fabian Hueske  wrote:
> 
>> +1 (binding)
>> 
>> Am Mo., 21. Okt. 2019 um 16:18 Uhr schrieb Thomas Weise :
>> 
>>> +1 (binding)
>>> 
>>> 
>>> On Mon, Oct 21, 2019 at 7:10 AM Timo Walther  wrote:
>>> 
 +1 (binding)
 
 Thanks,
 Timo
 
 
 On 21.10.19 15:59, Till Rohrmann wrote:
> +1 (binding)
> 
> Cheers,
> Till
> 
> On Mon, Oct 21, 2019 at 12:13 PM Robert Metzger >> 
 wrote:
> 
>> +1 (binding)
>> 
>> On Mon, Oct 21, 2019 at 12:06 PM Stephan Ewen 
>>> wrote:
>> 
>>> This is the official vote whether to accept the Stateful Functions
>>> code
>>> contribution to Apache Flink.
>>> 
>>> The current Stateful Functions code, documentation, and website can
>>> be
>>> found here:
>>> https://statefun.io/
>>> https://github.com/ververica/stateful-functions
>>> 
>>> This vote should capture whether the Apache Flink community is
 interested
>>> in accepting, maintaining, and evolving Stateful Functions.
>>> 
>>> Reiterating my original motivation, I believe that this project is
>> a
>> great
>>> match for Apache Flink, because it helps Flink to grow the
>> community
>> into a
>>> new set of use cases. We see current users interested in such use
 cases,
>>> but they are not well supported by Flink as it currently is.
>>> 
>>> I also personally commit to put time into making sure this
>> integrates
>> well
>>> with Flink and that we grow contributors and committers to maintain
 this
>>> new component well.
>>> 
>>> This is a "Adoption of a new Codebase" vote as per the Flink bylaws
 [1].
>>> Only PMC votes are binding. The vote will be open at least 6 days
>>> (excluding weekends), meaning until Tuesday Oct.29th 12:00 UTC, or
 until
>> we
>>> achieve the 2/3rd majority.
>>> 
>>> Happy voting!
>>> 
>>> Best,
>>> Stephan
>>> 
>>> [1]
>>> 
>> 
 
>>> 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
 
 
 
>>> 
>> 



Re: [ANNOUNCE] Becket Qin joins the Flink PMC

2019-10-28 Thread Dian Fu
Congrats, Becket.

> 在 2019年10月28日,下午6:07,Fabian Hueske  写道:
> 
> Hi everyone,
> 
> I'm happy to announce that Becket Qin has joined the Flink PMC.
> Let's congratulate and welcome Becket as a new member of the Flink PMC!
> 
> Cheers,
> Fabian



[DISCUSS] How to prepare the Python environment of PyFlink release in the current Flink release process

2019-10-28 Thread Dian Fu
Hi all,

We have reached a consensus in [1] that the PyFlink package should be published 
to PyPI. Thanks to Jincheng's effort, the PyPI account has already been created 
and is available to use now [2]. It means that we could publish PyFlink to PyPI 
in the coming releases, and it also means that additional steps will be added 
to the normal Flink release process to prepare the PyFlink release package.

It needs a proper Python environment (i.e. Python 3.5+, setuptools, etc.) to 
build the PyFlink package. There are two options in my mind to prepare the 
Python environment:
1) Reuse the script lint-python.sh defined in the flink-python module to create 
the required virtual environment and build the PyFlink package using the 
created virtual environment.
2) Assume that the local Python environment is properly installed and ready to 
use. The Python environment requirement will be documented on the page 
"Create a Flink Release", and a validation check could also be added to 
create_binary_release.sh to throw a meaningful error with hints on how to fix 
it if the environment is not correct (a small sketch of such a check follows 
below).

Option 1:
Pros:
- It's transparent for release managers.
Cons:
- It needs to prepare the virtual environment while building the PyFlink 
release package, which will take several minutes as it needs to download a few 
binaries.

Option 2:
Pros:
- There is no need to prepare the virtual environment if the local environment 
is already properly configured.
Cons:
- It requires the release managers to prepare the local Python environment; not 
all of them are familiar with Python, so it's a burden for release managers.

Personally I prefer option 1). 
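
As an illustration of the validation check mentioned in option 2), a minimal 
sketch in Python (illustrative only; the actual check would live in 
create_binary_release.sh or a small helper script it invokes):

    import sys

    # fail fast with a meaningful hint if the local Python is too old
    if sys.version_info < (3, 5):
        raise RuntimeError(
            "Building the PyFlink package requires Python 3.5+, found %s. "
            "Please install a supported Python version."
            % sys.version.split()[0])

    # setuptools is also required to build the package
    try:
        import setuptools  # noqa: F401
    except ImportError:
        raise RuntimeError(
            "Building the PyFlink package requires setuptools: "
            "run 'pip install setuptools' first.")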

Looking forward to your feedback!

PS: I think this issue could also be discussed in the JIRA. But I tend to bring 
the discussion to the ML as it introduces an additional step in the release 
process, and I think this should be visible to the community and well 
discussed. Besides, we could also get more feedback.

Regards,
Dian

[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Publish-the-PyFlink-into-PyPI-tt31201.html
[2] 
https://issues.apache.org/jira/browse/FLINK-13011?focusedCommentId=16947307&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16947307

Re: [DISCUSS] How to prepare the Python environment of PyFlink release in the current Flink release process

2019-11-03 Thread Dian Fu
Hi all,

Thanks a lot for the discussion. As everyone tends towards option #1, we will 
take option #1 as the solution for this issue. Thanks again!

Thanks,
Dian

> 在 2019年10月30日,上午9:58,Hequn Cheng  写道:
> 
> Hi Dian,
> 
> Thanks a lot for bringing the discussion.
> It would be a headache to address these environmental problems, so +1 for
> option #1 to use the virtual environment.
> 
> Best, Hequn
> 
> On Tue, Oct 29, 2019 at 9:31 PM Till Rohrmann  wrote:
> 
>> Thanks for bringing this topic up Dian. I'd be in favour of option #1
>> because this would also allow to create reproducible builds.
>> 
>> Cheers,
>> Till
>> 
>> On Tue, Oct 29, 2019 at 5:28 AM jincheng sun 
>> wrote:
>> 
>>> Hi,
>>> Thanks for bringing up the discussion Dian.
>>> +1 for the #1.
>>> 
>>> Hi Jeff, this changes is for the PyFlink release, i.e.,The release
>> manager
>>> should build the release package for Pyflink, and prepare the python
>>> environment during the building. Since 1.10 we only support python 3.5+,
>> so
>>> it will throw an exception if you use python 3.4.
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> 
>>> Jeff Zhang  于2019年10月29日周二 上午11:55写道:
>>> 
>>>> I am a little confused, why we need to prepare python environment in
>>>> release. Shouldn't that be done when user start to use pyflink ?
>>>> Or do you mean to set up python environment for pyflink's CI build ?
>>>> 
>>>> Regarding this problem  "It needs a proper Python environment(i.e.
>> Python
>>>> 3.5+, setuptools, etc) to build the PyFlink package"
>>>> Would the build fail if I use python 3.4 ?
>>>> 
>>>> 
>>>> Dian Fu  于2019年10月29日周二 上午11:01写道:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> We have reached a consensus that the PyFlink package should be
>>> published
>>>>> to PyPI in [1]. Thanks to Jincheng's effort, the PyPI account has
>>> already
>>>>> been created and available to use now [2]. It means that we could
>>> publish
>>>>> PyFlink to PyPI in the coming releases and it also means that
>>> additional
>>>>> steps will be added to the normal process of the Flink release to
>>> prepare
>>>>> the PyFlink release package.
>>>>> 
>>>>> It needs a proper Python environment(i.e. Python 3.5+, setuptools,
>> etc)
>>>> to
>>>>> build the PyFlink package. There are two options in my mind to
>> prepare
>>>> the
>>>>> Python environment:
>>>>> 1) Reuse the script lint-python.sh defined in flink-python module to
>>>>> create the required virtual environment and build the PyFlink package
>>>> using
>>>>> the created virtual environment.
>>>>> 2) It's assumed that the local Python environment is properly
>> installed
>>>>> and ready to use. The Python environment requirement will be
>> documented
>>>> at
>>>>> the page "Create a Flink Release" and validation check could also be
>>>> added
>>>>> in create_binary_release.sh to throw an meaningful error with hints
>> how
>>>> to
>>>>> fix it if it's not correct.
>>>>> 
>>>>> Option 1:
>>>>> Pros:
>>>>> - It's transparent for release managers.
>>>>> Cons:
>>>>> - It needs to prepare the virtual environment during preparing the
>>>> PyFlink
>>>>> release package and it will take some several minutes as it need to
>>>>> download a few binaries.
>>>>> 
>>>>> Option 2:
>>>>> Pros:
>>>>> - There is no need to prepare the virtual environment if the local
>>>>> environment is already properly configured.
>>>>> Cons:
>>>>> - It requires the release managers to prepare the local Python
>>>> environment
>>>>> and not all the people are familiar with Python and it's a burden for
>>>>> release managers.
>>>>> 
>>>>> Personally I prefer to option 1).
>>>>> 
>>>>> Looking forward to your feedback!
>>>>> 
>>>>> PS: I think this issue could also be discussed in the JIRA. But I
>> tend
>>> to
>>>>> bring up the discussion to ML as it introduces an additional step to
>>> the
>>>>> release process and I think this should be visible to the community
>> and
>>>> it
>>>>> should be well discussed. Besides, we could also get more feedback.
>>>>> 
>>>>> Regards,
>>>>> Dian
>>>>> 
>>>>> [1]
>>>>> 
>>>> 
>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Publish-the-PyFlink-into-PyPI-tt31201.html
>>>>> [2]
>>>>> 
>>>> 
>>> 
>> https://issues.apache.org/jira/browse/FLINK-13011?focusedCommentId=16947307&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16947307
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Best Regards
>>>> 
>>>> Jeff Zhang
>>>> 
>>> 
>> 



[DISCUSS] PyFlink User-Defined Function Resource Management

2019-11-04 Thread Dian Fu
Hi everyone,

FLIP-58 [1] adds support for Python user-defined stateless functions in the 
Python Table API. It will launch a separate Python process for Python 
user-defined function execution. The resources used by the Python process 
should be managed properly by Flink’s resource management framework. FLIP-49 [2] 
has proposed a unified memory management framework, and PyFlink user-defined 
function resource management should be based on it. Jincheng, Hequn, Xintong, 
GuoWei and I discussed this offline. I drafted a design doc [3] and want to 
start a discussion about PyFlink user-defined function resource management.

Welcome any comments on the design doc or giving us feedback on the ML directly.

Regards,
Dian

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
[2] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
[3] 
https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m

Re: How long is the flink sql task state default ttl?

2019-11-07 Thread Dian Fu
It's disabled by default. 

BTW: You only need to send it to the user ML; it's not necessary to send it to 
the dev ML.
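
For reference, a minimal sketch of enabling idle state retention from PyFlink 
(this assumes the TableConfig#set_idle_state_retention_time setter available in 
later PyFlink versions; t_env is an existing TableEnvironment and the 12h/24h 
values are just examples):

    from datetime import timedelta

    # state for a key becomes eligible for cleanup after it has been idle
    # for at least 12 hours, and is removed at the latest after 24 hours
    t_env.get_config().set_idle_state_retention_time(
        timedelta(hours=12), timedelta(hours=24))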

> 在 2019年11月7日,下午3:36,LakeShen  写道:
> 
> Hi community, as I know I can use the idle state retention time to clear the 
> Flink SQL task state. My question is: how long is the default TTL of the 
> Flink SQL task state? Thanks



Re: [ANNOUNCE] Jark Wu is now part of the Flink PMC

2019-11-08 Thread Dian Fu
Hi Jark,

Congrats. Well deserved!

Regards,
Dian

> 在 2019年11月8日,下午5:51,jincheng sun  写道:
> 
> Hi all,
> 
> On behalf of the Flink PMC, I'm happy to announce that Jark Wu is now
> part of the Apache Flink Project Management Committee (PMC).
> 
> Jark has been a committer since February 2017. He has been very active on
> Flink's Table API / SQL component, as well as frequently helping
> manage/verify/vote releases. He has been writing many blogs about Flink,
> also driving the translation work of Flink website and documentation. He is
> very active in China community as he gives talks about Flink at many events
> in China.
> 
> Congratulations & Welcome Jark!
> 
> Best,
> Jincheng (on behalf of the Flink PMC)



Re: [DISCUSS] Releasing Flink 1.8.3

2019-11-08 Thread Dian Fu
Hi Jincheng,

Thanks a lot for bringing up this discussion. +1 for releasing 1.8.3.

Regards,
Dian

On Sat, Nov 9, 2019 at 12:11 PM jincheng sun 
wrote:

> Hi Flink devs,
>
> It has been more than 2 months since the 1.8.2 released. So, What do you
> think about releasing Flink 1.8.3 soon?
>
> We already have many important bug fixes in the release-1.8 branch (29
> resolved issues).
>
> Most notable fixes are:
>
> - FLINK-14010 Dispatcher & JobManagers don't give up leadership when AM is
> shut down
> - FLINK-14315 NPE with JobMaster.disconnectTaskManager
> - FLINK-12848 Method equals() in RowTypeInfo should consider fieldsNames
> - FLINK-12342 Yarn Resource Manager Acquires Too Many Containers
> - FLINK-14589 Redundant slot requests with the same AllocationID leads to
> inconsistent slot table
>
> Furthermore, the following critical issues is in progress, maybe we can
> wait for it if it is not too much effort.
>
> - FLINK-13184 Starting a TaskExecutor blocks the YarnResourceManager's main
> thread
>
> Please let me know what you think?
>
> Best,
> Jincheng
>


Re: [DISCUSS] PyFlink User-Defined Function Resource Management

2019-11-11 Thread Dian Fu
Hi Jincheng,

Thanks for the reply and also looking forward to the feedback from the 
community.

Thanks,
Dian

> 在 2019年11月11日,下午2:34,jincheng sun  写道:
> 
> Hi all,
> 
> +1, Thanks for bring up this discussion Dian!
> 
> The Resource Management is very important for PyFlink UDF. So, It's great
> if anyone can add more comments or inputs in the design doc or feedback in
> ML. :)
> 
> Best,
> Jincheng
> 
> Dian Fu  于2019年11月5日周二 上午11:32写道:
> 
>> Hi everyone,
>> 
>> In FLIP-58[1] it will add the support of Python user-defined stateless
>> function for Python Table API. It will launch a separate Python process for
>> Python user-defined function execution. The resources used by the Python
>> process should be managed properly by Flink’s resource management
>> framework. FLIP-49[2] has proposed a unified memory management framework
>> and PyFlink user-defined function resource management should be based on
>> it. Jincheng, Hequn, Xintong, GuoWei and I discussed offline about this. I
>> draft a design doc[3] and want to start a discussion about PyFlink
>> user-defined function resource management.
>> 
>> Welcome any comments on the design doc or giving us feedback on the ML
>> directly.
>> 
>> Regards,
>> Dian
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>> [2]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>> [3]
>> https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m



[DISCUSS] Expose or setup a secur...@flink.apache.org mailing list for security report and discussion

2019-11-13 Thread Dian Fu
Hi all,

I'm reaching out to see if there is an existing security-specific mailing list 
in Flink. If there is, we should expose it on the official web site of Flink [1] 
to guide people to report security issues to this mailing list. If it still 
doesn't exist, I'm here to propose to set up a secur...@flink.apache.org mailing 
list for reporting and discussing security-specific issues. Currently, most 
well-known Apache projects such as Apache Commons [2], Hadoop [3], Spark [4], 
Kafka [5], Hive [6], etc. have a security-specific mailing list. It would be 
nice if there were also a security-specific mailing list for Flink.

Note that users should report security issues to the security mailing list. 

Looking forward to your feedback!

Regards,
Dian

[1] https://flink.apache.org/community.html
[2] https://commons.apache.org/mail-lists.html
[3] https://hadoop.apache.org/mailing_lists.html
[4] https://spark.apache.org/community.html
[5] https://kafka.apache.org/project-security.html
[6] https://hive.apache.org/mailing_lists.html

Re: [DISCUSS] Release flink-shaded 9.0

2019-11-19 Thread Dian Fu
Hi Chesnay,

Thanks a lot for kicking off this release. +1 to release flink-shaded 9.0.

I'm willing to help with the release. Please feel free to let me know
if there is anything I could help with.

Regards,
Dian

On Mon, Nov 18, 2019 at 8:43 PM Ufuk Celebi  wrote:

> @Chesnay: I know you said that you are pretty busy these days. If we can't
> find anybody else to work on this, when would you be available to create
> the first RC?
>
> On Sun, Nov 17, 2019 at 6:48 AM Hequn Cheng  wrote:
>
> > Hi,
> >
> > Big +1 to release 9.0.
> > It would be good if we can solve these security vulnerabilities.
> >
> > Thanks a lot for your nice work and kick off the release so quickly.
> >
> >
> > On Fri, Nov 15, 2019 at 11:50 PM Ufuk Celebi  wrote:
> >
> > > From what I can see, the Jackson version bump fixes quite a few
> > > vulnerabilities. Therefore, I'd be +1 to release flink-shaded 9.0.
> > >
> > > Thanks for all the work to verify this on master already.
> > >
> > > – Ufuk
> > >
> > >
> > > On Fri, Nov 15, 2019 at 2:26 PM Chesnay Schepler 
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > I'd like to kick off the next release for flink-shaded. Background is
> > > > that we recently bumped jackson to 2.10.1 to fix a variety of
> security
> > > > vulnerabilities, and it would be good to include them in the upcoming
> > > > 1.8.3/1.9.2 releases.
> > > >
> > > > The release would contain few changes beyond the jackson changes;
> > > > flink-shaded can now be compiled on Java 11 and an encoding issue for
> > > > the NOTICE files was fixed.
> > > >
> > > > So overall this should be very little overhead.
> > > >
> > > > I have already verified that the master would work with this version
> > > > (this being a reasonable indicator for it also working in previous
> > > > version).
> > > >
> > > > I'd also appreciate it if someone would volunteer to handle the
> > release;
> > > > I'm quite bogged down at the moment :(
> > > >
> > > > Regards,
> > > >
> > > > Chesnay
> > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Release flink-shaded 9.0

2019-11-19 Thread Dian Fu
I see, thanks for the reminder @Chesnay!

I will help with the release check once the RC is out.

Regards,
Dian

On Tue, Nov 19, 2019 at 4:41 PM Chesnay Schepler  wrote:

> Thanks for the offer, but without being a committer I don't think
> there's a lot to do :/
>
> @Uce If no one else steps up I'll kick it off later today myself; this
> would mean a release on Friday.
>
> On 19/11/2019 09:17, Dian Fu wrote:
> > Hi Chesnay,
> >
> > Thanks a lot for kicking off this release. +1 to release flink-shaded
> 9.0.
> >
> > I'm willing to help on the release. Please feel free to let me know
> > if there is anything I could help.
> >
> > Regards,
> > Dian
> >
> > On Mon, Nov 18, 2019 at 8:43 PM Ufuk Celebi  wrote:
> >
> >> @Chesnay: I know you said that you are pretty busy these days. If we
> can't
> >> find anybody else to work on this, when would you be available to create
> >> the first RC?
> >>
> >> On Sun, Nov 17, 2019 at 6:48 AM Hequn Cheng 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> Big +1 to release 9.0.
> >>> It would be good if we can solve these security vulnerabilities.
> >>>
> >>> Thanks a lot for your nice work and kick off the release so quickly.
> >>>
> >>>
> >>> On Fri, Nov 15, 2019 at 11:50 PM Ufuk Celebi  wrote:
> >>>
> >>>>  From what I can see, the Jackson version bump fixes quite a few
> >>>> vulnerabilities. Therefore, I'd be +1 to release flink-shaded 9.0.
> >>>>
> >>>> Thanks for all the work to verify this on master already.
> >>>>
> >>>> – Ufuk
> >>>>
> >>>>
> >>>> On Fri, Nov 15, 2019 at 2:26 PM Chesnay Schepler 
> >>>> wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> I'd like to kick off the next release for flink-shaded. Background is
> >>>>> that we recently bumped jackson to 2.10.1 to fix a variety of
> >> security
> >>>>> vulnerabilities, and it would be good to include them in the upcoming
> >>>>> 1.8.3/1.9.2 releases.
> >>>>>
> >>>>> The release would contain few changes beyond the jackson changes;
> >>>>> flink-shaded can now be compiled on Java 11 and an encoding issue for
> >>>>> the NOTICE files was fixed.
> >>>>>
> >>>>> So overall this should be very little overhead.
> >>>>>
> >>>>> I have already verified that the master would work with this version
> >>>>> (this being a reasonable indicator for it also working in previous
> >>>>> version).
> >>>>>
> >>>>> I'd also appreciate it if someone would volunteer to handle the
> >>> release;
> >>>>> I'm quite bogged down at the moment :(
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Chesnay
> >>>>>
> >>>>>
>
>


Re: [DISCUSS] Expose or setup a secur...@flink.apache.org mailing list for security report and discussion

2019-11-19 Thread Dian Fu
ed at the very beginning in its
> > online ref-guide book here <http://hbase.apache.org/book.html#_preface
> >).
> >
> > Hope these information helps. Thanks.
> >
> > Best Regards,
> > Yu
> >
> >
> > On Thu, 14 Nov 2019 at 18:11, Chesnay Schepler 
> wrote:
> >
> > > Source: https://www.apache.org/security/
> > >
> > > Now, we can of course setup such a mailing list (as outlined here
> > > https://www.apache.org/security/committers.html), but I'm not sure if
> it
> > > is necessary since the number of reports is _really_ low.
> > >
> > > On 14/11/2019 11:03, Chesnay Schepler wrote:
> > > > AFAIK, the official way to report vulnerabilities in any apache
> > > > project is to write to secur...@apache.org and/or notify the
> > > > respective PMC. So far, we had several reports that went this route,
> > > > hence I'm not convinced that an additional ML is required.
> > > >
> > > > I would be fine with an additional paragraph somewhere outlining this
> > > > though.
> > > >
> > > > On 14/11/2019 06:57, Jark Wu wrote:
> > > >> Hi Dian,
> > > >>
> > > >> Good idea and +1 to setup security mailing list.
> > > >> Security vulnerabilities should not be publicly disclosed (e.g. via
> > > >> dev ML
> > > >> or JIRA) until the project has responded.
> > > >> However, AFAIK, Flink doesn't have an official process to
> > > >> report vulnerabilities.
> > > >> It would be nice to have one to protect Flink users and response
> > > >> security
> > > >> problems quickly.
> > > >>
> > > >> Btw, we may also need a dedicated page to describe the security
> > > >> vulnerabilities report process and CVE list on the website.
> > > >>
> > > >> Best,
> > > >> Jark
> > > >>
> > > >>
> > > >>
> > > >> On Thu, 14 Nov 2019 at 13:36, Hequn Cheng 
> > wrote:
> > > >>
> > > >>> Hi Dian,
> > > >>>
> > > >>> Good idea! +1 to have a security mailing list.
> > > >>> It is nice for Flink to have an official procedure to handle
> security
> > > >>> problems, e.g., reporting, addressing and publishing.
> > > >>>
> > > >>> Best, Hequn
> > > >>>
> > > >>> On Thu, Nov 14, 2019 at 1:20 PM Jeff Zhang 
> wrote:
> > > >>>
> > > >>>> Thanks Dian Fu for this proposal. +1 for creating security mail
> > > >>>> list. To
> > > >>> be
> > > >>>> noticed, security mail list is private mail list, could not be
> > > >>>> subscribed
> > > >>>> publicly.
> > > >>>> FYI, apache member can create mail list using this self service
> tool
> > > >>>> https://selfserve.apache.org/
> > > >>>>
> > > >>>>
> > > >>>> jincheng sun  于2019年11月14日周四
> > > >>>> 下午12:25写道:
> > > >>>>
> > > >>>>> Hi Dian,
> > > >>>>>
> > > >>>>> Thanks a lot for bringing up this discussion. This is very
> > important
> > > >>> for
> > > >>>>> Flink community!
> > > >>>>>
> > > >>>>> I think setup a security mailing list for Flink is pretty nice
> > > >>> although `
> > > >>>>> secur...@apache.org` can be used and the report will be
> forwarded
> > to
> > > >>>> Flink
> > > >>>>> private mailing list if there is no project specific security
> > mailing
> > > >>>>> list. One thing that is pretty sure is that we should guide users
> > on
> > > >>> how
> > > >>>> to
> > > >>>>> report security issues in Flink website as security
> vulnerabilities
> > > >>>> should
> > > >>>>> not be entered into a project's public bug tracker directly
> > according
> > > >>> to
> > > >>>>> the guidance for how to handling the security vulnerabilities in
> > ASF
> > > >>>>> site[1].
> > > &g

Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing Framework

2019-11-19 Thread Dian Fu
+1 (non-binding)

Regards,
Dian

> 在 2019年11月19日,下午6:31,Piotr Nowojski  写道:
> 
> +1 (non-binding)
> 
> Piotrek
> 
>> On 19 Nov 2019, at 04:20, Yang Wang  wrote:
>> 
>> +1 (non-binding)
>> 
>> It is great to have a new end-to-end test framework, even it is only for
>> performance tests now.
>> 
>> Best,
>> Yang
>> 
>> Jingsong Li  于2019年11月19日周二 上午9:54写道:
>> 
>>> +1 (non-binding)
>>> 
>>> Best,
>>> Jingsong Lee
>>> 
>>> On Mon, Nov 18, 2019 at 7:59 PM Becket Qin  wrote:
>>> 
 +1 (binding) on having the test suite.
 
 BTW, it would be good to have a few more details about the performance
 tests. For example:
 1. What do the testing records look like? The size and key distributions.
 2. The resources for each task.
 3. The intended configuration for the jobs.
 4. What exact source and sink it would use.
 
 Thanks,
 
 Jiangjie (Becket) Qin
 
 On Mon, Nov 18, 2019 at 7:25 PM Zhijiang >>> .invalid>
 wrote:
 
> +1 (binding)!
> 
> It is a good thing to enhance our testing work.
> 
> Best,
> Zhijiang
> 
> 
> --
> From:Hequn Cheng 
> Send Time:2019 Nov. 18 (Mon.) 18:22
> To:dev 
> Subject:Re: [VOTE] FLIP-83: Flink End-to-end Performance Testing
 Framework
> 
> +1 (binding)!
> I think this would be very helpful to detect regression problems.
> 
> Best, Hequn
> 
> On Mon, Nov 18, 2019 at 4:28 PM vino yang 
>>> wrote:
> 
>> +1 (non-binding)
>> 
>> Best,
>> Vino
>> 
>> jincheng sun  于2019年11月18日周一 下午2:31写道:
>> 
>>> +1  (binding)
>>> 
>>> OpenInx  于2019年11月18日周一 下午12:09写道:
>>> 
 +1  (non-binding)
 
 On Mon, Nov 18, 2019 at 11:54 AM aihua li >>> 
>> wrote:
 
> +1  (non-binding)
> 
> Thanks Yu Li for driving on this.
> 
>> 在 2019年11月15日,下午8:10,Yu Li  写道:
>> 
>> Hi All,
>> 
>> I would like to start the vote for FLIP-83 [1] which is
 discussed
>> and
>> reached consensus in the discussion thread [2].
>> 
>> The vote will be open for at least 72 hours (excluding
 weekend).
>> I'll
 try
>> to close it by 2019-11-20 21:00 CST, unless there is an
 objection
>> or
 not
>> enough votes.
>> 
>> [1]
>> 
> 
 
>>> 
>> 
> 
 
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-83%3A+Flink+End-to-end+Performance+Testing+Framework
>> [2] https://s.apache.org/7fqrz
>> 
>> Best Regards,
>> Yu
> 
> 
 
>>> 
>> 
> 
> 
 
>>> 
>>> 
>>> --
>>> Best, Jingsong Lee
>>> 
> 



Re: [DISCUSS] Expose or setup a secur...@flink.apache.org mailing list for security report and discussion

2019-11-20 Thread Dian Fu
Hi all,

There is no new feedback and it seems that we have received enough input about 
setting up a secur...@flink.apache.org mailing list[1] for security reports and 
discussion. The feedback shows that such a list is optional, as we can use either 
secur...@flink.apache.org <mailto:secur...@flink.apache.org> or 
secur...@apache.org. So I'd like to start the vote on setting up a 
secur...@flink.apache.org mailing list to make the final decision.

Thanks,
Dian

> 在 2019年11月19日,下午6:06,Dian Fu  写道:
> 
> Hi all,
> 
> Thanks for sharing your thoughts. Appreciated! Let me try to summarize the 
> information and thoughts received so far. Please feel free to let me know if 
> there is anything wrong or missing.
> 
> 1. Setup project specific security mailing list
> Pros:
> - The security reports received by secur...@apache.org 
> <mailto:secur...@apache.org> will be forwarded to the project private(PMC) 
> mailing list. Having a project specific security mailing list is helpful in 
> cases when the best person to address the security issue is not a PMC member, 
> but a committer. It makes things simple as everyone (both PMCs and committers) 
> is at the same table.
> - Even though the security issues are usually rare, they could be devastating 
> and thus need to be treated seriously.
> - Most notable Apache projects such as Apache Commons, Hadoop, Spark, Kafka, 
> Hive, etc. have a project-specific security mailing list.
> 
> Cons:
> - The ASF security mailing list secur...@apache.org 
> <mailto:secur...@apache.org> could be used if there is no project specific 
> security mailing list.
> - The number of security reports is very low.
> 
> Additional information:
> - The security mailing list can only be subscribed to by PMC members and committers. 
> However, everyone can report security issues to the security mailing list.
> 
> 
> 2. Guide users to report the security issues
> Why:
> - Security vulnerabilities should not be publicly disclosed (e.g. via dev ML 
> or JIRA) until the project has responded. We should guide users on how to 
> report security issues in Flink website.
> 
> How:
> - Option 1: Set up secur...@flink.apache.org 
> <mailto:secur...@flink.apache.org> and ask users to report security issues 
> there
> - Option 2: Ask users to send security report to secur...@apache.org 
> <mailto:secur...@apache.org>
> - Option 3: Ask users to send security report directly to 
> priv...@flink.apache.org <mailto:priv...@flink.apache.org>
> 
> 
> 3. Dedicated page to show the security vulnerabilities
> - We may need a dedicated security page to describe the CVE list on the Flink 
> website.
> 
> I think it makes sense to open separate discussion threads on 2) and 3). I'll 
> create separate discussion threads for them. Let's focus on 1) in this thread. 
> 
> If there is no other feedback on 1), I'll bring up a VOTE for this discussion.
> 
> What do you think?
> 
> Thanks,
> Dian
> 
> On Fri, Nov 15, 2019 at 10:18 AM Becket Qin  <mailto:becket@gmail.com>> wrote:
> Thanks for bringing this up, Dian.
> 
> +1 on creating a project specific security mailing list. My two cents, I
> think it is worth doing in practice.
> 
> Although the ASF security ML is always available, usually all the emails
> are simply routed to the individual project PMC. This is an additional hop.
> And in some cases, the best person to address the reported issue may not be
> a PMC member, but a committer, so the PMC have to again involve them into
> the loop. This make things unnecessarily complicated. Having a project
> specific security ML would make it much easier to have everyone at the same
> table.
> 
> Also, one thing to note is that even though the security issues are usually
> rare, they could be devastating, thus need to be treated seriously. So I
> think it is a good idea to establish the handling mechanism regardless of
> the frequency of the reported security vulnerabilities.
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> On Fri, Nov 15, 2019 at 1:14 AM Yu Li  <mailto:car...@gmail.com>> wrote:
> 
> > Thanks for bringing up this discussion Dian! How to report security bugs to
> > our project is a very important topic!
> >
> > Big +1 on adding some explicit instructions in our document about how to
> > report security issues, and I suggest opening another thread to vote on the
> > reporting approach in Flink.
> >
> > FWIW, known options to report security issues include:
> > 1. Set up secur...@flink.apache.org <mailto:secur...@flink.apache.org> and 
> > ask users to report security
> > issues
> > there
> > 2. Ask users to send security report to secur...@apache.o

[VOTE] Setup a secur...@flink.apache.org mailing list

2019-11-21 Thread Dian Fu
Hi all,

According to our previous discussion in [1], I'd like to bring up a vote to set 
up a secur...@flink.apache.org mailing list.

The vote will be open for at least 72 hours (excluding weekend). I'll try to 
close it by 2019-11-26 18:00 UTC, unless there is an objection or not enough 
votes.

Regards,
Dian

[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951

Re: [VOTE] Release flink-shaded 9.0, release candidate #1

2019-11-21 Thread Dian Fu
+1 (non-binding)

- verified the signature and checksum (a scripted checksum check is sketched below)
- checked the Maven Central artifacts
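
For readers who want to script the checksum part of such a verification, here is 
a minimal sketch in Python (the file names below are placeholders, and GPG 
signature verification is a separate step not shown here):

    import hashlib

    def sha512_of(path):
        # Stream the artifact in 1 MiB chunks and compute its SHA-512 digest.
        digest = hashlib.sha512()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Placeholder file names; the published .sha512 file starts with the digest.
    artifact = "flink-shaded-9.0-src.tgz"
    expected = open(artifact + ".sha512").read().split()[0]
    assert sha512_of(artifact) == expected, "checksum mismatch"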

Regards,
Dian



On Wed, Nov 20, 2019 at 8:47 PM tison  wrote:

> +1 (non-binding)
>
> Best,
> tison.
>
>
> Aljoscha Krettek  于2019年11月20日周三 下午6:58写道:
>
> > +1 (binding)
> >
> > Best,
> > Aljoscha
> >
> > > On 19. Nov 2019, at 23:13, Chesnay Schepler 
> wrote:
> > >
> > > Hi everyone,
> > > Please review and vote on the release candidate #1 for the version 9.0,
> > as follows:
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > [2], which are signed with the key with fingerprint 11D464BA [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag "release-9.0-rc1" [5],
> > > * website pull request listing the new release [6].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > Chesnay
> > >
> > > [1] https://issues.apache.org/jira/projects/FLINK/versions/12346089
> > > [2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-9.0-rc1/
> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > [4]
> > https://repository.apache.org/content/repositories/orgapacheflink-1275
> > > [5] https://github.com/apache/flink-shaded/tree/release-9.0-rc1
> > > [6] https://github.com/apache/flink-web/pull/283
> > >
> >
> >
>


Re: [DISCUSS] Remove old WebUI

2019-11-24 Thread Dian Fu
+1 to drop the old UI.

> 在 2019年11月25日,上午10:59,Zhenghua Gao  写道:
> 
> +1 to drop the old one.
> 
> *Best Regards,*
> *Zhenghua Gao*
> 
> 
> On Thu, Nov 21, 2019 at 8:05 PM Chesnay Schepler  wrote:
> 
>> Hello everyone,
>> 
>> Flink 1.9 shipped with a new UI, with the old one being kept around as a
>> backup in case something wasn't working as expected.
>> 
>> Currently there are no issues indicating any significant problems
>> (exclusive to the new UI), so I wanted to check what people think about
>> dropping the old UI for 1.10.
>> 
>> 



Re: [ANNOUNCE] Apache Flink-shaded 9.0 released

2019-11-24 Thread Dian Fu
Thanks Chesnay for the great work and thanks to everyone who has contributed to 
this release.

Regards,
Dian

> 在 2019年11月25日,上午10:22,Zhu Zhu  写道:
> 
> Thanks a lot to Chesnay for the great work to release Flink-shaded 9.0!
> And thanks to all the contributors for their efforts to make this release
> possible!
> 
> Thanks,
> Zhu Zhu
> 
> Hequn Cheng  于2019年11月25日周一 上午9:51写道:
> 
>> Thank you Chesnay for the great work!
>> Also thanks a lot to the people who made this release possible!
>> 
>> Best, Hequn
>> 
>> On Mon, Nov 25, 2019 at 12:53 AM Chesnay Schepler 
>> wrote:
>> 
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink-shaded 9.0.
>>> 
>>> The flink-shaded project contains a number of shaded dependencies for
>>> Apache Flink.
>>> 
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data
>>> streaming applications.
>>> 
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>> 
>>> The full release notes are available in Jira:
>>> 
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346089
>>> 
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>> 
>>> Regards,
>>> Chesnay
>>> 
>>> 
>> 



Re: [VOTE] Setup a secur...@flink.apache.org mailing list

2019-11-25 Thread Dian Fu
NOTE: Only PMC votes are binding.

Thanks for sharing your thoughts. I also think that this doesn't fall into any 
of the existing categories listed in the bylaws. Maybe we could improve the 
bylaws in this respect.

This is not a codebase change, as Robert mentioned, and it relates to how 
Flink's development is managed. So, I agree with Robert and Jincheng that this 
VOTE should only count PMC votes for now. 

Thanks,
Dian

> 在 2019年11月26日,上午11:43,jincheng sun  写道:
> 
> I also think that we should only count PMC votes.
> 
> This ML is to improve the security mechanism for Flink. Of course we don't
> expect to use this
> ML often. I hope that it's perfect if this ML is never used. However, the
> Flink community is growing rapidly, it's better to
> make our security mechanism as convenient as possible. But I agree that
> this ML is not a must-have; it's nice to have.
> 
> So, I give the vote as +1(binding).
> 
> Best,
> Jincheng
> 
> Robert Metzger  于2019年11月25日周一 下午9:45写道:
> 
>> I agree that we are only counting PMC votes (because this decision goes
>> beyond the codebase)
>> 
>> I'm undecided what to vote :) I'm not against setting up a new mailing
>> list, but I also don't think the benefit (having a private list with PMC +
>> committers) is enough to justify the work involved. As far as I remember,
>> we have received 2 security issue notices, both basically about the same
>> issue.  I'll leave it to other PMC members to support this if they want to
>> ...
>> 
>> 
>> On Mon, Nov 25, 2019 at 9:15 AM Dawid Wysakowicz 
>> wrote:
>> 
>>> Hi all,
>>> 
>>> What is the voting scheme for it? I am not sure if it falls into any of
>>> the categories we have listed in our bylaws. Are committers votes
>>> binding or just PMCs'? (Personally I think it should be PMCs') Is this a
>>> binding vote or just an informational vote?
>>> 
>>> Best,
>>> 
>>> Dawid
>>> 
>>> On 25/11/2019 07:34, jincheng sun wrote:
>>>> +1
>>>> 
>>>> Dian Fu  于2019年11月21日周四 下午4:11写道:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> According to our previous discussion in [1], I'd like to bring up a
>> vote
>>>>> to set up a secur...@flink.apache.org mailing list.
>>>>> 
>>>>> The vote will be open for at least 72 hours (excluding weekend). I'll
>>> try
>>>>> to close it by 2019-11-26 18:00 UTC, unless there is an objection or
>> not
>>>>> enough votes.
>>>>> 
>>>>> Regards,
>>>>> Dian
>>>>> 
>>>>> [1]
>>>>> 
>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951
>>> 
>>> 
>> 



Re: [DISCUSS] PyFlink User-Defined Function Resource Management

2019-11-25 Thread Dian Fu
Thanks for your votes and feedback. I have discussed with @Zhu Zhu offline and 
also on the design doc.

It seems that we have reached consensus on the design. I will bring up the 
VOTE if there is no other feedback.

Thanks,
Dian

> 在 2019年11月22日,下午2:51,Hequn Cheng  写道:
> 
> Thanks a lot for putting this together, Dian! Definitely +1 for this!
> It is great to make sure that the resources used by the Python process are
> managed properly by Flink’s resource management framework.
> 
> Also, thanks to the guys who are working on the unified memory management
> framework.
> 
> Best, Hequn
> 
> 
> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo  wrote:
> 
>> Thanks for driving this discussion, Dian!
>> 
>> +1 for this proposal. It will help to reduce container failure due to
>> the memory overuse.
>> Some comments left in the design doc.
>> 
>> Best,
>> Yangze Guo
>> 
>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song 
>> wrote:
>>> 
>>> Sorry for the late reply.
>>> 
>>> +1 for the general proposal.
>>> 
>>> And one reminder: to use the UNKNOWN resource requirement, we need to make
>>> sure the optimizer knows which operators use off-heap managed memory, and
>>> compute and set a fraction to the operators. See FLIP-53[1] for more
>>> details, and I would suggest you to double check with @Zhu Zhu who works
>> on
>>> this part.
>>> 
>>> Thank you~
>>> 
>>> Xintong Song
>>> 
>>> 
>>> [1]
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>>> 
>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu  wrote:
>>> 
>>>> Hi Jincheng,
>>>> 
>>>> Thanks for the reply and also looking forward to the feedback from the
>>>> community.
>>>> 
>>>> Thanks,
>>>> Dian
>>>> 
>>>>> 在 2019年11月11日,下午2:34,jincheng sun  写道:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> +1, Thanks for bring up this discussion Dian!
>>>>> 
>>>>> The Resource Management is very important for PyFlink UDF. So, It's
>> great
>>>>> if anyone can add more comments or inputs in the design doc or
>> feedback
>>>> in
>>>>> ML. :)
>>>>> 
>>>>> Best,
>>>>> Jincheng
>>>>> 
>>>>> Dian Fu  于2019年11月5日周二 上午11:32写道:
>>>>> 
>>>>>> Hi everyone,
>>>>>> 
>>>>>> In FLIP-58[1] it will add the support of Python user-defined
>> stateless
>>>>>> function for Python Table API. It will launch a separate Python
>> process
>>>> for
>>>>>> Python user-defined function execution. The resources used by the
>> Python
>>>>>> process should be managed properly by Flink’s resource management
>>>>>> framework. FLIP-49[2] has proposed a unified memory management
>> framework
>>>>>> and PyFlink user-defined function resource management should be
>> based on
>>>>>> it. Jincheng, Hequn, Xintong, GuoWei and I discussed offline about
>>>> this. I
>>>>>> draft a design doc[3] and want to start a discussion about PyFlink
>>>>>> user-defined function resource management.
>>>>>> 
>>>>>> Welcome any comments on the design doc or giving us feedback on the
>> ML
>>>>>> directly.
>>>>>> 
>>>>>> Regards,
>>>>>> Dian
>>>>>> 
>>>>>> [1]
>>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>>>>>> [2]
>>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
>>>>>> [3]
>>>>>> 
>>>> 
>> https://docs.google.com/document/d/1LQP8L66Thu2yVv6YRSfmF9EkkMnwhBHGjcTQ11GUmFc/edit#heading=h.4q4ggaftf78m
>>>> 
>>>> 
>> 



Re: [VOTE] Release 1.8.3, release candidate #1

2019-11-28 Thread Dian Fu
Hi Hequn,

Thanks a lot for the great work!

I found the following minor issues:
1. The announcement PR for the website has one minor issue. (I have left comments 
on the PR.)
2. The following JIRAs are included in 1.8.3, but their fix versions were not 
updated and so are not reflected in the release note:
    https://issues.apache.org/jira/browse/FLINK-14235 
(https://github.com/apache/flink/tree/e0387a8007707ab29795e3aa3794ad279eaaeaf9)
    https://issues.apache.org/jira/browse/FLINK-14370 
(https://github.com/apache/flink/tree/cf7509b91888e4c6b64eb514fbb62af49533e0f0)
3. The following JIRA is not included in 1.8.3, but its fix version is marked as 
fixed in 1.8.3 and so it was included in the release note:
   https://issues.apache.org/jira/browse/FLINK-10377

Regards,
Dian

> 在 2019年11月28日,下午10:22,Hequn Cheng  写道:
> 
> Hi everyone,
> 
> Please review and vote on the release candidate #1 for the version 1.8.3,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint EF88474C564C7A608A822EEC3FF96A2057B6476C [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.8.3-rc1" [5],
> * website pull request listing the new release and adding announcement blog
> post [6].
> 
> The vote will be open for at least 72 hours.
> Please cast your votes before *Dec. 3rd 2019, 16:00 UTC*.
> 
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
> 
> Thanks,
> Hequn
> 
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346112
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.3-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1310
> [5]
> https://github.com/apache/flink/commit/cf92a34e3465d2898eecd2aa7c06d6dc17f60ba3
> [6] https://github.com/apache/flink-web/pull/285



Re: [VOTE] Release 1.8.3, release candidate #1

2019-11-29 Thread Dian Fu
Hi Hequn,

Thanks a lot for the quick update.

+1 (non-binding)

- Verified signatures and checksums
- Built from source (without tests) with Scala 2.11 and Scala 2.12
- Started a local cluster and confirmed the WebUI is accessible (a scripted 
smoke check is sketched below)
- Submitted the WordCount example for both batch and streaming; there was no 
suspicious log output
- Ran a couple of tests in the IDE
- Verified that all POM files point to the right version
- The release note and announcement PR look good
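
As a lightweight illustration of the WebUI check, a small Python smoke test 
might look like the following (it assumes a local cluster started via 
bin/start-cluster.sh serves the WebUI on the default port 8081; /overview is 
one of the REST endpoints the UI itself queries):

    import urllib.request

    # Assumes a running local cluster with the WebUI on the default port 8081.
    with urllib.request.urlopen("http://localhost:8081/overview", timeout=5) as resp:
        assert resp.status == 200, "WebUI not reachable"
        print(resp.read(200))  # first bytes of the cluster overview JSON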

Regards,
Dian

> 在 2019年11月29日,下午4:20,Hequn Cheng  写道:
> 
> Hi Dian,
> 
> Thanks a lot for the review and valuable feedback! I have addressed the
> JIRA problems and updated the website PR.
> It would be great if you can take another look.
> 
> Best,
> Hequn
> 
> On Fri, Nov 29, 2019 at 2:21 PM Dian Fu  wrote:
> 
>> Hi Hequn,
>> 
>> Thanks a lot for the great work!
>> 
>> I found the following minor issues:
>> 1. The announcement PR for website has one minor issue. (Have left
>> comments on the PR)
>> 2. The following JIRAs are included 1.8.3, but the fix version are not
>> updated and so not reflected in the release note:
>>   https://issues.apache.org/jira/browse/FLINK-14235 (
>> https://github.com/apache/flink/tree/e0387a8007707ab29795e3aa3794ad279eaaeaf9
>> )
>>   https://issues.apache.org/jira/browse/FLINK-14370 (
>> https://github.com/apache/flink/tree/cf7509b91888e4c6b64eb514fbb62af49533e0f0
>> )
>> 3. The following JIRA is not included in 1.8.3, but the fix version is
>> marked as fixed in 1.8.3 and so was included in the release note:
>>   https://issues.apache.org/jira/browse/FLINK-10377
>> 
>> Regards,
>> Dian
>> 
>>> 在 2019年11月28日,下午10:22,Hequn Cheng  写道:
>>> 
>>> Hi everyone,
>>> 
>>> Please review and vote on the release candidate #1 for the version 1.8.3,
>>> as follows:
>>> [ ] +1, Approve the release
>>> [ ] -1, Do not approve the release (please provide specific comments)
>>> 
>>> 
>>> The complete staging area is available for your review, which includes:
>>> * JIRA release notes [1],
>>> * the official Apache source release and binary convenience releases to
>> be
>>> deployed to dist.apache.org [2], which are signed with the key with
>>> fingerprint EF88474C564C7A608A822EEC3FF96A2057B6476C [3],
>>> * all artifacts to be deployed to the Maven Central Repository [4],
>>> * source code tag "release-1.8.3-rc1" [5],
>>> * website pull request listing the new release and adding announcement
>> blog
>>> post [6].
>>> 
>>> The vote will be open for at least 72 hours.
>>> Please cast your votes before *Dec. 3rd 2019, 16:00 UTC*.
>>> 
>>> It is adopted by majority approval, with at least 3 PMC affirmative
>> votes.
>>> 
>>> Thanks,
>>> Hequn
>>> 
>>> [1]
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346112
>>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.3-rc1/
>>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>>> [4]
>> https://repository.apache.org/content/repositories/orgapacheflink-1310
>>> [5]
>>> 
>> https://github.com/apache/flink/commit/cf92a34e3465d2898eecd2aa7c06d6dc17f60ba3
>>> [6] https://github.com/apache/flink-web/pull/285
>> 
>> 



[VOTE] FLIP-88: PyFlink User-Defined Function Resource Management

2019-12-01 Thread Dian Fu
Hi all,

I'd like to start the vote on FLIP-88 [1], since we have reached an agreement 
on the design in the discussion thread [2]. 

This vote will be open for at least 72 hours. Unless there is an objection, I 
will try to close it by Dec 5, 2019 08:00 UTC if we have received sufficient 
votes.

Regards,
Dian

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-88%3A+PyFlink+User-Defined+Function+Resource+Management
[2] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-PyFlink-User-Defined-Function-Resource-Management-tt34631.html

Re: [DISCUSS] PyFlink User-Defined Function Resource Management

2019-12-02 Thread Dian Fu
Hi Jingsong,

Thanks a lot for your comments. Please see my reply inline below.

> 在 2019年12月2日,下午3:47,Jingsong Lee  写道:
> 
> Hi Dian:
> 
> 
> Thanks for driving this. I have some questions:
> 
> 
> - Where should these configurations belong? You have mentioned tableApi/SQL,
> so should they be in TableConfig?

All Python-related configurations are defined in PythonOptions. Users can 
configure them via TableConfig.getConfiguration().setXXX() in Python Table API 
programs; a minimal sketch follows.
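
As an illustration of this pattern (assuming a recent PyFlink API; the option 
keys below are illustrative assumptions in the spirit of this design, not names 
confirmed in this thread):

    from pyflink.table import EnvironmentSettings, TableEnvironment

    settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
    t_env = TableEnvironment.create(settings)

    # Set Python-related options on the underlying Configuration of the
    # TableConfig; the two keys below are hypothetical examples.
    config = t_env.get_config().get_configuration()
    config.set_string("python.fn-execution.framework.memory.size", "64mb")
    config.set_string("python.fn-execution.buffer.memory.size", "15mb")

For batch jobs, the budget derived from such options would then be set as the 
operator's managed memory, as described later in this reply.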

> 
> - If just in table/sql, whether it should be called: table.python.,
> because in table, all config options are called table.***.

These configurations are not table-specific. They will be used for both Python 
Table API programs and Python DataStream API programs (support for the latter is 
planned for the future). So python.xxx seems more appropriate. What do you 
think?

> - What should table module do? So in CommonPythonCalc, we should read
> options from table config, and set resources to OneInputTransformation?

As described in the design doc, in the compilation phase, for batch jobs, the 
required memory of the Python worker will be calculated according to the 
configuration and set as the managed memory for the operator. For stream jobs, 
the resource spec will be unknown. (The reason is that currently the resources 
for all the operators in stream jobs are unknown, and configuring both known 
and unknown resources in a single job is not supported.)

> - Are all buffer.memory off-heap memory? I took a look
> to AbstractPythonScalarFunctionOperator, there is a forwardedInputQueue, is
> this one a heap queue? So we need heap memory too?

Yes, they are all off-heap memory, which is supposed to be used by the Python 
process. The forwardedInputQueue is a buffer used in the Java operator, and its 
memory is accounted as on-heap memory.

Regards,
Dian

> 
> Hope to get your reply.
> 
> 
> Best,
> 
> Jingsong Lee
> 
> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu  wrote:
> 
>> Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu
>> offline and also on the design doc.
>> 
>> It seems that we have reached consensus on the design. I would bring up
>> the VOTE if there is no other feedbacks.
>> 
>> Thanks,
>> Dian
>> 
>>> 在 2019年11月22日,下午2:51,Hequn Cheng  写道:
>>> 
>>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
>>> It is great to make sure that the resources used by the Python process
>> are
>>> managed properly by Flink’s resource management framework.
>>> 
>>> Also, thanks to the guys that working on the unified memory management
>>> framework.
>>> 
>>> Best, Hequn
>>> 
>>> 
>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo  wrote:
>>> 
>>>> Thanks for driving this discussion, Dian!
>>>> 
>>>> +1 for this proposal. It will help to reduce container failure due to
>>>> the memory overuse.
>>>> Some comments left in the design doc.
>>>> 
>>>> Best,
>>>> Yangze Guo
>>>> 
>>>> On Mon, Nov 18, 2019 at 4:06 PM Xintong Song 
>>>> wrote:
>>>>> 
>>>>> Sorry for the late reply.
>>>>> 
>>>>> +1 for the general proposal.
>>>>> 
>>>>> And one remainder, to use UNKNOWN resource requirement, we need to make
>>>>> sure optimizer knowns which operators use off-heap managed memory, and
>>>>> compute and set a fraction to the operators. See FLIP-53[1] for more
>>>>> details, and I would suggest you to double check with @Zhu Zhu who
>> works
>>>> on
>>>>> this part.
>>>>> 
>>>>> Thank you~
>>>>> 
>>>>> Xintong Song
>>>>> 
>>>>> 
>>>>> [1]
>>>>> 
>>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-53%3A+Fine+Grained+Operator+Resource+Management
>>>>> 
>>>>> On Tue, Nov 12, 2019 at 11:53 AM Dian Fu 
>> wrote:
>>>>> 
>>>>>> Hi Jincheng,
>>>>>> 
>>>>>> Thanks for the reply and also looking forward to the feedback from the
>>>>>> community.
>>>>>> 
>>>>>> Thanks,
>>>>>> Dian
>>>>>> 
>>>>>>> 在 2019年11月11日,下午2:34,jincheng sun  写道:
>>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> +1, Thanks for bring up this discussion Dian!

Re: [VOTE] FLIP-88: PyFlink User-Defined Function Resource Management

2019-12-02 Thread Dian Fu
Hi Jingsong,

It's fine. :) I appreciate the comments!

I have replied to you in the discussion thread, as I also think it's better to 
discuss these questions there.

Thanks,
Dian

> 在 2019年12月2日,下午3:47,Jingsong Li  写道:
> 
> Sorry for bothering your voting.
> Let's discuss in discussion thread.
> 
> Best,
> Jingsong Lee
> 
> On Mon, Dec 2, 2019 at 3:32 PM Jingsong Lee  wrote:
> 
>> Hi Dian:
>> 
>> Thanks for your driving. I have some questions:
>> 
>> - Where should these configurations belong? You have mentioned
>> tableApi/SQL, so should in TableConfig?
>> - If just in table/sql, whether it should be called: table.python.,
>> because in table, all config options are called table.***.
>> - What should table module do? So in CommonPythonCalc, we should read
>> options from table config, and set resources to OneInputTransformation?
>> - Are all buffer.memory off-heap memory? I took a look
>> to AbstractPythonScalarFunctionOperator, there is a forwardedInputQueue, is
>> this one a heap queue? So we need heap memory too?
>> 
>> Hope to get your reply.
>> 
>> Best,
>> Jingsong Lee
>> 
>> On Mon, Dec 2, 2019 at 2:34 PM Dian Fu  wrote:
>> 
>>> Hi all,
>>> 
>>> I'd like to start the vote of FLIP-88 [1] since that we have reached an
>>> agreement on the design in the discussion thread [2].
>>> 
>>> This vote will be open for at least 72 hours. Unless there is an
>>> objection, I will try to close it by Dec 5, 2019 08:00 UTC if we have
>>> received sufficient votes.
>>> 
>>> Regards,
>>> Dian
>>> 
>>> [1]
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-88%3A+PyFlink+User-Defined+Function+Resource+Management
>>> [2]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-PyFlink-User-Defined-Function-Resource-Management-tt34631.html
>> 
>> 
>> 
>> --
>> Best, Jingsong Lee
>> 
> 
> 
> -- 
> Best, Jingsong Lee



Re: [VOTE] Setup a secur...@flink.apache.org mailing list

2019-12-03 Thread Dian Fu
Actually, I have tried to find out why so many Apache projects choose to set up 
a project-specific security mailing list given that the general 
secur...@apache.org mailing list seems to work well. Unfortunately, there are no 
open discussions in these projects, and there is also no clear 
guideline/standard on the ASF site about whether a project should set up such a 
mailing list. (A project-specific security mailing list seems to be optional 
only, and we noticed that at the beginning of the discussion.) This is also one 
of the main reasons we started this discussion: to see if somebody has more 
thoughts about it.

> 在 2019年12月2日,下午6:03,Chesnay Schepler  写道:
> 
> Would security@f.a.o work as any other private ML?
> 
> Contrary to what Becket said in the discussion thread, secur...@apache.org is 
> not just "another hop"; it provides guiding material, the security team 
> checks for activity and can be pinged easily as they are cc'd in the initial 
> report.
> 
> I vastly prefer this over a separate mailing list; if these benefits don't 
> apply to security@f.a.o I'm -1 on this.
> 
> On 02/12/2019 02:28, Becket Qin wrote:
>> Thanks for driving this, Dian.
>> 
>> +1 from me, for the reasons I mentioned in the discussion thread.
>> 
>> On Tue, Nov 26, 2019 at 12:08 PM Dian Fu  wrote:
>> 
>>> NOTE: Only PMC votes is binding.
>>> 
>>> Thanks for sharing your thoughts. I also think that this doesn't fall into
>>> any of the existing categories listed in the bylaws. Maybe we could do some
>>> improvements for the bylaws.
>>> 
>>> This is not codebase change as Robert mentioned and it's related to how to
>>> manage Flink's development in a good way. So, I agree with Robert and
>>> Jincheng that this VOTE should only count PMC votes for now.
>>> 
>>> Thanks,
>>> Dian
>>> 
>>>> 在 2019年11月26日,上午11:43,jincheng sun  写道:
>>>> 
>>>> I also think that we should only count PMC votes.
>>>> 
>>>> This ML is to improve the security mechanism for Flink. Of course we
>>> don't
>>>> expect to use this
>>>> ML often. I hope that it's perfect if this ML is never used. However, the
>>>> Flink community is growing rapidly, it's better to
>>>> make our security mechanism as convenient as possible. But I agree that
>>>> this ML is not a must to have, it's nice to have.
>>>> 
>>>> So, I give the vote as +1(binding).
>>>> 
>>>> Best,
>>>> Jincheng
>>>> 
>>>> Robert Metzger  于2019年11月25日周一 下午9:45写道:
>>>> 
>>>>> I agree that we are only counting PMC votes (because this decision goes
>>>>> beyond the codebase)
>>>>> 
>>>>> I'm undecided what to vote :) I'm not against setting up a new mailing
>>>>> list, but I also don't think the benefit (having a private list with
>>> PMC +
>>>>> committers) is enough to justify the work involved. As far as I
>>> remember,
>>>>> we have received 2 security issue notices, both basically about the same
>>>>> issue.  I'll leave it to other PMC members to support this if they want
>>> to
>>>>> ...
>>>>> 
>>>>> 
>>>>> On Mon, Nov 25, 2019 at 9:15 AM Dawid Wysakowicz <
>>> dwysakow...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> What is the voting scheme for it? I am not sure if it falls into any of
>>>>>> the categories we have listed in our bylaws. Are committers votes
>>>>>> binding or just PMCs'? (Personally I think it should be PMCs') Is this
>>> a
>>>>>> binding vote or just an informational vote?
>>>>>> 
>>>>>> Best,
>>>>>> 
>>>>>> Dawid
>>>>>> 
>>>>>> On 25/11/2019 07:34, jincheng sun wrote:
>>>>>>> +1
>>>>>>> 
>>>>>>> Dian Fu  于2019年11月21日周四 下午4:11写道:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> According to our previous discussion in [1], I'd like to bring up a
>>>>> vote
>>>>>>>> to set up a secur...@flink.apache.org mailing list.
>>>>>>>> 
>>>>>>>> The vote will be open for at least 72 hours (excluding weekend). I'll
>>>>>> try
>>>>>>>> to close it by 2019-11-26 18:00 UTC, unless there is an objection or
>>>>> not
>>>>>>>> enough votes.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Dian
>>>>>>>> 
>>>>>>>> [1]
>>>>>>>> 
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951
>>>>>> 
>>> 
> 



Re: [VOTE] Setup a secur...@flink.apache.org mailing list

2019-12-03 Thread Dian Fu
Hi all,

Thanks everyone for participating in this vote. As we have received only two +1 
votes and there is also one -1, according to the bylaws I'm sorry to announce 
that this proposal was rejected. 

Nevertheless, I think we can always restart the discussion in the future if we 
see more evidence that such a mailing list is necessary.

Thanks,
Dian


> 在 2019年12月3日,下午4:53,Dian Fu  写道:
> 
> Actually I have tried to find out the reason why so many apache projects 
> choose to set up a project specific security mailing list in case that the 
> general secur...@apache.org mailing list seems working well. Unfortunately, 
> there is no open discussions in these projects and there is also no clear 
> guideline/standard in the ASF site whether a project should set up such a 
> mailing list (The project specific security mailing list seems only an 
> optional and we noticed that at the beginning of the discussion). This is 
> also one of the main reasons we start such a discussion to see if somebody 
> has more thoughts about this.
> 
>> 在 2019年12月2日,下午6:03,Chesnay Schepler  写道:
>> 
>> Would security@f.a.o work as any other private ML?
>> 
>> Contrary to what Becket said in the discussion thread, secur...@apache.org 
>> is not just "another hop"; it provides guiding material, the security team 
>> checks for activity and can be pinged easily as they are cc'd in the initial 
>> report.
>> 
>> I vastly prefer this over a separate mailing list; if these benefits don't 
>> apply to security@f.a.o I'm -1 on this.
>> 
>> On 02/12/2019 02:28, Becket Qin wrote:
>>> Thanks for driving this, Dian.
>>> 
>>> +1 from me, for the reasons I mentioned in the discussion thread.
>>> 
>>> On Tue, Nov 26, 2019 at 12:08 PM Dian Fu  wrote:
>>> 
>>>> NOTE: Only PMC votes is binding.
>>>> 
>>>> Thanks for sharing your thoughts. I also think that this doesn't fall into
>>>> any of the existing categories listed in the bylaws. Maybe we could do some
>>>> improvements for the bylaws.
>>>> 
>>>> This is not codebase change as Robert mentioned and it's related to how to
>>>> manage Flink's development in a good way. So, I agree with Robert and
>>>> Jincheng that this VOTE should only count PMC votes for now.
>>>> 
>>>> Thanks,
>>>> Dian
>>>> 
>>>>> 在 2019年11月26日,上午11:43,jincheng sun  写道:
>>>>> 
>>>>> I also think that we should only count PMC votes.
>>>>> 
>>>>> This ML is to improve the security mechanism for Flink. Of course we
>>>> don't
>>>>> expect to use this
>>>>> ML often. I hope that it's perfect if this ML is never used. However, the
>>>>> Flink community is growing rapidly, it's better to
>>>>> make our security mechanism as convenient as possible. But I agree that
>>>>> this ML is not a must to have, it's nice to have.
>>>>> 
>>>>> So, I give the vote as +1(binding).
>>>>> 
>>>>> Best,
>>>>> Jincheng
>>>>> 
>>>>> Robert Metzger  于2019年11月25日周一 下午9:45写道:
>>>>> 
>>>>>> I agree that we are only counting PMC votes (because this decision goes
>>>>>> beyond the codebase)
>>>>>> 
>>>>>> I'm undecided what to vote :) I'm not against setting up a new mailing
>>>>>> list, but I also don't think the benefit (having a private list with
>>>> PMC +
>>>>>> committers) is enough to justify the work involved. As far as I
>>>> remember,
>>>>>> we have received 2 security issue notices, both basically about the same
>>>>>> issue.  I'll leave it to other PMC members to support this if they want
>>>> to
>>>>>> ...
>>>>>> 
>>>>>> 
>>>>>> On Mon, Nov 25, 2019 at 9:15 AM Dawid Wysakowicz <
>>>> dwysakow...@apache.org>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> What is the voting scheme for it? I am not sure if it falls into any of
>>>>>>> the categories we have listed in our bylaws. Are committers votes
>>>>>>> binding or just PMCs'? (Personally I think it should be PMCs') Is this
>>>> a
>>>>>>> binding vote or just an informational vote?
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Dawid
>>>>>>> 
>>>>>>> On 25/11/2019 07:34, jincheng sun wrote:
>>>>>>>> +1
>>>>>>>> 
>>>>>>>> Dian Fu  于2019年11月21日周四 下午4:11写道:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> According to our previous discussion in [1], I'd like to bring up a
>>>>>> vote
>>>>>>>>> to set up a secur...@flink.apache.org mailing list.
>>>>>>>>> 
>>>>>>>>> The vote will be open for at least 72 hours (excluding weekend). I'll
>>>>>>> try
>>>>>>>>> to close it by 2019-11-26 18:00 UTC, unless there is an objection or
>>>>>> not
>>>>>>>>> enough votes.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Dian
>>>>>>>>> 
>>>>>>>>> [1]
>>>>>>>>> 
>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951
>>>>>>> 
>>>> 
>> 
> 



Re: [DISCUSS] PyFlink User-Defined Function Resource Management

2019-12-03 Thread Dian Fu
Hi Jingsong,

Thanks for your valuable feedback. I have updated the "Example" section 
describing how to use these options in a Python Table API program.

Thanks,
Dian

> 在 2019年12月2日,下午6:12,Jingsong Lee  写道:
> 
> Hi Dian:
> 
> Thanks for your explanation.
> If you can update the document to add an explanation of the changes to the
> table layer, it might be better. (It's just a suggestion; it depends on you.)
> About forwardedInputQueue in AbstractPythonScalarFunctionOperator,
> Will this queue take up a lot of memory?
> Can it also occupy memory as large as buffer.memory?
> If so, what we're dealing with now is the silent use of heap memory?
> It feels a little strange, because the memory on the Python side will be
> reserved, but the memory on the JVM side is used silently.
> 
> After carefully reading your comments on the Google doc:
>> The memory used by the Java operator is currently accounted as the task
> on-heap memory. We can revisit this if we find it's a problem in the future.
> I agree that we can ignore it for now, but we can add some content to the
> document to remind the user. What do you think?
> 
> Best,
> Jingsong Lee
> 
> On Mon, Dec 2, 2019 at 5:17 PM Dian Fu  wrote:
> 
>> Hi Jingsong,
>> 
>> Thanks a lot for your comments. Please see my reply inlined below.
>> 
>>> 在 2019年12月2日,下午3:47,Jingsong Lee  写道:
>>> 
>>> Hi Dian:
>>> 
>>> 
>>> Thanks for your driving. I have some questions:
>>> 
>>> 
>>> - Where should these configurations belong? You have mentioned
>> tableApi/SQL,
>>> so should in TableConfig?
>> 
>> All Python related configurations are defined in PythonOptions. User could
>> configure these configurations via TableConfig.getConfiguration.setXXX for
>> Python Table API programs.
>> 
>>> 
>>> - If just in table/sql, whether it should be called: table.python.,
>>> because in table, all config options are called table.***.
>> 
>> These configurations are not table specific. They will be used for both
>> Python Table API programs and Python DataStream API programs (which is
>> planned to be supported in the future). So python.xxx seems more
>> appropriate, what do you think?
>> 
>>> - What should table module do? So in CommonPythonCalc, we should read
>>> options from table config, and set resources to OneInputTransformation?
>> 
>> As described in the design doc, in compilation phase, for batch jobs, the
>> required memory of the Python worker will be calculated according to the
>> configuration and set as the managed memory for the operator. For stream
>> jobs, the resource spec will be unknown(The reason is that currently the
>> resources for all the operators in stream jobs are unknown and it doesn’t
>> support to configure both known and unknown resources in a single job).
>> 
>>> - Are all buffer.memory off-heap memory? I took a look
>>> to AbstractPythonScalarFunctionOperator, there is a forwardedInputQueue,
>> is
>>> this one a heap queue? So we need heap memory too?
>> 
>> Yes, they are all off-heap memory which is supposed to be used by the
>> Python process. The forwardedInputQueue is a buffer used in the Java
>> operator and its memory is accounted as the on-heap memory.
>> 
>> Regards,
>> Dian
>> 
>>> 
>>> Hope to get your reply.
>>> 
>>> 
>>> Best,
>>> 
>>> Jingsong Lee
>>> 
>>> On Tue, Nov 26, 2019 at 12:17 PM Dian Fu  wrote:
>>> 
>>>> Thanks for your votes and feedbacks. I have discussed with @Zhu Zhu
>>>> offline and also on the design doc.
>>>> 
>>>> It seems that we have reached consensus on the design. I would bring up
>>>> the VOTE if there is no other feedbacks.
>>>> 
>>>> Thanks,
>>>> Dian
>>>> 
>>>>> 在 2019年11月22日,下午2:51,Hequn Cheng  写道:
>>>>> 
>>>>> Thanks a lot for putting this together, Dian! Definitely +1 for this!
>>>>> It is great to make sure that the resources used by the Python process
>>>> are
>>>>> managed properly by Flink’s resource management framework.
>>>>> 
>>>>> Also, thanks to the guys that working on the unified memory management
>>>>> framework.
>>>>> 
>>>>> Best, Hequn
>>>>> 
>>>>> 
>>>>> On Mon, Nov 18, 2019 at 5:23 PM Yangze Guo  wrote:
>>>>> 

Re: [DISCUSS] Drop RequiredParameters and OptionType

2019-12-03 Thread Dian Fu
+1 to remove them. It seems that we should also drop the class Option as it's 
currently only used in RequiredParameters.

> 在 2019年12月3日,下午8:34,Robert Metzger  写道:
> 
> +1 on removing it.
> 
> On Tue, Dec 3, 2019 at 12:31 PM Stephan Ewen  > wrote:
> I just stumbled across these classes recently and was looking for sample uses.
> No examples or other tests in the code base seem to use RequiredParameters 
> and OptionType.
> 
> They also seem quite redundant with how ParameterTool itself works 
> (tool.getRequired()).
> 
> Should we drop them, in an attempt to reduce unnecessary code and confusion 
> for users (multiple ways to do the same thing)? There are also many better 
> command line parsing libraries out there, this seems like something we don't 
> need to solve in Flink.
> 
> Best,
> Stephan



Re: [DISCUSS] Expose or setup a secur...@flink.apache.org mailing list for security report and discussion

2019-12-03 Thread Dian Fu
Hi all,

Just to sync the result of the vote for setting up a security@f.a.o mailing
list: it has been rejected [1].

Another very important point is that everyone agrees there should be a
guideline on the Flink website on how to report security issues. Do you
think we should bring up a separate discussion/vote thread? If so, I will
do that. Personally, I think that discussing it on the PR is enough. What do
you think?

I have created a PR [2]. I would appreciate it if you could take a look.

Regards,
Dian

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Setup-a-security-flink-apache-org-mailing-list-tt35205.html
[2] https://github.com/apache/flink-web/pull/287

On Thu, Nov 21, 2019 at 3:58 PM Dian Fu  wrote:

> Hi all,
>
> There are no new feedbacks and it seems that we have received enough
> feedback about setup a secur...@flink.apache.org mailing list[1] for
> security report and discussion. It shows that it's optional as we can use
> either secur...@flink.apache.org or secur...@apache.org. So I'd like to
> start the vote for setup a secur...@flink.apache.org mailing list to make
> the final decision.
>
> Thanks,
> Dian
>
> 在 2019年11月19日,下午6:06,Dian Fu  写道:
>
> Hi all,
>
> Thanks for sharing your thoughts. Appreciated! Let me try to summarize the
> information and thoughts received so far. Please feel free to let me know
> if there is anything wrong or missing.
>
> 1. Setup project specific security mailing list
> Pros:
> - The security reports received by secur...@apache.org will be forwarded
> to the project private(PMC) mailing list. Having a project specific
> security mailing list is helpful in cases when the best person to address
> the security issue is not a PMC member, but a committer. It makes things
> simple as everyone(both PMCs and committers) is on the same table.
> - Even though the security issues are usually rare, they could be
> devastating and thus need to be treated seriously.
> - Most notable apache projects such as apache common, hadoop, spark,
> kafka, hive, etc have a security specific mailing list.
>
> Cons:
> - The ASF security mailing list secur...@apache.org could be used if
> there is no project specific security mailing list.
> - The number of security reports is very low.
>
> Additional information:
> - Security mailing list could only be subscribed by PMCs and committers.
> However everyone could report security issues to the security mailing list.
>
>
> 2. Guide users to report the security issues
> Why:
> - Security vulnerabilities should not be publicly disclosed (e.g. via dev
> ML or JIRA) until the project has responded. We should guide users on how
> to report security issues in Flink website.
>
> How:
> - Option 1: Set up secur...@flink.apache.org and ask users to report
> security issues there
> - Option 2: Ask users to send security report to secur...@apache.org
> - Option 3: Ask users to send security report directly to
> priv...@flink.apache.org
>
>
> 3. Dedicated page to show the security vulnerabilities
> - We may need a dedicated security page to describe the CVE list on the
> Flink website.
>
> I think it makes sense to open separate discussion thread on 2) and 3).
> I'll create separate discussion thread for them. Let's focus on 1) in this
> thread.
>
> If there is no other feedback on 1), I'll bring up a VOTE for this
> discussion.
>
> What do you think?
>
> Thanks,
> Dian
>
> On Fri, Nov 15, 2019 at 10:18 AM Becket Qin  wrote:
>
>> Thanks for bringing this up, Dian.
>>
>> +1 on creating a project specific security mailing list. My two cents, I
>> think it is worth doing in practice.
>>
>> Although the ASF security ML is always available, usually all the emails
>> are simply routed to the individual project PMC. This is an additional
>> hop.
>> And in some cases, the best person to address the reported issue may not
>> be
>> a PMC member, but a committer, so the PMC have to again involve them into
>> the loop. This make things unnecessarily complicated. Having a project
>> specific security ML would make it much easier to have everyone at the
>> same
>> table.
>>
>> Also, one thing to note is that even though the security issues are
>> usually
>> rare, they could be devastating, thus need to be treated seriously. So I
>> think it is a good idea to establish the handling mechanism regardless of
>> the frequency of the reported security vulnerabilities.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Fri, Nov 15, 2019 at 1:14 AM Yu Li  wrote:
>>
>> > Thanks for bringing up this discussion Dian! How to report security
>

Re: [VOTE] Setup a secur...@flink.apache.org mailing list

2019-12-03 Thread Dian Fu
Hi Becket,

Thanks for the kind reminder. I definitely agree with you. I have updated the 
progress of this vote on the discussion thread[1] and submitted a PR which 
updates the Flink website on how to report security issues.

Thanks,
Dian

[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951
 
<http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Expose-or-setup-a-security-flink-apache-org-mailing-list-for-security-report-and-discussion-tt34950.html#a34951>
> 在 2019年12月4日,上午7:29,Becket Qin  写道:
> 
> Hi Dian,
> 
> Thanks for driving the effort regardless.
> 
> Even if we don't setup a security@f.a.o ML for Flink, we probably should
> have a clear pointer to the ASF guideline and secur...@apache.org in the
> project website. I think many people are not aware of the
> secur...@apache.org address. If they failed to find information in the
> Flink site, they will simply assume there is no special procedure for
> security problems.
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> On Tue, Dec 3, 2019 at 4:54 PM Dian Fu  wrote:
> 
>> Hi all,
>> 
>> Thanks everyone for participating this vote. As we have received only two
>> +1 and there is also one -1 for this vote, according to the bylaws, I'm
>> sorry to announce that this proposal was rejected.
>> 
>> Neverthless, I think we can always restart the discussion in the future if
>> we see more evidence that such a mailing list is necessary.
>> 
>> Thanks,
>> Dian
>> 
>> 
>>> 在 2019年12月3日,下午4:53,Dian Fu  写道:
>>> 
>>> Actually I have tried to find out the reason why so many apache projects
>> choose to set up a project specific security mailing list in case that the
>> general secur...@apache.org mailing list seems working well.
>> Unfortunately, there is no open discussions in these projects and there is
>> also no clear guideline/standard in the ASF site whether a project should
>> set up such a mailing list (The project specific security mailing list
>> seems only an optional and we noticed that at the beginning of the
>> discussion). This is also one of the main reasons we start such a
>> discussion to see if somebody has more thoughts about this.
>>> 
>>>> 在 2019年12月2日,下午6:03,Chesnay Schepler  写道:
>>>> 
>>>> Would security@f.a.o work as any other private ML?
>>>> 
>>>> Contrary to what Becket said in the discussion thread,
>> secur...@apache.org is not just "another hop"; it provides guiding
>> material, the security team checks for activity and can be pinged easily as
>> they are cc'd in the initial report.
>>>> 
>>>> I vastly prefer this over a separate mailing list; if these benefits
>> don't apply to security@f.a.o I'm -1 on this.
>>>> 
>>>> On 02/12/2019 02:28, Becket Qin wrote:
>>>>> Thanks for driving this, Dian.
>>>>> 
>>>>> +1 from me, for the reasons I mentioned in the discussion thread.
>>>>> 
>>>>> On Tue, Nov 26, 2019 at 12:08 PM Dian Fu 
>> wrote:
>>>>> 
>>>>>> NOTE: Only PMC votes is binding.
>>>>>> 
>>>>>> Thanks for sharing your thoughts. I also think that this doesn't fall
>> into
>>>>>> any of the existing categories listed in the bylaws. Maybe we could
>> do some
>>>>>> improvements for the bylaws.
>>>>>> 
>>>>>> This is not codebase change as Robert mentioned and it's related to
>> how to
>>>>>> manage Flink's development in a good way. So, I agree with Robert and
>>>>>> Jincheng that this VOTE should only count PMC votes for now.
>>>>>> 
>>>>>> Thanks,
>>>>>> Dian
>>>>>> 
>>>>>>> 在 2019年11月26日,上午11:43,jincheng sun  写道:
>>>>>>> 
>>>>>>> I also think that we should only count PMC votes.
>>>>>>> 
>>>>>>> This ML is to improve the security mechanism for Flink. Of course we
>>>>>> don't
>>>>>>> expect to use this
>>>>>>> ML often. I hope that it's perfect if this ML is never used.
>> However, the
>>>>>>> Flink community is growing rapidly, it's better to
>>>>>>> make our security mechanism as convenient as possible. But I agree

Re: [DISCUSS] Voting from apache.org addresses

2019-12-03 Thread Dian Fu
Thanks Dawid for starting this discussion.

I have the same feeling as Xuefu and Jingsong. Besides that, according to the 
bylaws, for some kinds of votes, such as product releases, only the votes from 
active PMC members are binding. So an email address doesn't help here: even 
if a vote is from a Flink committer, it is still non-binding.

Thanks,
Dian

> 在 2019年12月4日,上午10:37,Jingsong Lee  写道:
> 
> Thanks Dawid for driving this discussion.
> 
> +1 to Xuefu's viewpoint.
> I am not a Flink committer, but sometimes I use apache email address to
> send email.
> 
> Another way is to require that every binding vote must contain "binding";
> otherwise it is treated as a "non-binding" vote.
> In this way, we can let lazy people continue voting without any suffix too.
> 
> Best,
> Jingsong Lee
> 
> On Wed, Dec 4, 2019 at 3:58 AM Xuefu Z  wrote:
> 
>> Hi Dawid,
>> 
>> Thanks for initiating this discussion. I understand the problem you
>> described, but the solution might not work as having an apache.org email
>> address doesn't necessary mean it's from a Flink committer. This certainly
>> applies to me.
>> 
>> It probably helps for the voters to identify themselves by specifying
>> either "binding" or "non-binding", though I understand this cannot be
>> enforced but serves a general guideline.
>> 
>> Thanks,
>> Xuefu
>> 
>> On Tue, Dec 3, 2019 at 6:15 AM Dawid Wysakowicz 
>> wrote:
>> 
>>> Hi,
>>> 
>>> I wanted to reach out primarily to the Flink's committers. I think
>>> whenever we cast a vote on a proposal, be it a FLIP, release candidate
>>> or any other proposal, we should use our apache.org email address.
>>> 
>>> It is not an easy task to check if a person voting is a committer/PMC if
>>> we do not work with him/her on a daily basis. This is important for
>>> verifying if a vote is binding or not.
>>> 
>>> Best,
>>> 
>>> Dawid
>>> 
>>> 
>>> 
>> 
>> --
>> Xuefu Zhang
>> 
>> "In Honey We Trust!"
>> 
> 
> 
> -- 
> Best, Jingsong Lee



Re: [DISCUSS] Voting from apache.org addresses

2019-12-04 Thread Dian Fu
Thanks for your explanation Dawid! It makes sense to me now. +1.

Just one minor question: does this mean that if a committer/PMC accidentally 
votes using a non-apache email, even if the person who summarizes the votes 
clearly KNOWS who he/she is, that vote will still be counted as non-binding?

Regards,
Dian

> 在 2019年12月4日,下午5:13,Aljoscha Krettek  写道:
> 
> Very sensible! +1
> 
>> On 4. Dec 2019, at 10:02, Chesnay Schepler  wrote:
>> 
>> I believe this to be a sensible approach by Dawid; +1.
>> 
>> On 04/12/2019 09:04, Dawid Wysakowicz wrote:
>>> 
>>> Hi all,
>>> 
>>> Sorry, I think I was not clear enough in my initial e-mail. Let me first 
>>> clarify a few things and later on try to rephrase my initial suggestion.
>>> 
>>> 1. I do not want to count all votes from @apache.org addresses as binding
>>> 2. I do not want to discourage people that do not have @apache.org
>>>   address from voting
>>> 3. What I said does not change anything for non-committers/non-PMCs
>>> 
>>> What I meant is that if you are a committer/PMC please use an apache.org 
>>> address because then the person that summarizes the votes can check in the 
>>> apache directory if a person with that address is a committer/PMC in flink 
>>> project. Otherwise if a committer uses a different address there is no way 
>>> to check if that person is a committer/PMC or not. It does not mean though 
>>> that if you vote from apache.org this vote is automatically binding. It 
>>> just allows us to check if it is.
>>> 
>>> To elaborate on Xuefu's example. It's absolutely fine for you to use an 
>>> apache address for voting. I will still check if you are a committer or 
>>> not. But take me (or any other committer) for example. If I use my 
>>> non-apache address for a vote and the person verifying the vote does not 
>>> know me and my address, it is not easy for that person to verify if I am a 
>>> committer or not.
>>> 
>>> Also it does not mean that other people are not allowed to vote. You can 
>>> vote from other addresses, but those votes will be counted as non-binding. 
>>> This does not change anything for non-committers/non-PMC. However if you 
>>> are a committer and vote from a non-apache address, your vote will be 
>>> non-binding, because we cannot verify you are indeed a committer (we might 
>>> not know your other address).
>>> 
>>> I agree the additional information (binding, non-binding) in a vote helps, 
>>> but it still should be verified. People make mistakes.
>>> 
>>> I hope this clears it up a bit.
>>> 
>>> Best,
>>> 
>>> Dawid
>>> 
>>> On 04/12/2019 04:58, Dian Fu wrote:
>>>> Thanks Dawid for start this discussion.
>>>> 
>>>> I have the same feeling as Xuefu and Jingsong. Besides that, according 
>>>> to the bylaws, for some kinds of votes, only the votes from active PMC 
>>>> members are binding, such as a product release. So an email address doesn't 
>>>> help here. Even if a vote is from a Flink committer, it is still 
>>>> non-binding.
>>>> 
>>>> Thanks,
>>>> Dian
>>>> 
>>>>> 在 2019年12月4日,上午10:37,Jingsong Lee  写道:
>>>>> 
>>>>> Thanks Dawid for driving this discussion.
>>>>> 
>>>>> +1 to Xuefu's viewpoint.
>>>>> I am not a Flink committer, but sometimes I use apache email address to
>>>>> send email.
>>>>> 
>>>>> Another way is that we require a binding vote to explicitly contain 
>>>>> "binding".
>>>>> Otherwise it must be counted as "non-binding".
>>>>> In this way, we can let lazy people continue voting without any suffix 
>>>>> too.
>>>>> 
>>>>> Best,
>>>>> Jingsong Lee
>>>>> 
>>>>> On Wed, Dec 4, 2019 at 3:58 AM Xuefu Z  wrote:
>>>>> 
>>>>>> Hi Dawid,
>>>>>> 
>>>>>> Thanks for initiating this discussion. I understand the problem you
>>>>>> described, but the solution might not work as having an apache.org email
>>>>>> address doesn't necessary mean it's from a Flink committer. This 
>>>>>> certainly
>>>>>> applies to me.
>>>>>> 
>>>>>> It probably helps for the voters to identify themselves by specifying
>>>>>> either "binding" or "non-binding", though I understand this cannot be
>>>>>> enforced but serves as a general guideline.
>>>>>> 
>>>>>> Thanks,
>>>>>> Xuefu
>>>>>> 
>>>>>> On Tue, Dec 3, 2019 at 6:15 AM Dawid Wysakowicz
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I wanted to reach out primarily to the Flink's committers. I think
>>>>>>> whenever we cast a vote on a proposal, be it a FLIP, release candidate
>>>>>>> or any other proposal, we should use our apache.org email address.
>>>>>>> 
>>>>>>> It is not an easy task to check if a person voting is a committer/PMC if
>>>>>>> we do not work with him/her on a daily basis. This is important for
>>>>>>> verifying if a vote is binding or not.
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Dawid
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> --
>>>>>> Xuefu Zhang
>>>>>> 
>>>>>> "In Honey We Trust!"
>>>>>> 
>>>>> -- 
>>>>> Best, Jingsong Lee
>> 
>> 
> 



Re: [DISCUSS] Voting from apache.org addresses

2019-12-04 Thread Dian Fu
Hi Dawid,

Thanks for the reply. Counting all the votes from non-apache addresses as 
non-binding makes sense. Just as Jark mentioned, we can always remind the 
committer/PMC to vote again using the apache address if necessary (i.e. when 
the number of binding votes is not enough).

Thanks,
Dian

> 在 2019年12月4日,下午7:27,Kurt Young  写道:
> 
> +1 (from my apache email ;-))
> 
> Best,
> Kurt
> 
> 
> On Wed, Dec 4, 2019 at 7:22 PM Jark Wu  wrote:
> 
>> I'm +1 on this proposal.
>> 
>> Regarding the case that Dian mentioned, we can remind the
>> committer/PMC to vote again using the apache email,
>> and of course the non-apache vote is counted as non-binding.
>> 
>> Best,
>> Jark
>> 
>> On Wed, 4 Dec 2019 at 17:33, Dawid Wysakowicz 
>> wrote:
>> 
>>> Hi Dian,
>>> 
>>> I don't want to be very strict, but I think it should be counted as
>>> non-binding if it comes from a non-apache address, yes.
>>> 
>>> Anybody should be able to verify a vote. Moreover I think this is the only
>>> way to "encourage" all committers to use their apache addresses ;)
>>> 
>>> Best,
>>> 
>>> Dawid
>>> 
>>> On 04/12/2019 10:26, Dian Fu wrote:
>>>> Thanks for your explanation Dawid! It makes sense to me now. +1.
>>>> 
>>>> Just one minor question: Does this mean that if a committer/PMC
>>> accidentally votes using a non-apache email, even if the person who
>>> summarizes the votes clearly KNOWS who he/she is, that vote will still be
>>> counted as non-binding?
>>>> 
>>>> Regards,
>>>> Dian
>>>> 
>>>>> 在 2019年12月4日,下午5:13,Aljoscha Krettek  写道:
>>>>> 
>>>>> Very sensible! +1
>>>>> 
>>>>>> On 4. Dec 2019, at 10:02, Chesnay Schepler 
>> wrote:
>>>>>> 
>>>>>> I believe this to be a sensible approach by Dawid; +1.
>>>>>> 
>>>>>> On 04/12/2019 09:04, Dawid Wysakowicz wrote:
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> Sorry, I think I was not clear enough in my initial e-mail. Let me
>>> first clarify a few things and then try to rephrase my initial
>> suggestion.
>>>>>>> 
>>>>>>> 1. I do not want to count all votes from @apache.org addresses as
>>> binding
>>>>>>> 2. I do not want to discourage people that do not have @apache.org
>>>>>>>  address from voting
>>>>>>> 3. What I said does not change anything for non-committers/non-PMCs
>>>>>>> 
>>>>>>> What I meant is that if you are a committer/PMC please use an
>>> apache.org address because then the person that summarizes the votes can
>>> check in the apache directory if a person with that address is a
>>> committer/PMC in flink project. Otherwise if a committer uses a different
>>> address there is no way to check if that person is a committer/PMC or
>> not.
>>> It does not mean though that if you vote from apache.org this vote is
>>> automatically binding. It just allows us to check if it is.
>>>>>>> 
>>>>>>> To elaborate on Xuefu's example. It's absolutely fine for you to use
>>> an apache address for voting. I will still check if you are a committer
>> or
>>> not. But take me (or any other committer) for example. If I use my
>>> non-apache address for a vote and the person verifying the vote does not
>>> know me and my address, it is not easy for that person to verify if I am
>> a
>>> committer or not.
>>>>>>> 
>>>>>>> Also it does not mean that other people are not allowed to vote. You
>>> can vote from other addresses, but those votes will be counted as
>>> non-binding. This does not change anything for non-committers/non-PMC.
>>> However if you are a committer and vote from non apache address your vote
>>> will be non-binding, because we cannot verify you are indeed a committer
>>> (we might not know your other address).
>>>>>>> 
>>>>>>> I agree the additional information (binding, non-binding) in a vote
>>> helps, but it still should be verified. People make mistakes.
>>>>>>> 
>>>>>>> I hope this clears it up a bit.
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>&

Re: [DISCUSS] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-04 Thread Dian Fu
Thanks for bringing up this discussion Wei. +1 for this proposal!

As these options are proposed in 1.10, it will be great if we can improve them 
in 1.10. Then it will not cause compatibility issues.

Thanks,
Dian

> 在 2019年12月5日,上午10:01,jincheng sun  写道:
> 
> Hi all,
> 
> Thanks for the quick response Aljoscha & Wei !
> 
> It seems unifying the options is necessary, and 1.10 will soon be code frozen. I
> would like to bring up the VOTE thread for this change ASAP, and more
> details can continue to be discussed in the PR.
> 
> What do you think?
> 
> Best,
> Jincheng
> 
> Aljoscha Krettek  于2019年12月4日周三 下午5:11写道:
> 
>> Perfect, thanks for the background info! I also found this section now,
>> which mentions that it comes from Hadoop:
>> https://spark.apache.org/docs/latest/running-on-yarn.html#important-notes.
>> 
>> I think the proposed changes are good!
>> 
>> Best,
>> Aljoscha
>> 
>>> On 4. Dec 2019, at 04:34, Wei Zhong  wrote:
>>> 
>>> Hi Aljoscha,
>>> 
>>> Thanks for your reply! Before bringing up this discussion I did some
>> research on commonly used separators for options that take multiple values.
>> I have considered ",", ":" and "#". Finally I chose "#" as the separator of
>> "--pyRequirements".
>>> 
>>> For ",", it is the most widely used separator. Many projects use it as
>> the separator of the values in same level. e.g. "-Dexcludes" in Maven,
>> "--files" in Spark and "-pyFiles" in Flink. But the second parameter of
>> "--pyRequirements", the requirement cached directory, is not at the same
>> level as its first parameter (the requirements file). It is secondary and
>> is only needed when the packages in the requirements file can not be
>> downloaded from the package index server.
>>> 
>>> For ":", it is used as a path separator in most cases. e.g. main
>> arguments of scp (secure copy), "--volume" in Docker and "-cp" in Java. But
>> as we support accept a URI as the file path, which contains ":" in most
>> cases, ":" can not be used as the separator of "--pyRequirements".
>>> 
>>> For "#", it is really rarely used as a separator for multiple values. I
>> only find Spark using "#" as the separator for option "--files" and
>> "--archives" between file path and target file/directory name. After some
>> research I find that this usage comes from the URI fragment. We can append
>> a secondary resource as the fragment of the URI after a number sign ("#")
>> character. As we treat user file paths as URIs when parsing command line,
>> using "#" as the separator of "--pyRequirements" makes sense to me, which
>> means the second parameter is the fragment of the first parameter. The
>> definition of URI fragment can be found here [1].
>>> 
>>> The reason for using "#" in "--pyArchives" as the separator of file path
>> and target directory name is the same as above.
>>> 
>>> Best,
>>> Wei
>>> 
>>> [1] https://tools.ietf.org/html/rfc3986#section-3.5
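>>> 
>>> For illustration, a submission using the proposed option might look like the
>>> following (a sketch only; the option names come from the design doc and may
>>> still change before the release):
>>> 
>>>     flink run --python my_job.py \
>>>         --pyRequirements requirements.txt#cached_packages
>>> 
>>> Here "requirements.txt" is the requirements file and "cached_packages" is the
>>> optional directory of pre-downloaded packages, appended as a URI fragment.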
>>> 
 在 2019年12月3日,22:02,Aljoscha Krettek  写道:
 
 Hi,
 
 Yes, I think it’s a good idea to make the options uniform. Using ‘#’ as
>> a separator for options that take two values seems a bit strange to me, did
>> you research if any other CLI tools have this convention?
 
 Side note: I don’t like that our options use camel-case, I think that’s
>> very non-standard. But that’s how it is now…
 
 Best,
 Aljoscha
 
> On 3. Dec 2019, at 10:14, jincheng sun 
>> wrote:
> 
> Thanks for bringing up this discussion Wei!
> I think this is very important for Flink users; we should include these
> changes in Flink 1.10.
> +1  for the optimization from the perspective of user convenience and
>> the
> unified use of Flink command line parameters.
> 
> Best,
> Jincheng
> 
> Wei Zhong  于2019年12月2日周一 下午3:26写道:
> 
>> Hi everyone,
>> 
>> I wanted to bring up the discussion of improving the Pyflink command
>> line
>> options.
>> 
>> A few command line options have been introduced in FLIP-78 [1],
>> e.g.
>> "python-executable-path", "python-requirements", "python-archive", etc.
>> There are a few problems with these options, e.g. the naming style,
>> variable argument options, etc.
>> 
>> We want to make some adjustments to FLIP-78 to improve the newly
>> introduced
>> command line options, here is the design doc:
>> 
>> 
>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
>> <
>> 
>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
>>> 
>> Looking forward to your feedback!
>> 
>> Best,
>> Wei
>> 
>> [1]
>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
>> <
>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78:+Flink+Python+UDF+Environment+and+Dependency+Management
>>> 
>> 
>> 
 
>>> 
>> 
>> 



Re: [DISCUSS] Drop Kafka 0.8/0.9

2019-12-04 Thread Dian Fu
+1 for dropping them. 

Just FYI: there was a similar discussion few months ago [1]. 

[1] 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DISCUSS-Drop-older-versions-of-Kafka-Connectors-0-9-0-10-for-Flink-1-10-td29916.html#a29997
 


> 在 2019年12月5日,上午10:29,vino yang  写道:
> 
> +1
> 
> jincheng sun  于2019年12月5日周四 上午10:26写道:
> +1 for dropping it, and thanks for bringing up this discussion Chesnay!
> 
> Best,
> Jincheng
> 
> Jark Wu  于2019年12月5日周四 上午10:19写道:
> +1 for dropping, also cc'ed user mailing list.
> 
> 
> Best,
> Jark
> 
> On Thu, 5 Dec 2019 at 03:39, Konstantin Knauf 
> wrote:
> 
> > Hi Chesnay,
> >
> > +1 for dropping. I have not heard from any user using 0.8 or 0.9 for a long
> > while.
> >
> > Cheers,
> >
> > Konstantin
> >
> > > On Wed, Dec 4, 2019 at 1:57 PM Chesnay Schepler 
> > wrote:
> >
> > > Hello,
> > >
> > > What's everyone's take on dropping the Kafka 0.8/0.9 connectors from the
> > > Flink codebase?
> > >
> > > We haven't touched either of them for the 1.10 release, and it seems
> > > quite unlikely that we will do so in the future.
> > >
> > > We could finally close a number of test stability tickets that have been
> > > lingering for quite a while.
> > >
> > >
> > > Regards,
> > >
> > > Chesnay
> > >
> > >
> >
> > --
> >
> > Konstantin Knauf | Solutions Architect
> >
> > +49 160 91394525
> >
> >
> > Follow us @VervericaData Ververica
> >
> >
> > --
> >
> > Join Flink Forward - The Apache Flink
> > Conference
> >
> > Stream Processing | Event Driven | Real Time
> >
> > --
> >
> > Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> >
> > --
> > Ververica GmbH
> > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> > (Tony) Cheng
> >



Re: [VOTE] Improve the Pyflink command line options (Adjustment to FLIP-78)

2019-12-04 Thread Dian Fu
+1 (non-binding)

Regards,
Dian

> 在 2019年12月5日,上午11:11,Jark Wu  写道:
> 
> +1 (binding)
> 
> Best,
> Jark
> 
> On Thu, 5 Dec 2019 at 10:45, Wei Zhong  wrote:
> 
>> Hi all,
>> 
>> According to our previous discussion in [1], I'd like to bring up a vote
>> to apply the adjustment [2] to the command-line option design of FLIP-78
>> [3].
>> 
>> The vote will be open for at least 72 hours unless there is an objection
>> or not enough votes.
>> 
>> Best,
>> Wei
>> 
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-the-Pyflink-command-line-options-Adjustment-to-FLIP-78-td35440.html
>> [2]
>> https://docs.google.com/document/d/1R8CaDa3908V1SnTxBkTBzeisWqBF40NAYYjfRl680eg/edit?usp=sharing
>> [3]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management
>> 
>> 



Re: [DISCUSS] PyFlink User-Defined Function Resource Management

2019-12-05 Thread Dian Fu
Hi Jingsong,

Thanks a lot for your sharing. It's very helpful, as the Python operator will 
take a similar approach.

Thanks,
Dian

> 在 2019年12月6日,上午11:12,Jingsong Li  写道:
> 
> Hi Dian,
> 
> After [1] and [2], in the batch sql world, we will:
> - [2] In client/compile side: we use memory weight request memory for
> Transformation.
> - [1] In runtime side: we use memory fraction to compute memory size and
> allocate in StreamOperator.
> For your information.
> 
> [1] https://jira.apache.org/jira/browse/FLINK-14063
> [2] https://jira.apache.org/jira/browse/FLINK-15035
> 
> Best,
> Jingsong Lee
> 
> On Tue, Dec 3, 2019 at 6:07 PM Dian Fu  wrote:
> 
>> Hi Jingsong,
>> 
>> Thanks for your valuable feedback. I have updated the "Example" section
>> describing how to use these options in a Python Table API program.
>> 
>> Thanks,
>> Dian
>> 
>>> 在 2019年12月2日,下午6:12,Jingsong Lee  写道:
>>> 
>>> Hi Dian:
>>> 
>>> Thanks for your explanation.
>>> If you could update the document to add an explanation of the changes to the
>>> table layer,
>>> it might be better (it's just a suggestion, it depends on you).
>>> About forwardedInputQueue in AbstractPythonScalarFunctionOperator:
>>> will this queue take up a lot of memory?
>>> Can it also occupy memory as large as buffer.memory?
>>> If so, what we're dealing with now is the silent use of heap memory?
>>> I feel it's a little strange, because the memory on the Python side is
>> reserved,
>>> but the memory on the JVM side is used silently.
>>> 
>>> After carefully seeing your comments on Google doc:
>>>> The memory used by the Java operator is currently accounted as the task
>>> on-heap memory. We can revisit this if we find it's a problem in the
>> future.
>>> I agree that we can ignore it now, But we can add some content to the
>>> document to remind the user, What do you think?
>>> 
>>> Best,
>>> Jingsong Lee
>>> 
>>> On Mon, Dec 2, 2019 at 5:17 PM Dian Fu  wrote:
>>> 
>>>> Hi Jingsong,
>>>> 
>>>> Thanks a lot for your comments. Please see my reply inlined below.
>>>> 
>>>>> 在 2019年12月2日,下午3:47,Jingsong Lee  写道:
>>>>> 
>>>>> Hi Dian:
>>>>> 
>>>>> 
>>>>> Thanks for your driving. I have some questions:
>>>>> 
>>>>> 
>>>>> - Where should these configurations belong? You have mentioned
>>>> tableApi/SQL,
>>>>> so should in TableConfig?
>>>> 
>>>> All Python related configurations are defined in PythonOptions. User
>> could
>>>> configure these configurations via TableConfig.getConfiguration.setXXX
>> for
>>>> Python Table API programs.
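>>>> 
>>>> As a quick sketch (the option key below is illustrative, taken from this
>>>> proposal, and may differ in the final implementation):
>>>> 
>>>>     TableConfig config = tEnv.getConfig();
>>>>     // reserve 256 MB of off-heap memory for the Python worker (illustrative key)
>>>>     config.getConfiguration().setString(
>>>>         "python.fn-execution.buffer.memory.size", "256mb");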
>>>> 
>>>>> 
>>>>> - If it is just for table/sql, should it be called table.python.*,
>>>>> because in table, all config options are called table.*?
>>>> 
>>>> These configurations are not table specific. They will be used for both
>>>> Python Table API programs and Python DataStream API programs (which are
>>>> planned to be supported in the future). So python.xxx seems more
>>>> appropriate, what do you think?
>>>> 
>>>>> - What should table module do? So in CommonPythonCalc, we should read
>>>>> options from table config, and set resources to OneInputTransformation?
>>>> 
>>>> As described in the design doc, in the compilation phase, for batch jobs,
>> the
>>>> required memory of the Python worker will be calculated according to the
>>>> configuration and set as the managed memory for the operator. For stream
>>>> jobs, the resource spec will be unknown (the reason is that currently the
>>>> resources for all the operators in stream jobs are unknown and it
>> isn't
>>>> supported to configure both known and unknown resources in a single job).
>>>> 
>>>>> - Is all buffer.memory off-heap memory? I took a look
>>>>> at AbstractPythonScalarFunctionOperator, there is a
>> forwardedInputQueue,
>>>> is
>>>>> this one a heap queue? So we need heap memory too?
>>>> 
>>>> Yes, they are all off-heap memory which is supposed to be used by the
>>>> Python process. The forwardedInputQueue is a buffer used in the Java
>>>> operator and its memory 

[RESULT] [VOTE] FLIP-88: PyFlink User-Defined Function Resource Management

2019-12-05 Thread Dian Fu
Hi everyone,

Thanks for the discussion and votes.

So far we have received 4 approving votes, 3 of which are binding and there is 
no -1 votes:
* Jincheng (binding)
* Hequn (binding)
* Jark (binding)
* Jingsong (non-binding)

Therefore, I'm happy to announce that FLIP-88 has been accepted.

Thanks everyone!

Regards,
Dian

Re: [VOTE] Release 1.8.3, release candidate #3

2019-12-10 Thread Dian Fu
+1 (non-binding)

- verified signatures and checksums
- built from source without tests with Scala 2.11 and Scala 2.12
- started a standalone cluster and verified the web UI is accessible
- submitted the word count example for both batch and stream; there is no suspicious 
log output
- ran a couple of tests in the IDE
- verified that all POM files point to 1.8.3
- the release notes and website PR look good

Regards,
Dian

> 在 2019年12月10日,下午10:58,Till Rohrmann  写道:
> 
> Hi everyone,
> 
> +1 (binding)
> 
> - verified that e2e tests pass on Travis
> - verified checksums and signatures
> - built Flink from sources with Scala 2.12
> - ran examples on standalone cluster
> 
> Cheers,
> Till
> 
> On Tue, Dec 10, 2019 at 12:23 PM Yang Wang  wrote:
> 
>> Hi Hequn,
>> 
>> +1 (non-binding)
>> 
>> - verified checksums and hashes
>> - built from sources (Scala 2.11 and Scala 2.12)
>> - running flink per-job/session cluster on Yarn with more than 1000
>> containers, good
>> 
>> Danny Chan  于2019年12月10日周二 上午9:39写道:
>> 
>>> Hi Hequn,
>>> 
>>> +1 (non-binding)
>>> 
>>> - verified checksums and hashes
>>> - built locally with macOS 10.14 and JDK 8
>>> - did some checks in the SQL CLI
>>> - ran some tests in the IDE
>>> 
>>> Best,
>>> Danny Chan
>>> 在 2019年12月5日 +0800 PM9:39,Hequn Cheng ,写道:
 Hi everyone,
 
 Please review and vote on the release candidate #3 for the version
>> 1.8.3,
 as follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific comments)
 
 
 The complete staging area is available for your review, which includes:
 * JIRA release notes [1],
 * the official Apache source release and binary convenience releases to
>>> be
 deployed to dist.apache.org [2], which are signed with the key with
 fingerprint EF88474C564C7A608A822EEC3FF96A2057B6476C [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag "release-1.8.3-rc3" [5],
 * website pull request listing the new release and adding announcement
>>> blog
 post [6].
 
 The vote will be open for at least 72 hours.
 Please cast your votes before *Dec. 10th 2019, 16:00 UTC*.
 
 It is adopted by majority approval, with at least 3 PMC affirmative
>>> votes.
 
 Thanks,
 Hequn
 
 [1]
 
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12346112
 [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.3-rc3/
 [3] https://dist.apache.org/repos/dist/release/flink/KEYS
 [4]
>>> https://repository.apache.org/content/repositories/orgapacheflink-1314/
 [5]
 
>>> 
>> https://github.com/apache/flink/commit/d54807ba10d0392a60663f030f9fe0bfa1c66754
 [6] https://github.com/apache/flink-web/pull/285
>>> 
>> 



Re: [ANNOUNCE] Feature freeze for Apache Flink 1.10.0 release

2019-12-10 Thread Dian Fu
Hi Yu & Gary,

Thanks for your great work. Looking forward to the 1.10 release.

Regards,
Dian

> 在 2019年12月11日,上午10:29,Danny Chan  写道:
> 
> Hi Yu & Gary,
> 
> Thanks for your nice job ~
> 
> Best,
> Danny Chan
> 在 2019年11月19日 +0800 PM11:44,dev@flink.apache.org,写道:
>> 
>> Hi Yu & Gary,
>> 
>> Thanks a lot for your work and looking forward to the 1.10 release. :)



Re: [ANNOUNCE] Apache Flink 1.8.3 released

2019-12-11 Thread Dian Fu
Thanks Hequn for being the release manager and everyone who contributed to this 
release.

Regards,
Dian

> 在 2019年12月12日,下午2:24,Hequn Cheng  写道:
> 
> Hi,
>  
> The Apache Flink community is very happy to announce the release of Apache 
> Flink 1.8.3, which is the third bugfix release for the Apache Flink 1.8 
> series.
>  
> Apache Flink® is an open-source stream processing framework for distributed, 
> high-performing, always-available, and accurate data streaming applications.
>  
> The release is available for download at:
> https://flink.apache.org/downloads.html 
> 
>  
> Please check out the release blog post for an overview of the improvements 
> for this bugfix release:
> https://flink.apache.org/news/2019/12/11/release-1.8.3.html 
> 
>  
> The full release notes are available in Jira:
> https://issues.apache.org/jira/projects/FLINK/versions/12346112 
> 
>  
> We would like to thank all contributors of the Apache Flink community who 
> made this release possible!
> Great thanks to @Jincheng as a mentor during this release.
>  
> Regards,
> Hequn 



Re: [VOTE] Apache Flink ML Release 2.3.0, release candidate #1

2023-06-29 Thread Dian Fu
+1 (binding)

- Verified the checksums and signatures.
- The website PR LGTM

Regards,
Dian

On Thu, Jun 29, 2023 at 7:03 PM Zhipeng Zhang  wrote:
>
> Thanks Dong and Xin for driving this release.
>
> +1 (non-binding)
>
> - Verified the checksums and GPG files.
> - Verified that the source distributions do not contain any binaries.
> - Browsed through JIRA release notes files.
> - Browsed through README.md files.
>
> Xin Jiang  于2023年6月29日周四 12:08写道:
> >
> > Hi Dong,
> >
> > Thanks for driving this release.
> >
> > +1 (non-binding)
> >
> > - Verified the checksums and GPG files.
> > - Verified that the source distributions do not contain any binaries.
> > - Built the source distribution and run all unit tests.
> > - Verified that all POM files point to the same version.
> > - Browsed through JIRA release notes files.
> > - Browsed through README.md files.
> >
> >
> > Best Regards,
> > Xin
>
>
>
> --
> best,
> Zhipeng


Re: [DISCUSS] Flink and Externalized connectors leads to block and circular dependency problems

2023-07-04 Thread Dian Fu
Thanks Ran Tao for proposing this discussion and Martijn for sharing
his thoughts.

>  While flink-python now fails the CI, it shouldn't actually depend on the
externalized connectors. I'm not sure what PyFlink does with it, but if it
belongs to the connector code,

For each DataStream connector, there is a corresponding Python wrapper
and also some test cases in PyFlink. In theory, we should move that
wrapper into each connector repository. In the past, we have not done
that when externalizing the connectors since it may introduce some
burden when releasing: it means that we have to publish each
connector to PyPI separately.

To resolve this problem, I guess we can move the connector support in
PyFlink into the external connector repository.

Regards,
Dian


On Mon, Jul 3, 2023 at 11:08 PM Ran Tao  wrote:
>
> @Martijn
> thanks for the clear explanations.
>
> If we follow the line you specified (connectors shouldn't rely on
> dependencies that may or may not be
> available in Flink itself),
> it seems that we should add each dependency we need (such as
> commons-io, commons-collections) to the connector pom explicitly,
> and bundle it in the sql-connector uber jar.
>
> Then there is only one thing left: we need to make the flink-python tests
> not depend on the released connectors.
> Maybe we should check that and decouple it like you suggested.
>
> Best Regards,
> Ran Tao
> https://github.com/chucheng92
>
>
> Martijn Visser  于2023年7月3日周一 22:06写道:
>
> > Hi Ran Tao,
> >
> > Thanks for opening this topic. I think there's a couple of things at hand:
> > 1. Connectors shouldn't rely on dependencies that may or may not be
> > available in Flink itself, like we've seen with flink-shaded. That avoids a
> > tight coupling between Flink and connectors, which is exactly what we try
> > to avoid.
> > 2. When following that line, that would also be applicable for things like
> > commons-collections and commons-io. If a connector wants to use them, it
> > should make sure that it bundles those artifacts itself.
> > 3. While flink-python now fails the CI, it shouldn't actually depend on the
> > externalized connectors. I'm not sure what PyFlink does with it, but if it
> > belongs to the connector code, that code should also be moved to the
> > individual connector repo. If it's just a generic test, we could consider
> > creating a generic test against released connector versions to determine
> > compatibility.
> >
> > I'm curious about the opinions of others as well.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Mon, Jul 3, 2023 at 3:37 PM Ran Tao  wrote:
> >
> > > I have an issue here that needs to upgrade commons-collections[1] (this
> > is
> > > an example), but the PR CI fails because flink-python test cases depend on
> > > flink-sql-connector-kafka; the kafka sql connector is a small jar that does
> > not
> > > include this dependency, so the Flink CI raises an exception[2]. My current
> > > solution is [3]. But even if this PR is done, the upgrade of Flink still
> > > requires the kafka connector to be released.
> > >
> > > This issue leads to deeper problems. Although the connectors have been
> > > externalized, many UTs of flink-python depend on these connectors, and a
> > > basic agreement of externalized connectors is that other dependencies
> > > cannot be introduced explicitly, which means the externalized connectors
> > > use dependencies inherited from Flink. In this way, when the Flink main
> > > branch upgrades some dependencies, it is easy to fail when executing
> > flink-python
> > > test cases, because Flink no longer has this class, and the connector does
> > > not contain it. It's a circular problem.
> > >
> > > Unless the connector self-consistently includes all dependencies, which
> > is
> > > uncontrollable
> > > (only a few connectors include all jars in the shade phase).
> > >
> > > In short, the current flink-python module's dependencies on the connectors
> > > lead to an incomplete process of externalization and decoupling, which
> > > will lead to circular dependencies when Flink upgrades or changes some
> > > dependencies.
> > >
> > > I don't know if I made it clear. I hope to get everyone's opinions on
> > what
> > > better solutions we should adopt for similar problems in the future.
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-30274
> > > [2]
> > >
> > >
> > https://user-images.githubusercontent.com/11287509/250120404-d12b60f4-7ff3-457e-a2c4-8cd415bb5ca2.png
> > >
> > >
> > >
> > https://user-images.githubusercontent.com/11287509/250120522-6b096a4f-83f0-4287-b7ad-d46b9371de4c.png
> > > [3] https://github.com/apache/flink-connector-kafka/pull/38
> > >
> > > Best Regards,
> > > Ran Tao
> > > https://github.com/chucheng92
> > >
> >


Re: [DISCUSS] Flink and Externalized connectors leads to block and circular dependency problems

2023-07-05 Thread Dian Fu
Hi Chesnay,

>> The wrapping of connectors is a bit of a maintenance nightmare and
doesn't really work with external/custom connectors.

Cannot agree with you more.

>> Has there ever been thoughts about changing flink-pythons connector
setup to use the table api connectors underneath?

I'm still not sure if this is feasible for all connectors; however,
this may be a good idea. The concern is that the DataStream API
connector functionalities may become misaligned between the Java and Python
connectors. Besides, there are still a few connectors which only exist as
DataStream API connectors, e.g. Google PubSub, RabbitMQ, Cassandra,
Pulsar, Hybrid Source, etc. On the other hand, PyFlink already supports
Table API connectors, and if we take this route, maybe we
could just tell users to use the Table API connectors directly.

Another option I had in mind is to provide an API which allows
configuring the behavior via key/value pairs in both the Java & Python
DataStream API connectors.

Regards,
Dian

On Wed, Jul 5, 2023 at 6:34 PM Chesnay Schepler  wrote:
>
> Has there ever been thoughts about changing flink-pythons connector
> setup to use the table api connectors underneath?
>
> The wrapping of connectors is a bit of a maintenance nightmare and
> doesn't really work with external/custom connectors.
>
> On 04/07/2023 13:35, Dian Fu wrote:
> > Thanks Ran Tao for proposing this discussion and Martijn for sharing
> > the thought.
> >
> >>   While flink-python now fails the CI, it shouldn't actually depend on the
> > externalized connectors. I'm not sure what PyFlink does with it, but if it
> > belongs to the connector code,
> >
> > For each DataStream connector, there is a corresponding Python wrapper
> > and also some test cases in PyFlink. In theory, we should move that
> > wrapper into each connector repository. In the past, we have not done
> > that when externalizing the connectors since it may introduce some
> > burden when releasing: it means that we have to publish each
> > connector to PyPI separately.
> >
> > To resolve this problem, I guess we can move the connector support in
> > PyFlink into the external connector repository.
> >
> > Regards,
> > Dian
> >
> >
> > On Mon, Jul 3, 2023 at 11:08 PM Ran Tao  wrote:
> >> @Martijn
> >> thanks for clear explanations.
> >>
> >> If we follow the line you specified (Connectors shouldn't rely on
> >> dependencies that may or may not be
> >> available in Flink itself)
> >> It seems that we should add a certain dependency if we need(such as
> >> commons-io, commons-collection) in connector pom explicitly.
> >> And bundle it in sql-connector uber jar.
> >>
> >> Then there is only one thing left that we need to make flink-python test
> >> not depend on the released flink-connector.
> >> Maybe we should check it out and decouple it like you suggested.
> >>
> >> Best Regards,
> >> Ran Tao
> >> https://github.com/chucheng92
> >>
> >>
> >> Martijn Visser  于2023年7月3日周一 22:06写道:
> >>
> >>> Hi Ran Tao,
> >>>
> >>> Thanks for opening this topic. I think there's a couple of things at hand:
> >>> 1. Connectors shouldn't rely on dependencies that may or may not be
> >>> available in Flink itself, like we've seen with flink-shaded. That avoids 
> >>> a
> >>> tight coupling between Flink and connectors, which is exactly what we try
> >>> to avoid.
> >>> 2. When following that line, that would also be applicable for things like
> >>> commons-collections and commons-io. If a connector wants to use them, it
> >>> should make sure that it bundles those artifacts itself.
> >>> 3. While flink-python now fails the CI, it shouldn't actually depend on 
> >>> the
> >>> externalized connectors. I'm not sure what PyFlink does with it, but if
> >>> belongs to the connector code, that code should also be moved to the
> >>> individual connector repo. If it's just a generic test, we could consider
> >>> creating a generic test against released connector versions to determine
> >>> compatibility.
> >>>
> >>> I'm curious about the opinions of others as well.
> >>>
> >>> Best regards,
> >>>
> >>> Martijn
> >>>
> >>> On Mon, Jul 3, 2023 at 3:37 PM Ran Tao  wrote:
> >>>
> >>>> I have an issue here that needs to upgrade commons-collections[1] (this
&

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

2019-04-24 Thread Dian Fu
Thanks everyone for the discussion here. 

Regarding executing the Java/Scala UDFs and the built-in UDFs in the current 
Flink way (directly in the JVM, not via RPC), I share the same thoughts as Max 
and Robert and I think it will not be a big problem. From the design doc, I 
guess the main reason for taking the Py4J way instead of the DAG way at present is 
that the DAG has some limitations in scenarios such as interactive programming, 
which may be a strong requirement for data scientists.

> In addition (and I'll admit this is rather subjective) it seems to me one of 
> the primary values of a table-like API in a given language (vs. just using 
> (say) plain old SQL itself via a console) is the ability to embed it in a 
> larger pipeline, or at least drop in operations that are not (as) naturally 
> expressed in the "table way," including existing libraries. In other words, a 
> full SDK. The Py4j wrapping doesn't extend itself to such integration nearly 
> as easily. 


Hi Robert, regarding "a larger pipeline", do you mean translating a 
table-like API job from/to another kind of API job, or embedding third-party 
libraries into a table-like API job via UDFs? Could you kindly explain why this 
would be a problem for Py4J but not a problem when expressing the job 
with a DAG?

Thanks,
Dian


> 在 2019年4月25日,上午12:16,Robert Bradshaw  写道:
> 
> Thanks for the meeting summary, Stephan. Sounds like you covered a lot of 
> ground. Some more comments below, adding onto what Max has said. 
> 
> On Wed, Apr 24, 2019 at 3:20 PM Maximilian Michels  > wrote:
> >
> > Hi Stephan,
> >
> > This is exciting! Thanks for sharing. The inter-process communication
> > code looks like the most natural choice as a common ground. To go
> > further, there are indeed some challenges to solve.
> 
> It certainly does make sense to share this work, though it does to me seem 
> like a rather low level to integrate at. 
> 
> > > => Biggest question is whether the language-independent DAG is expressive 
> > > enough to capture all the expressions that we want to map directly to 
> > > Table API expressions. Currently much is hidden in opaque UDFs. Kenn 
> > > mentioned the structure should be flexible enough to capture more 
> > > expressions transparently.
> >
> > Just to add some context how this could be done, there is the concept of
> > a FunctionSpec which is part of a transform in the DAG. FunctionSpec
> > contains a URN and with a payload. FunctionSpec can be either (1)
> > translated by the Runner directly, e.g. map to table API concepts or (2)
> > run a user-defined function with an Environment. It could be feasible
> > for Flink to choose the direct path, whereas Beam Runners would leverage
> > the more generic approach using UDFs. Granted, compatibility across
> > Flink and Beam would only work if both of the translation paths yielded
> > the same semantics.
> 
> To elaborate a bit on this, Beam DAGs are built up by applying Transforms 
> (basically operations) to PCollections (the equivalent of dataset/datastream), 
> but the key point here is that these transforms are often composite 
> operations that expand out into smaller subtransforms. This expansion happens 
> during pipeline construction, but with the recent work on cross language 
> pipelines can happen out of process. This is one point of extendability. 
> Secondly, and importantly, this composite structure is preserved in the DAG, 
> and so a runner is free to ignore the provided expansion and supply its own 
> (so long as semantically it produces exactly the same output). These 
> composite operations can be identified by arbitrary URNs + payloads, and any 
> runner that does not understand them simply uses the pre-provided expansion. 
> 
> The existing Flink runner operates on exactly this principle, translating 
> URNs for the leaf operations (Map, Flatten, ...) as well as some composites 
> it can do better (e.g. Reshard). It is intentionally easy to define and add 
> new ones. This actually seems the easier approach (to me at least, but that's 
> probably heavily influenced by what I'm familiar with vs. what I'm not). 
> 
> As for how well this maps onto the Flink Tables API, part of that depends on 
> how much of the API is the operations themselves, and how much is concerning 
> configuration/environment/etc. which is harder to talk about in an agnostic 
> way. 
> 
> Using something like Py4j is an easy way to get up and running, especially for 
> a very faithful API, but the instant one wants to add UDFs one hits a cliff 
> of sorts (which is surmountable, but likely a lot harder than having gone the 
> above approach). In addition (and I'll admit this is rather subjective) it 
> seems to me one of the primary values of a table-like API in a given language 
> (vs. just using (say) plain old SQL itself via a console) is the ability to 
> embed it in a larger pipeline, or at least drop in operations that are not 
> (as) naturally express

Re: [ANNOUNCE] Apache Flink-shaded 7.0 released

2019-05-30 Thread Dian Fu
Cool. Thanks Jincheng and Chesnay for your great work.

Thanks,
Dian

> 在 2019年5月31日,下午1:53,jincheng sun  写道:
> 
> Hi all,
> 
> The Apache Flink community is very happy to announce the release of Apache
> Flink-shaded 7.0.
> 
> The flink-shaded project contains a number of shaded dependencies for
> Apache Flink.
> 
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
> 
> The release is available for download at:
> https://flink.apache.org/downloads.html
> 
> The full release notes are available in Jira:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12345226&styleName=Html&projectId=12315522
> 
> 
> We would like to thank @Chesnay Schepler for his help
> in officially publishing this release, and thank all contributors of the
> Apache Flink community who made this release possible!
> 
> Regards,
> Jincheng



Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread Dian Fu
Hi Vino,

Thanks a lot for starting this discussion. +1 to this feature as I think it 
will be very useful.

Regarding using a window to buffer the input elements, personally I don't 
think it's a good solution for the following reasons:
1) As we know, WindowOperator stores the accumulated results in state; 
this is not necessary for the local aggregate operator.
2) For WindowOperator, each input element will be accumulated into state. This 
is also not necessary for the local aggregate operator, and storing the input 
elements in memory is enough.

Thanks,
Dian

> 在 2019年6月4日,上午10:03,vino yang  写道:
> 
> Hi Ken,
> 
> Thanks for your reply.
> 
> As I said before, we try to reuse Flink's state concept (fault tolerance
> and guaranteeing "exactly-once" semantics), so we did not consider a cache.
> 
> In addition, if we use Flink's state, the OOM-related issue is not a key
> problem we need to consider.
> 
> Best,
> Vino
> 
> Ken Krugler  于2019年6月4日周二 上午1:37写道:
> 
>> Hi all,
>> 
>> Cascading implemented this “map-side reduce” functionality with an LRU
>> cache.
>> 
>> That worked well, as then the skewed keys would always be in the cache.
>> 
>> The API let you decide the size of the cache, in terms of number of
>> entries.
>> 
>> Having a memory limit would have been better for many of our use cases,
>> though FWIR there’s no good way to estimate in-memory size for objects.
>> 
>> — Ken
>> 
>>> On Jun 3, 2019, at 2:03 AM, vino yang  wrote:
>>> 
>>> Hi Piotr,
>>> 
>>> The localKeyBy API returns an instance of KeyedStream (we just added an
>>> inner flag to identify the local mode), which Flink has provided
>> before.
>>> Users can call all the APIs (especially *window* APIs) which KeyedStream
>>> provides.
>>> 
>>> So if users want to use local aggregation, they should call the window
>> API
>>> to build a local window, which means users should (or rather "can") specify
>> the
>>> window length and other information based on their needs.
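>>> 
>>> For example, a local aggregation job might look like the following sketch
>>> (localKeyBy is the API proposed in the design doc, not yet part of Flink;
>>> the windows and the reduce function are only illustrative):
>>> 
>>>     input                                  // DataStream<Tuple2<String, Long>>
>>>         .localKeyBy(value -> value.f0)     // local, no shuffle happens here
>>>         .timeWindow(Time.seconds(5))       // short window for pre-aggregation
>>>         .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))
>>>         .keyBy(value -> value.f0)          // regular shuffle on the same key
>>>         .timeWindow(Time.minutes(1))
>>>         .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1));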
>>> 
>>> I think you described an idea different from ours. We did not try to
>>> react after triggering some predefined threshold. We tend to give users
>> the
>>> discretion to make decisions.
>>> 
>>> Our design idea tends to reuse Flink-provided concepts and functions like
>>> state and window (IMO, we do not need to worry about OOM and the issues
>> you
>>> mentioned).
>>> 
>>> Best,
>>> Vino
>>> 
>>> Piotr Nowojski  于2019年6月3日周一 下午4:30写道:
>>> 
 Hi,
 
 +1 for the idea from my side. I’ve even attempted to add a similar feature
 quite some time ago, but didn’t get enough traction [1].
 
 I’ve read through your document and I couldn’t find it mentioning
 anywhere when the pre-aggregated result should be emitted down the
>> stream?
 I think that’s one of the most crucial decisions, since a wrong decision
>> here
 can lead to a decrease of performance or to an explosion of memory/state
 consumption (both with bounded and unbounded data streams). For
>> streaming
 it can also lead to an increased latency.
 
 Since this is also a decision that’s impossible to make automatically
 perfectly reliably, first and foremost I would expect this to be
 configurable via the API. With maybe some predefined triggers, like on
 watermark (for windowed operations), on checkpoint barrier (to decrease
 state size?), on element count, maybe memory usage (much easier to
>> estimate
 with a known/predefined types, like in SQL)… and with some option to
 implement custom trigger.
 
 Also, what would work best would be to have some form of memory
 consumption priority. For example, if we are running out of memory for
 HashJoin/Final aggregation, instead of spilling to disk or crashing the
>> job
 with OOM it would be probably better to prune/dump the pre/local
 aggregation state. But that’s another story.
 
 [1] https://github.com/apache/flink/pull/4626 <
 https://github.com/apache/flink/pull/4626>
 
 Piotrek
 
> On 3 Jun 2019, at 10:16, sf lee  wrote:
> 
> Excited and big +1 for this feature.
> 
> SHI Xiaogang  于2019年6月3日周一 下午3:37写道:
> 
>> Nice feature.
>> Looking forward to having it in Flink.
>> 
>> Regards,
>> Xiaogang
>> 
>> vino yang  于2019年6月3日周一 下午3:31写道:
>> 
>>> Hi all,
>>> 
>>> As we mentioned at some conferences, such as Flink Forward SF 2019 and
>> QCon
>>> Beijing 2019, our team has implemented "local aggregation" in our
>> internal
>>> Flink fork. This feature can effectively alleviate data skew.
>>> 
>>> Currently, keyed streams are widely used to perform aggregating
>> operations
>>> (e.g., reduce, sum and window) on the elements that have the same
 key.
>>> When executed at runtime, the elements with the same key will be sent
 to
>>> and aggregated by the same task.
>>> 
>>> The performance of these aggregating operations is very sensitive to
 the
>>>

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread Dian Fu
Hi Vino,

It may seem similar to the window operator, but there are also a few key
differences. For example, the local aggregate operator can send out the
results at any time, while the window operator can only send out the results
at the end of the window (without early firing). This means that the local
aggregate operator can send out the results not only when the trigger time
is reached, but also when the memory is exhausted. This difference makes
optimization possible, as it means that the local aggregate operator rarely
needs to access state.

I admit that the window operator can solve part of the problem (the data skew)
and just wonder if we can do more. Using the window operator at present seems
OK to me as it can indeed solve part of the problem. We just need to
think a little more in the design and make sure that the current solution
is consistent with future optimizations.

Thanks,

Dian

在 2019年6月4日,下午5:22,vino yang  写道:

Hi Dian,

Thanks for your reply.

I know what you mean. However, if you think deeply, you will find your
implementation needs to provide an operator which looks like a window
operator. You need to use state, receive an aggregation function and
specify the trigger time. It looks like a lightweight window operator,
right?

We try to reuse Flink-provided functions and reduce complexity. IMO, it is
more user-friendly because users are familiar with the window API.

Best,
Vino


Dian Fu  于2019年6月4日周二 下午4:19写道:

Hi Vino,

Thanks a lot for starting this discussion. +1 to this feature as I think
it will be very useful.

Regarding using a window to buffer the input elements, personally I don't
think it's a good solution for the following reasons:
1) As we know, WindowOperator stores the accumulated results in
state; this is not necessary for the local aggregate operator.
2) For WindowOperator, each input element will be accumulated into state.
This is also not necessary for the local aggregate operator, and storing the
input elements in memory is enough.

Thanks,
Dian

在 2019年6月4日,上午10:03,vino yang  写道:

Hi Ken,

Thanks for your reply.

As I said before, we try to reuse Flink's state concept (fault tolerance
and guaranteeing "exactly-once" semantics), so we did not consider a cache.

In addition, if we use Flink's state, the OOM-related issue is not a key
problem we need to consider.

Best,
Vino

Ken Krugler  于2019年6月4日周二 上午1:37写道:

Hi all,

Cascading implemented this “map-side reduce” functionality with an LRU
cache.

That worked well, as then the skewed keys would always be in the cache.

The API let you decide the size of the cache, in terms of number of
entries.

Having a memory limit would have been better for many of our use cases,
though FWIR there’s no good way to estimate in-memory size for objects.

— Ken

On Jun 3, 2019, at 2:03 AM, vino yang  wrote:

Hi Piotr,

The localKeyBy API returns an instance of KeyedStream (we just added an
inner flag to identify the local mode), which Flink has provided

before.

Users can call all the APIs (especially *window* APIs) which KeyedStream
provides.

So if users want to use local aggregation, they should call the window

API

to build a local window, which means users should (or rather "can") specify

the

window length and other information based on their needs.

I think you described an idea different from ours. We did not try to
react after triggering some predefined threshold. We tend to give users

the

discretion to make decisions.

Our design idea tends to reuse Flink-provided concepts and functions

like

state and window (IMO, we do not need to worry about OOM and the issues

you

mentioned).

Best,
Vino

Piotr Nowojski  于2019年6月3日周一 下午4:30写道:

Hi,

+1 for the idea from my side. I’ve even attempted to add a similar

feature

quite some time ago, but didn’t get enough traction [1].

I’ve read through your document and I couldn’t find it mentioning
anywhere, when the pre aggregated result should be emitted down the

stream?

I think that’s one of the most crucial decisions, since a wrong decision

here

can lead to a decrease of performance or to an explosion of memory/state
consumption (both with bounded and unbounded data streams). For

streaming

it can also lead to an increased latency.

Since this is also a decision that’s impossible to make automatically
perfectly reliably, first and foremost I would expect this to be
configurable via the API. With maybe some predefined triggers, like on
watermark (for windowed operations), on checkpoint barrier (to

decrease

state size?), on element count, maybe memory usage (much easier to

estimate

with a known/predefined types, like in SQL)… and with some option to
implement custom trigger.

Also what would work the best would be to have a some form of memory
consumption priority. For example if we are running out of memory for
HashJoin/Final aggregation, instead of spilling to disk or crashing

the

job

with OOM it would be probably better to prune/dump the pre/local
agg

Re: [DISCUSS] Support Local Aggregation in Flink

2019-06-04 Thread Dian Fu
Hi Vino,

Thanks a lot for your reply.

> 1) When, Why and How to judge the memory is exhausted?

My point here is that the local aggregate operator can buffer the inputs in 
memory and send out the results AT ANY TIME, e.g. when the element count or the time 
interval reaches a pre-configured value, when the memory usage of the buffered elements 
reaches a configured value (supposing we can estimate the object size 
efficiently), or even when a checkpoint is triggered.

> 
> 2) If the local aggregate operator rarely needs to operate the state, what
> do you think about fault tolerance?

AbstractStreamOperator provides a method `prepareSnapshotPreBarrier` which can 
be used here to send out the results downstream when a checkpoint is 
triggered. Then fault tolerance can work well. 

Even if no such method were available, we could still store the buffered 
elements or pre-aggregated results in state when a checkpoint is triggered. The 
state access will be much less than with the window operator, as only the 
elements not yet sent out when the checkpoint occurs have to be written to state. 
Suppose the checkpoint interval is 3 minutes and the trigger interval is 10 
seconds; then only roughly 10/180 of the elements will be written to state.
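
To make this concrete, below is a minimal sketch of such an operator (heavily 
simplified and only for illustration: timestamps, watermarks and the time-based 
trigger are omitted, and the key selector, reduce function and flush threshold 
are assumed to be given):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.flink.api.common.functions.ReduceFunction;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
    import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
    import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

    public class LocalAggregateOperator<T, K>
            extends AbstractStreamOperator<T>
            implements OneInputStreamOperator<T, T> {

        private final KeySelector<T, K> keySelector;
        private final ReduceFunction<T> reduceFunction;
        private final int maxBufferedKeys;

        // pre-aggregated results, kept purely in memory: no state access needed
        private transient Map<K, T> buffer;

        public LocalAggregateOperator(
                KeySelector<T, K> keySelector,
                ReduceFunction<T> reduceFunction,
                int maxBufferedKeys) {
            this.keySelector = keySelector;
            this.reduceFunction = reduceFunction;
            this.maxBufferedKeys = maxBufferedKeys;
        }

        @Override
        public void open() throws Exception {
            super.open();
            buffer = new HashMap<>();
        }

        @Override
        public void processElement(StreamRecord<T> element) throws Exception {
            K key = keySelector.getKey(element.getValue());
            T acc = buffer.get(key);
            buffer.put(key, acc == null
                    ? element.getValue()
                    : reduceFunction.reduce(acc, element.getValue()));
            // element-count trigger; a time-based or estimated-memory trigger
            // would flush in exactly the same way
            if (buffer.size() >= maxBufferedKeys) {
                flush();
            }
        }

        @Override
        public void prepareSnapshotPreBarrier(long checkpointId) throws Exception {
            // emit the buffered results before the checkpoint barrier, so the
            // operator has nothing to write into state for fault tolerance
            flush();
        }

        private void flush() {
            for (T value : buffer.values()) {
                output.collect(new StreamRecord<>(value));
            }
            buffer.clear();
        }
    }

The buffer lives purely on the heap and is drained before every barrier, so no 
operator state is needed for fault tolerance.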


Thanks,
Dian


> 在 2019年6月5日,上午11:43,Biao Liu  写道:
> 
> Hi Vino,
> 
> +1 for this feature. It's useful for data skew. And it could also reduce
> the amount of shuffled data.
> 
> I have some concerns about the API part. From my side, this feature should
> be more like an improvement. I'm afraid the proposal is overkill regarding
> the API part. Many other systems support pre-aggregation as an optimization
> of global aggregation. The optimization might be used automatically or
> manually but with a simple API. The proposal introduces a series of
> flexible local aggregation APIs. They could be independent of global
> aggregation. It doesn't look like an improvement but introduces a lot of
> features. I'm not sure if there is a bigger picture later. As for now, the
> API part looks a little heavy to me.
> 
> 
> vino yang  于2019年6月5日周三 上午10:38写道:
> 
>> Hi Litree,
>> 
>> From an implementation level, the localKeyBy API returns a general
>> KeyedStream; you can call all the APIs which KeyedStream provides. We did
>> not restrict its usage, although we could (for example, return a new
>> stream object named LocalKeyedStream).
>> 
>> However, to achieve the goal of local aggregation, it only makes sense to
>> call the window API.
>> 
>> Best,
>> Vino
>> 
>> litree  于2019年6月4日周二 下午10:41写道:
>> 
>>> Hi Vino,
>>> 
>>> 
>>> I have read your design; something I want to know is the usage of these
>> new
>>> APIs. It looks like when I use localKeyBy, I must then use a window
>> operator
>>> to return a DataStream, and then use keyBy and another window operator to
>>> get the final result?
>>> 
>>> 
>>> thanks,
>>> Litree
>>> 
>>> 
>>> On 06/04/2019 17:22, vino yang wrote:
>>> Hi Dian,
>>> 
>>> Thanks for your reply.
>>> 
>>> I know what you mean. However, if you think deeply, you will find your
>>> implementation needs to provide an operator which looks like a window
>>> operator. You need to use state, receive an aggregation function and
>>> specify the trigger time. It looks like a lightweight window operator,
>>> right?
>>> 
>>> We try to reuse Flink-provided functions and reduce complexity. IMO, it
>> is
>>> more user-friendly because users are familiar with the window API.
>>> 
>>> Best,
>>> Vino
>>> 
>>> 
>>> Dian Fu  于2019年6月4日周二 下午4:19写道:
>>> 
>>>> Hi Vino,
>>>> 
>>>> Thanks a lot for starting this discussion. +1 to this feature as I
>> think
>>>> it will be very useful.
>>>> 
>>>> Regarding using a window to buffer the input elements, personally I
>>> don't
>>>> think it's a good solution for the following reasons:
>>>> 1) As we know, WindowOperator stores the accumulated results in
>>>> state; this is not necessary for the local aggregate operator.
>>>> 2) For WindowOperator, each input element will be accumulated into
>> state.
>>>> This is also not necessary for the local aggregate operator, and storing the
>>>> input elements in memory is enough.
>>>> 
>>>> Thanks,
>>>> Dian
>>>> 
>>>>> 在 2019年6月4日,上午10:03,vino yang  写道:
>>>>> 
>>>>> Hi Ke

Re: [DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread Dian Fu
+1 for this proposal.

Regards,
Dian

> 在 2019年6月12日,上午8:24,jincheng sun  写道:
> 
> big +1 for the proposal.
> 
> We will soon complete all the Python API functional development for the 1.9 
> release; then the development of UDFs will be carried out. After the support of 
> UDFs is completed, it will be very natural to support the DataStream API. 
> 
> If all of us agree with this proposal, I believe that for the 1.10 release 
> it is possible to complete support for both UDFs and the DataStream API. And we will 
> do our best to make the 1.10 release contain the Python DataStream API 
> support. 
> 
> So, great thanks to @Stephan for this proposal!
> 
> Best,
> Jincheng
> 
> Zili Chen  于2019年6月11日周二 
> 下午10:56写道:
> +1
> 
> Best,
> tison.
> 
> 
> zhijiang  
> 于2019年6月11日周二 下午10:52写道:
> It is reasonable, as Stephan explained. +1 from my side! 
> --
> From:Jeff Zhang 
> Send Time:2019年6月11日(星期二) 22:11
> To:Stephan Ewen 
> Cc:user ; dev 
> Subject:Re: [DISCUSS] Deprecate previous Python APIs
>  
> +1
> 
> Stephan Ewen  于2019年6月11日周二 
> 下午9:30写道:
> 
> > Hi all!
> >
> > I would suggest deprecating the existing Python APIs for DataSet and
> > DataStream API with the 1.9 release.
> >
> > Background is that there is a new Python API under development.
> > The new Python API is initially against the Table API. Flink 1.9 will
> > support Table API programs without UDFs, 1.10 is planned to support UDFs.
> > Future versions would support also the DataStream API.
> >
> > In the long term, Flink should have one Python API for DataStream and
> > Table APIs. We should not maintain multiple different implementations and
> > confuse users that way.
> > Given that the existing Python APIs are a bit limited and not under active
> > development, I would suggest deprecating them in favor of the new API.
> >
> > Best,
> > Stephan
> >
> >
> 
> -- 
> Best Regards
> 
> Jeff Zhang
> 



Re: About Deprecating split/select for DataStream API

2019-06-17 Thread Dian Fu
Hi all,

Thanks a lot for the discussion. I'm also in favor of rewriting/redesigning the 
split/select API instead of removing it. It has been a consensus that the 
side output API can achieve all the functionality of the split/select API. 
The problem is whether we should also support some easy-to-use APIs on top of 
it. IMO, we should do that as long as the APIs have clear semantics and wide 
usage scenarios. I think the split/select API is such an API. 
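
For reference, rebuilding a two-way split on top of side outputs looks roughly 
like the following sketch (assuming a hypothetical DataStream<Integer> named 
"input"; even numbers go to the main output, odd numbers to a side output):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    // anonymous subclass so that the element type can be captured
    final OutputTag<Integer> oddTag = new OutputTag<Integer>("odd") {};

    SingleOutputStreamOperator<Integer> even = input
        .process(new ProcessFunction<Integer, Integer>() {
            @Override
            public void processElement(Integer value, Context ctx, Collector<Integer> out) {
                if (value % 2 == 0) {
                    out.collect(value);          // main output
                } else {
                    ctx.output(oddTag, value);   // side output
                }
            }
        });

    DataStream<Integer> odd = even.getSideOutput(oddTag);

An easy-to-use split/select API could then be kept as a thin wrapper around 
exactly this pattern.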

Regards,
Dian

> 在 2019年6月18日,上午12:30,xingc...@gmail.com 写道:
> 
> Hi all,
> 
> Thanks for sharing your thoughts on this topic.
> 
> First, we must admit that the current implementation for split/select is 
> flawed. I roughly went through the source code; the problem may be that for 
> consecutive select/split(s), the former one will be overridden by the latter 
> one during the StreamGraph generation phase. That's why we forbid this 
> consecutive logic in FLINK-11084.
> 
> Now the question is whether we should guide users to migrate to the new side 
> output feature or thoroughly rework the broken API with the correct semantics 
> (instead of just trying to forbid all the "invalid" usages). 
> 
> Personally, I prefer the latter solution because
> 
> 1. The split/select may have been widely used without touching the broken 
> part.
> 2. Though restricted compared with side output, the semantics of 
> split/select itself are acceptable, since union does not support different 
> data types either.
> 3. We need a complete and easy-to-use transformation set for the DataStream 
> API. Enabling side output for flatMap may not be an ultimate solution.
> 
> To summarize, maybe we should not easily deprecate the split/select public 
> API. If we come to a consensus on that, how about rewriting it based on side 
> output? (like the implementation for join on coGroup)
> 
> Any feedback is welcome : )
> 
> Best,
> Xingcan
> 
> -Original Message-
> From: SHI Xiaogang  
> Sent: Monday, June 17, 2019 8:08 AM
> To: Dawid Wysakowicz 
> Cc: dev@flink.apache.org
> Subject: Re: About Deprecating split/select for DataStream API
> 
> Hi Dawid,
> 
> Thanks a lot for your example.
> 
> I think most users will expect splitted1 to be empty in the example.
> 
> The unexpected results produced are, in my opinion, due to our problematic 
> implementation, not to confusing semantics.
> We can fix the problem if we add a SELECT operator to filter out unexpected 
> records (of course, we can find some optimizations to improve the efficiency).
> 
> After all, I prefer to fix the problems so that the results are as expected.
> What do you think?
> 
> Regards,
> Xiaogang
> 
> Dawid Wysakowicz  于2019年6月17日周一 下午7:21写道:
> 
>> Yes, you are correct. The problem I described applies to split, not select 
>> as I wrote in the first email. Sorry for that.
>> 
>> I will try to prepare a correct example. Let's have a look at this example:
>> 
>>    val splitted1 = ds.split(x => if (x == 1) List("a") else Nil)
>> 
>>    val splitted2 = ds.split(x => if (x != 1) List("a") else Nil)
>> 
>> In those cases splitted1.select("a") will output all elements, and the 
>> same holds for splitted2, because the OutputSelector(s) are applied to the 
>> previous operator. The behavior I would assume is that splitted1 
>> outputs only the 1s, whereas splitted2 outputs everything but the 1s.
>> 
>> On the other hand in a call
>> 
>>    val splitted1 = ds.split(x => if (x == 1 || x == 2) List("a") else Nil)
>>      .select("a").split(x => if (x == 3) List("b") else Nil).select("b")
>> 
>> I would assume an intersection of those two splits, so no results. 
>> What actually happens is that it outputs the 1s, 2s and 3s. Actually, 
>> exceptions should be thrown in those cases so as not to produce 
>> confusing results, but needing to check for prohibited configurations at 
>> runtime just shows that this API is broken.
>> 
>> Those weird behaviors are, in my opinion, results of the flawed API, as 
>> it actually assigns an output selector to the previous operator. In 
>> other words, it modifies the previous operator. I think it would be much 
>> cleaner if this happened inside an operator rather than separately. 
>> This is what SideOutputs do, as you define them inside the 
>> ProcessFunction, rather than afterwards. Therefore I am very much in 
>> favor of using them for those cases. Once again if the problem is that 
>> they are available only in the ProcessFunction I would prefer enabling 
>> them e.g. in FlatMap, rather than keeping the split/select.
>> 
>> 
>> 
>> On 17/06/2019 09:40, SHI Xiaogang wrote:
>>> Hi Dawid,
>>> 
>>> As the select method is only allowed on SplitStreams, it's 
>>> impossible to construct the example ds.split().select("a", "b").select("c", 
>>> "d").
>>> 
>>> Do you mean ds.split().select("a", "b").split().select("c", "d")?
>>> If so, then the tagging in the first split operation should not 
>>> affect
>> the
>>> second one. Then
>>>splitted.select("a", "b") => empty
>>>splitted.select("c", "d") => ds
>>> 
>>> I cannot quite catch your point here. It's appreciated if yo

Re: CatalogTestBase.* failed on travis

2019-06-20 Thread Dian Fu
Hi Haibo,

Thanks a lot for reporting this bug. I guess it's caused by this PR: 
https://github.com/apache/flink/pull/8786 @Bowen. I think we'd better merge 
the code ONLY after Travis has passed, especially when the changes are not just 
a hotfix or documentation. Anyway, I'll try to provide a fix ASAP.
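
For anyone hitting similar failures: this TypeError is the usual Py4J symptom
of a class that is missing from the JVM classpath. Py4J resolves a name it
cannot find to a JavaPackage object instead of a class, and "calling" that
object fails. A minimal illustration (hypothetical class name, assuming a
running Py4J gateway server; this is not the actual fix):

    from py4j.java_gateway import JavaGateway

    # connects to an already-running gateway server on the default port
    gateway = JavaGateway()
    # an unresolvable name comes back as a JavaPackage, not a class
    clazz = gateway.jvm.org.example.NoSuchClass
    clazz()  # TypeError: 'JavaPackage' object is not callable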

Regards,
Dian

> 在 2019年6月20日,下午6:15,Haibo Sun  写道:
> 
> Hi, guys
> 
> 
> I noticed that all of the Travis tests reported a number of failures like the 
> following. Is anyone working on this problem? 
> 
> 
> __ CatalogTestBase.test_table_exists 
> ___
> 
> self = <CatalogTestBase testMethod=test_table_exists>
> 
>def test_table_exists(self):
>>  self.catalog.create_database(self.db1, self.create_db(), False)
> 
> pyflink/table/tests/test_catalog.py:491: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> 
>@staticmethod
>def create_db():
>gateway = get_gateway()
>>  j_database = gateway.jvm.GenericCatalogDatabase({"k1": "v1"}, 
>> CatalogTestBase.test_comment)
> E   TypeError: 'JavaPackage' object is not callable
> 
> 
> pyflink/table/tests/test_catalog.py:78: TypeError
> 
> 
> 
> 
> Best,
> Haibo
> 
> 
> 



Re: [ANNOUNCE] Jincheng Sun is now part of the Flink PMC

2019-06-24 Thread Dian Fu
Congratulations Jincheng!

On Mon, Jun 24, 2019 at 11:09 PM Robert Metzger  wrote:

> Hi all,
>
> On behalf of the Flink PMC, I'm happy to announce that Jincheng Sun is now
> part of the Apache Flink Project Management Committee (PMC).
>
> Jincheng has been a committer since July 2017. He has been very active on
> Flink's Table API / SQL component, as well as helping with releases.
>
> Congratulations & Welcome Jincheng!
>
> Best,
> Robert
>


Re: [ANNOUNCE] Apache Flink 1.8.1 released

2019-07-02 Thread Dian Fu
Awesome! Thanks a lot for being the release manager. Great job! @Jincheng

Regards,
Dian

> 在 2019年7月3日,上午10:08,jincheng sun  写道:
> 
> I've also tweeted about it from my twitter account: 
> https://twitter.com/sunjincheng121/status/1146236834344648704 
> Later it will also be tweeted from @ApacheFlink!
> 
> Best, Jincheng
> 
> Hequn Cheng mailto:chenghe...@gmail.com>> 于2019年7月3日周三 
> 上午9:48写道:
> Thanks for being the release manager and for the great work, Jincheng!
> Also thanks to Gordon and the community for making this release possible!
> 
> Best, Hequn
> 
> On Wed, Jul 3, 2019 at 9:40 AM jincheng sun  > wrote:
> Hi,
> 
> The Apache Flink community is very happy to announce the release of Apache 
> Flink 1.8.1, which is the first bugfix release for the Apache Flink 1.8 
> series. 
> 
> Apache Flink® is an open-source stream processing framework for distributed, 
> high-performing, always-available, and accurate data streaming applications. 
> 
> The release is available for download at: 
> https://flink.apache.org/downloads.html  
> 
> 
> Please check out the release blog post for an overview of the 
> improvements for this bugfix release: 
> https://flink.apache.org/news/2019/07/02/release-1.8.1.html 
> 
> 
> The full release notes are available in Jira:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345164
>  
> 
> 
> We would like to thank all contributors of the Apache Flink community who 
> made this release possible! 
> 
> Great thanks to @Tzu-Li (Gordon) Tai for his kind offline help!
> 
> Regards,
> Jincheng



Re: [DISCUSS] Publish the PyFlink into PyPI

2019-07-03 Thread Dian Fu
Hi Chesnay,

Thanks a lot for the suggestions.

Regarding “distributing java/scala code to PyPI”:
The Python Table API is just a wrapper around the Java Table API, and without 
the java/scala code two steps are needed to set up an environment to execute a 
Python Table API program:
1) Install pyflink using "pip install apache-flink" 
2) Download the flink distribution and point FLINK_HOME to it.
Besides, users have to make sure that the manually installed Flink is 
compatible with the pip-installed pyflink. 

Bundling the java/scala code inside the Python package will eliminate step 2) 
and make it simpler for users to install pyflink. There was a short discussion 
on this in the Spark community and they finally decided to package the 
java/scala code in the python package. 
(BTW, PySpark only bundles the jars for Scala 2.11.)
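
As a rough illustration (hypothetical file layout and version, not the actual
flink-python build script), the bundling boils down to declaring the jars as
package data in setup.py:

    # minimal sketch: ship the Flink jars inside the Python package so that
    # "pip install apache-flink" needs no separate FLINK_HOME download;
    # the paths and version below are made up for illustration
    from setuptools import setup

    setup(
        name='apache-flink',
        version='1.9.0',
        packages=['pyflink'],
        include_package_data=True,
        package_data={'pyflink': ['lib/*.jar', 'opt/*.jar']},
    )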

Regards,
Dian 

> 在 2019年7月3日,下午7:13,Chesnay Schepler  写道:
> 
> The existing artifact in the pyflink project was neither released by the 
> Flink project / anyone affiliated with it nor approved by the Flink PMC.
> 
> As such, if we were to use this account I believe we should delete it so as 
> not to mislead users into thinking this is in any way an Apache-provided 
> distribution. Since this goes against the user's wishes, I would be in favor 
> of creating a separate account and giving back control over the pyflink 
> account.
> 
> My take on the raised points:
> 1.1) "apache-flink"
> 1.2)  option 2
> 2) Given that we only distribute python code there should be no reason to 
> differentiate between scala versions. We should not be distributing any 
> java/scala code and/or modules to PyPi. Currently, I'm a bit confused about 
> this question and wonder what exactly we are trying to publish here.
> 3) This should be treated as any other source release; i.e., it needs a 
> LICENSE and NOTICE file, signatures and a PMC vote. My suggestion would be to 
> make this part of our normal release process. There will be _one_ source 
> release on dist.apache.org encompassing everything, and a separate 
> python-focused source release that we push to PyPi. The LICENSE and NOTICE 
> contained in the python source release must also be present in the source 
> release of Flink; so basically the python source release is just the contents 
> of the flink-python module and the maven pom.xml, with no other special sauce 
> added during the release process.
> 
> On 02/07/2019 05:42, jincheng sun wrote:
>> Hi all,
>> 
>> With the effort of FLIP-38 [1], the Python Table API (without UDF support
>> for now) will be supported in the coming release 1.9.
>> As described in "Build PyFlink"[2], if users want to use the Python Table
>> API, they can manually install it using the command:
>> "cd flink-python && python3 setup.py sdist && pip install dist/*.tar.gz".
>> 
>> This is non-trivial for users and it will be better if we can follow the
>> Python way to publish PyFlink to PyPI
>> which is a repository of software for the Python programming language. Then
>> users can use the standard Python package
>> manager "pip" to install PyFlink: "pip install pyflink". So, there are some
>> topic need to be discussed as follows:
>> 
>> 1. How to publish PyFlink to PyPI
>> 
>> 1.1 Project Name
>>  We need to decide which PyPI project name to use, for example,
>> apache-flink, pyflink, etc.
>> 
>> Regarding the name "pyflink", it has already been registered by
>> @ueqt and there is already a package '1.0' released under this project
>> which is based on flink-libraries/flink-python.
>> 
>>  @ueqt has kindly agreed to give this project back to the community. And
>> he has requested that the released package '1.0' should not be removed as
>> it is already used in his company.
>> 
>> So we need to decide whether to use the name 'pyflink'. If yes, we
>> need to figure out how to handle the package '1.0' under this project.
>> 
>> From my point of view, "pyflink" is the better name for our project
>> and we can keep the 1.0 release; maybe more people will want to use it.
>> 
>> 1.2 PyPI account for release
>> We need also decide on which account to use to publish packages to PyPI.
>> 
>> There are two permissions in PyPI: owner and maintainer:
>> 
>> 1) The owner can upload releases, delete files, releases or the entire
>> project.
>> 2) The maintainer can also upload releases. However, they cannot delete
>> files, releases, or the project.
>> 
>> So there are two options in my mind:
>> 
>> 1) Create an account such as 'pyflink' as the owner, share it with all
>> the release managers, and then release managers can publish the package to
>> PyPI using this account.
>> 2) Create an account such as 'pyflink' as owner (only the PMC can manage it)
>> and add the release managers' accounts as maintainers of the project.
>> Release managers publish the package to PyPI using their own accounts.
>> 
>> As I know, PySpark takes Option 1) and Apach

Re: [VOTE] Migrate to sponsored Travis account

2019-07-04 Thread Dian Fu
+1. Thanks Chesnay and Bowen for pushing this forward.

Regards,
Dian

> 在 2019年7月4日,下午6:28,zhijiang  写道:
> 
> +1 and thanks for Chesnay' work on this.
> 
> Best,
> Zhijiang
> 
> --
> From:Haibo Sun 
> Send Time:2019年7月4日(星期四) 18:21
> To:dev 
> Cc:priv...@flink.apache.org 
> Subject:Re:Re: [VOTE] Migrate to sponsored Travis account
> 
> +1. Thank Chesnay for pushing this forward.
> 
> Best,
> Haibo
> 
> 
> At 2019-07-04 17:58:28, "Kurt Young"  wrote:
>> +1 and great thanks Chesnay for pushing this.
>> 
>> Best,
>> Kurt
>> 
>> 
>> On Thu, Jul 4, 2019 at 5:44 PM Aljoscha Krettek  wrote:
>> 
>>> +1
>>> 
>>> Aljoscha
>>> 
 On 4. Jul 2019, at 11:09, Stephan Ewen  wrote:
 
 +1 to move to a private Travis account.
 
 I can confirm that Ververica will sponsor a Travis CI plan that is
 equivalent to or a bit higher than the previous ASF quota (10 concurrent
 build queues).
 
 Best,
 Stephan
 
 On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler 
>>> wrote:
 
> I've raised a JIRA with INFRA to inquire
> whether it would be possible to switch to a different Travis account,
> and if so what steps would need to be taken.
> We need a proper confirmation from INFRA since we are not in full
> control of the flink repository (for example, we cannot access the
> settings page).
> 
> If this is indeed possible, Ververica is willing to sponsor a Travis
> account for the Flink project.
> This would provide us with more than enough resources.
> 
> Since this makes the project more reliant on resources provided by
> external companies I would like to vote on this.
> 
> Please vote on this proposal, as follows:
> [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
> provided that INFRA approves
> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
> account
> 
> The vote will be open for at least 24h, and until we have confirmation
> from INFRA. The voting period may be shorter than the usual 3 days since
> our current CI is effectively not working.
> 
> On 04/07/2019 06:51, Bowen Li wrote:
>> Re: > Are they using their own Travis CI pool, or did the switch to an
>> entirely different CI service?
>> 
>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
>> currently moving away from ASF's Travis to their own in-house metal
>> machines at [1] with a custom CI application at [2]. They've seen
>> significant improvements w.r.t. both much higher performance and
>> basically no resource waiting time, a "night-and-day" difference quoting
>> Wes.
>> 
>> Re: > If we can just switch to our own Travis pool, just for our
>> project, then this might be something we can do fairly quickly?
>> 
>> I believe so, according to [3] and [4]
>> 
>> 
>> [1] https://ci.ursalabs.org/ 
>> [2] https://github.com/ursa-labs/ursabot
>> [3]
>> 
>>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>> [4]
>>> https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>> 
>> 
>> 
>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler > > wrote:
>> 
>>   Are they using their own Travis CI pool, or did the switch to an
>>   entirely different CI service?
>> 
>>   If we can just switch to our own Travis pool, just for our
>>   project, then
>>   this might be something we can do fairly quickly?
>> 
>>   On 03/07/2019 05:55, Bowen Li wrote:
>>> I responded in the INFRA ticket [1] that I believe they are
>>   using a wrong
>>> metric against Flink and the total build time is a completely
>>   different
>>> thing than guaranteed build capacity.
>>> 
>>> My response:
>>> 
>>> "As mentioned above, since I started to pay attention to Flink's
>>   build
>>> queue a few tens of days ago, I'm in Seattle and I saw no build
>>   was kicking
>>> off in PST daytime in weekdays for Flink. Our teammates in China
>>   and Europe
>>> have also reported similar observations. So we need to evaluate
>>   where the
>>> large total build time came from - if 1) your number and 2) our
>>> observations from three locations that cover pretty much a full
>>   day, are
>>> all true, I **guess** one reason can be that - highly likely the
>>   extra
>>> build time came from weekends when other Apache projects may be
>>   idle and
>>> Flink just drains hard its congested queue.
>>> 
>>> Please be aware of that we're not complaining about the lack of
>>   resources
>>> in general, I'm complaining about the lack of **stable, dedicated**
>>> r

Re: [DISCUSS] ARM support for Flink

2019-07-07 Thread Dian Fu
Hi Xiyuan,

Thanks for bringing up this discussion. 

WRT the exception, it's because the native library bundled in the rocksdb jar 
file isn't compiled with cross-platform support. You can refer to [1] for how 
to build rocksdb for the ARM platform.
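
As a quick self-contained check (illustration only, not Flink or RocksDB
code), the JVM reports the CPU architecture via the os.arch property, which is
"amd64" on x86-64 Linux but "aarch64" on 64-bit ARM; the rocksdbjni jar only
ships librocksdbjni-linux64.so built for x86-64, so there is no matching
native library to load on ARM, hence the UnsatisfiedLinkError in the quoted
test output below:

    public class ArchCheck {
        public static void main(String[] args) {
            // prints e.g. "os.arch = aarch64" on an ARM64 JVM
            System.out.println("os.arch = " + System.getProperty("os.arch"));
        }
    }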

WRT ARM support, the rocksdb currently used in Flink is hosted in the Ververica 
git repository [2], so it won't be difficult to make it support ARM. However, I 
guess this repository exists only temporarily [3], not because we want to add 
many features to rocksdb.

[1] https://github.com/facebook/rocksdb/issues/678 

[2] https://github.com/dataArtisans/frocksdb 

[3] https://issues.apache.org/jira/browse/FLINK-10471 


Regards,
Dian

> 在 2019年7月8日,上午9:17,Xiyuan Wang  写道:
> 
> Hi Flink:
>  Recently we met a requirement to test and run Flink on the ARM
> architecture, but after searching the Flink community resources, I didn't
> find an official ARM release.
> 
> Since Flink is written in Java and Scala, which usually run
> cross-platform, I expected that Flink could be built and run on ARM directly
> as well. In my local test, Flink was built and deployed successfully as
> expected, but some tests failed due to the ARM architecture. For example:
> 
> 1. MemoryArchitectureTest.testArchitectureNotUnknown:34 Values should be
> different. Actual: UNKNOWN
> 2. [ERROR]
> testIterator(org.apache.flink.contrib.streaming.state.RocksDBRocksStateKeysIteratorTest)
> Time elapsed: 0.234 s  <<< ERROR!
> java.io.IOException: Could not load the native RocksDB library
> at
> org.apache.flink.contrib.streaming.state.RocksDBRocksStateKeysIteratorTest.testIteratorHelper(RocksDBRocksStateKeysIteratorTest.java:90)
> at
> org.apache.flink.contrib.streaming.state.RocksDBRocksStateKeysIteratorTest.testIterator(RocksDBRocksStateKeysIteratorTest.java:63)
> Caused by: java.lang.UnsatisfiedLinkError:
> /tmp/rocksdb-lib-81ca7930b92af2cca143a050c0338d34/librocksdbjni-linux64.so:
> /tmp/rocksdb-lib-81ca7930b92af2cca143a050c0338d34/librocksdbjni-linux64.so:
> cannot open shared object file: No such file or directory (Possible cause:
> can't load AMD 64-bit .so on a AARCH64-bit platform)
> …
> 
>  Since the tests don't all pass, we are not sure whether Flink fully
> supports ARM. Is it possible for Flink to support an ARM release
> officially? I guess it may not be a huge amount of work given the Java
> base. I notice that Flink now uses Travis CI, which is x86-only, for build
> & test checks. Is it possible to add an ARM CI as well? It could be
> non-voting first. Then we can keep monitoring and fixing ARM-related
> errors. One day when it's stable enough, we can remove the non-voting tag
> and create a Flink ARM release.
> 
>  There is an open-source CI community called OpenLab[1] which can provide
> CI capacity and ARM resources to Flink for free. I'm one of the OpenLab
> members. If the Flink community thinks ARM support is worthwhile, I can keep
> helping Flink build and maintain the ARM CI job. There is a POC Flink ARM
> build job made by me on the OpenLab system[2] and a live demo which was
> built and runs on an ARM VM[3]. You can take a look first.
> 
> Eager to get everybody’s feedback. Any question is welcome.
> 
> Thanks.
> 
> [1]: https://openlabtesting.org/
> [2]: https://github.com/theopenlab/flink/pull/1
> [3]: http://114.115.168.52:8081/#/overview



Re: [ANNOUNCE] Rong Rong becomes a Flink committer

2019-07-11 Thread Dian Fu

Congrats Rong!


> 在 2019年7月12日,上午8:47,Chen YuZhao  写道:
> 
> congratulations!
> 
> Get Outlook for iOS
> 
> From: Rong Rong 
> Sent: Friday, July 12, 2019 8:09 AM
> To: Hao Sun
> Cc: Xuefu Z; dev; Flink ML
> Subject: Re: [ANNOUNCE] Rong Rong becomes a Flink committer
>  
> Thank you all for the warm welcome!
> 
> It's my honor to become an Apache Flink committer. 
> I will continue to work on this great project and contribute more to the 
> community.
> 
> Cheers,
> Rong
> 
> On Thu, Jul 11, 2019 at 1:05 PM Hao Sun  > wrote:
> Congratulations Rong. 
> 
> On Thu, Jul 11, 2019, 11:39 Xuefu Z  > wrote:
> Congratulations, Rong!
> 
> On Thu, Jul 11, 2019 at 10:59 AM Bowen Li  > wrote:
> Congrats, Rong!
> 
> 
> On Thu, Jul 11, 2019 at 10:48 AM Oytun Tez  > wrote:
> 
> > Congratulations Rong!
> >
> > ---
> > Oytun Tez
> >
> > *M O T A W O R D*
> > The World's Fastest Human Translation Platform.
> > oy...@motaword.com  ― www.motaword.com 
> > 
> >
> >
> > On Thu, Jul 11, 2019 at 1:44 PM Peter Huang  > >
> > wrote:
> >
> >> Congrats Rong!
> >>
> >> On Thu, Jul 11, 2019 at 10:40 AM Becket Qin  >> > wrote:
> >>
> >>> Congrats, Rong!
> >>>
> >>> On Fri, Jul 12, 2019 at 1:13 AM Xingcan Cui  >>> > wrote:
> >>>
>  Congrats Rong!
> 
>  Best,
>  Xingcan
> 
>  On Jul 11, 2019, at 1:08 PM, Shuyi Chen   > wrote:
> 
>  Congratulations, Rong!
> 
>  On Thu, Jul 11, 2019 at 8:26 AM Yu Li   > wrote:
> 
> > Congratulations Rong!
> >
> > Best Regards,
> > Yu
> >
> >
> > On Thu, 11 Jul 2019 at 22:54, zhijiang  > >
> > wrote:
> >
> >> Congratulations Rong!
> >>
> >> Best,
> >> Zhijiang
> >>
> >> --
> >> From:Kurt Young mailto:ykt...@gmail.com>>
> >> Send Time:2019年7月11日(星期四) 22:54
> >> To:Kostas Kloudas mailto:kklou...@gmail.com>>
> >> Cc:Jark Wu mailto:imj...@gmail.com>>; Fabian Hueske 
> >> mailto:fhue...@gmail.com>>;
> >> dev mailto:dev@flink.apache.org>>; user 
> >> mailto:u...@flink.apache.org>>
> >> Subject:Re: [ANNOUNCE] Rong Rong becomes a Flink committer
> >>
> >> Congratulations Rong!
> >>
> >> Best,
> >> Kurt
> >>
> >>
> >> On Thu, Jul 11, 2019 at 10:53 PM Kostas Kloudas  >> >
> >> wrote:
> >> Congratulations Rong!
> >>
> >> On Thu, Jul 11, 2019 at 4:40 PM Jark Wu  >> > wrote:
> >> Congratulations Rong Rong!
> >> Welcome on board!
> >>
> >> On Thu, 11 Jul 2019 at 22:25, Fabian Hueske  >> >
> >> wrote:
> >> Hi everyone,
> >>
> >> I'm very happy to announce that Rong Rong accepted the offer of the
> >> Flink PMC to become a committer of the Flink project.
> >>
> >> Rong has been contributing to Flink for many years, mainly working on
> >> SQL and Yarn security features. He's also frequently helping out on the
> >> user@f.a.o mailing lists.
> >>
> >> Congratulations Rong!
> >>
> >> Best, Fabian
> >> (on behalf of the Flink PMC)
> >>
> >>
> >>
> 
> 
> 
> -- 
> Xuefu Zhang
> 
> "In Honey We Trust!"



Re: flink-python failed on Travis

2019-07-16 Thread Dian Fu
Thanks for reporting this issue. I will take a look at it.

> 在 2019年7月17日,上午11:50,Danny Chan  写道:
> 
> I have the same issue ~~
> 
> Best,
> Danny Chan
> 在 2019年7月17日 +0800 AM11:21,Haibo Sun ,写道:
>> Hi, folks
>> 
>> 
>> I noticed that all of the Travis tests reported the following failure. Is 
>> anyone working on this issue?
>> 
>> 
>> ___ summary 
>> 
>> ERROR: py27: InvocationError for command 
>> /home/travis/build/flink-ci/flink/flink-python/dev/.conda/bin/python3.7 -m 
>> virtualenv --no-download --python 
>> /home/travis/build/flink-ci/flink/flink-python/dev/.conda/envs/2.7/bin/python2.7
>>  py27 (exited with code 1)
>> py33: commands succeeded
>> ERROR: py34: InvocationError for command 
>> /home/travis/build/flink-ci/flink/flink-python/dev/.conda/bin/python3.7 -m 
>> virtualenv --no-download --python 
>> /home/travis/build/flink-ci/flink/flink-python/dev/.conda/envs/3.4/bin/python3.4
>>  py34 (exited with code 100)
>> py35: commands succeeded
>> py36: commands succeeded
>> py37: commands succeeded
>> tox checks... [FAILED]
>> PYTHON exited with EXIT CODE: 1.
>> Trying to KILL watchdog (12990).
>> 
>> 
>> Best,
>> Haibo



Re: flink-python failed on Travis

2019-07-17 Thread Dian Fu
Hi all,

This issue has been fixed in 
https://github.com/apache/flink/commit/200a5bf9dca9d398cf07879d4d1e407a2f41d839.
Thanks for @WeiZhong's fix.

Regards,
Dian

> 在 2019年7月18日,上午2:41,Bowen Li  写道:
> 
> Hi Dian,
> 
> Is there any update on this? It seems have been failing for a day.
> 
> 
> 
> On Tue, Jul 16, 2019 at 9:35 PM Dian Fu  wrote:
> 
>> Thanks for reporting this issue. I will take a look at it.
>> 
>>> 在 2019年7月17日,上午11:50,Danny Chan  写道:
>>> 
>>> I have the same issue ~~
>>> 
>>> Best,
>>> Danny Chan
>>> 在 2019年7月17日 +0800 AM11:21,Haibo Sun ,写道:
>>>> Hi, folks
>>>> 
>>>> 
>>>> I noticed that all of the Travis tests reported the following failure.
>> Is anyone working on this issue?
>>>> 
>>>> 
>>>> ___ summary
>> 
>>>> ERROR: py27: InvocationError for command
>> /home/travis/build/flink-ci/flink/flink-python/dev/.conda/bin/python3.7 -m
>> virtualenv --no-download --python
>> /home/travis/build/flink-ci/flink/flink-python/dev/.conda/envs/2.7/bin/python2.7
>> py27 (exited with code 1)
>>>> py33: commands succeeded
>>>> ERROR: py34: InvocationError for command
>> /home/travis/build/flink-ci/flink/flink-python/dev/.conda/bin/python3.7 -m
>> virtualenv --no-download --python
>> /home/travis/build/flink-ci/flink/flink-python/dev/.conda/envs/3.4/bin/python3.4
>> py34 (exited with code 100)
>>>> py35: commands succeeded
>>>> py36: commands succeeded
>>>> py37: commands succeeded
>>>> tox checks... [FAILED]
>>>> PYTHON exited with EXIT CODE: 1.
>>>> Trying to KILL watchdog (12990).
>>>> 
>>>> 
>>>> Best,
>>>> Haibo
>> 
>> 



Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added as a committer to the Flink project

2019-07-18 Thread Dian Fu
Congrats Becket!

> 在 2019年7月18日,下午6:42,Danny Chan  写道:
> 
>> Congratulations!
> 
> Best,
> Danny Chan
> 在 2019年7月18日 +0800 PM6:29,Haibo Sun ,写道:
>> Congratulations Becket!
>> Best,
>> Haibo
>> 在 2019-07-18 17:51:06,"Hequn Cheng"  写道:
>>> Congratulations Becket!
>>> 
>>> Best, Hequn
>>> 
>>> On Thu, Jul 18, 2019 at 5:34 PM vino yang  wrote:
>>> 
 Congratulations!
 
 Best,
 Vino
 
 Yun Gao  于2019年7月18日周四 下午5:31写道:
 
> Congratulations!
> 
> Best,
> Yun
> 
> 
> --
> From:Kostas Kloudas 
> Send Time:2019 Jul. 18 (Thu.) 17:30
> To:dev 
> Subject:Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added as a
 committer
> to the Flink project
> 
> Congratulations Becket!
> 
> Kostas
> 
> On Thu, Jul 18, 2019 at 11:21 AM Guowei Ma  wrote:
> 
>> Congrats Becket!
>> 
>> Best,
>> Guowei
>> 
>> 
>> Terry Wang  于2019年7月18日周四 下午5:17写道:
>> 
>>> Congratulations Becket!
>>> 
 在 2019年7月18日,下午5:09,Dawid Wysakowicz  写道:
 
 Congratulations Becket! Good to have you onboard!
 
 On 18/07/2019 10:56, Till Rohrmann wrote:
> Congrats Becket!
> 
> On Thu, Jul 18, 2019 at 10:52 AM Jeff Zhang 
> wrote:
> 
>> Congratulations Becket!
>> 
>> Xu Forward  于2019年7月18日周四 下午4:39写道:
>> 
>>> Congratulations Becket! Well deserved.
>>> 
>>> 
>>> Cheers,
>>> 
>>> forward
>>> 
>>> Kurt Young  于2019年7月18日周四 下午4:20写道:
>>> 
 Congrats Becket!
 
 Best,
 Kurt
 
 
 On Thu, Jul 18, 2019 at 4:12 PM JingsongLee <
>> lzljs3620...@aliyun.com
 .invalid>
 wrote:
 
> Congratulations Becket!
> 
> Best, Jingsong Lee
> 
> 
> 
> --
> From:Congxian Qiu 
> Send Time:2019年7月18日(星期四) 16:09
> To:dev@flink.apache.org 
> Subject:Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added
 as a
 committer
> to the Flink project
> 
> Congratulations Becket! Well deserved.
> 
> Best,
> Congxian
> 
> 
> Jark Wu  于2019年7月18日周四 下午4:03写道:
> 
>> Congratulations Becket! Well deserved.
>> 
>> Cheers,
>> Jark
>> 
>> On Thu, 18 Jul 2019 at 15:56, Paul Lam <
 paullin3...@gmail.com>
>>> wrote:
>>> Congrats Becket!
>>> 
>>> Best,
>>> Paul Lam
>>> 
 在 2019年7月18日,15:41,Robert Metzger 
 写道:
 
 Hi all,
 
 I'm excited to announce that Jiangjie (Becket) Qin just
> became
>> a
> Flink
 committer!
 
 Congratulations Becket!
 
 Best,
 Robert (on behalf of the Flink PMC)
>>> 
>> 
>> --
>> Best Regards
>> 
>> Jeff Zhang
>> 
 
>>> 
>>> 
>> 
> 
 


