Re: Re: Would you please give me the permission as a contributor?

2019-02-14 Thread Fengchao Wang
 Thanks, Fabian!

2019-02-15 

Fengchao Wang 



From: Fabian Hueske 
Sent: 2019-02-15 15:32
Subject: Re: Would you please give me the permission as a contributor?
To: "dev"
Cc:

Hi, 

Welcome to the Flink community! 
I gave you contributor permissions for Jira. 

Best, Fabian 

On Fri, Feb 15, 2019 at 05:51 Fengchao Wang < 
fengchao9...@163.com>: 

> Hi Guys, 
> 
> 
> I want to contribute to Apache Flink. 
> Would you please give me the permission as a contributor? 
> My JIRA ID is Fengchao Wang 
> 
> 
> 2019-02-15 
> 
> 
> Fengchao Wang 

Re: want to be a Flink contributor

2019-02-14 Thread Fabian Hueske
Hi,

Welcome to the Flink community!
I gave you contributor permissions for Jira.

Best, Fabian

On Thu, Feb 14, 2019 at 11:19, 金 钟镛 wrote:

> Hey Guys,
>
> I want to be a contributor to Flink. Could you give me the permission?
> My JIRA ID is AT-Fieldless
>


Re: apply to become the contributor -- dev@flink.apache.org

2019-02-14 Thread Fabian Hueske
Hi,

Welcome to the Flink community!
I gave you contributor permissions for Jira.

Best, Fabian

On Thu, Feb 14, 2019 at 15:48, 戴雄军 wrote:

> Hi Guys,
>
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor?
> My JIRA ID is cellen.
>
> Thanks!
>
>
>
>
>
> *Name*
> * qqq...@163.com  *
> *Company: Servyou Group (税友集团), Beijing Branch*
> Address: Huantai Building, 12A Zhongguancun South Street, Haidian District, Beijing
> Tel:
> Mobile: 15652685890
> [image: QR code]
>
> Scan this QR code to quickly save the business card to your phone. Usage help
> 

Re: Would you please give me the permission as a contributor?

2019-02-14 Thread Fabian Hueske
Hi,

Welcome to the Flink community!
I gave you contributor permissions for Jira.

Best, Fabian

On Fri, Feb 15, 2019 at 05:51 Fengchao Wang <
fengchao9...@163.com>:

> Hi Guys,
>
>
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor?
> My JIRA ID is Fengchao Wang
>
>
> 2019-02-15
>
>
> Fengchao Wang


Re: Would you please give me the permission as a contributor?

2019-02-14 Thread Fabian Hueske
Hi,

Welcome to the Flink community!
I gave you contributor permissions for Jira.

Best, Fabian

On Fri, Feb 15, 2019 at 05:13, 蒋晓峰 wrote:

> Hi Guys,
>
>
>
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor?
> My JIRA ID is nicholasjiang
>
>
>


Re: Apply for JIRA Contributor roles

2019-02-14 Thread Fabian Hueske
Hi,

Welcome to the Flink community!
I gave you contributor permissions for Jira.

Best, Fabian

On Fri, Feb 15, 2019 at 05:08 Fengchao Wang <
fengchao9...@163.com>:

> Hi Guys,
>
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor?
> My JIRA ID is wangfengchao.
>
> 2019-02-15
>
>
> Fengchao Wang


Re: [DISCUSS] Contributing Chinese website and docs to Apache Flink

2019-02-14 Thread Fabian Hueske
Hi Jark,

Thanks so much for evaluating Docusaurus and Crowdin and writing up this
report!
It seems that staying with the current approach and avoiding the effort of
porting the documentation is the best solution for now.

We might want to reconsider Docusaurus if we want to restructure the
documentation completely at some point.

Thanks,
Fabian

On Thu, Feb 14, 2019 at 16:41, Jark Wu wrote:

> Hi all,
>
> Qianjing Xu and I have been exploring Docusaurus and Crowdin in recent days
> to see how Flink can fit into them to support multiple languages.
>
> Here is the summary from our side:
> (we didn't dig deeply, so we may have missed something or misunderstood.
> Please correct us if we are wrong.)
>
> # What are Docusaurus and Crowdin?
> Docusaurus [1] is a documentation framework which supports document
> versioning and localization (via Crowdin). IMO, Docusaurus is something
> similar to Jekyll.
> Crowdin [2] is a localization management platform. Users can upload content
> (e.g. markdown source files) to Crowdin and then translate, collaborate, and
> manage the process there.
> The English content is kept in the original repository, and the translated
> content for other languages is kept in Crowdin. We need to download the
> translated content from Crowdin and build it into the localized website.
> Apache Pulsar is using them for its website and documentation, as @Sijie
> mentioned above.
> Here is the Pulsar project on crowdin [3].
> And here is a test project for Flink I created [4].
>
> # How can Flink fit into them?
> I'm afraid that Flink is hard to fit into Docusaurus unless we rework our
> website/docs from Jekyll to Docusaurus.
>
> How about Jekyll + Crowdin?
> We need a build job to make it work. The build job is triggered when a
> commit is merged into master.
> The build job does the following things:
> 1) upload the latest content (English markdown source files) from git to
> Crowdin.
> - If the source content is changed, the corresponding translation will be
> lost and need re-translation.
> 2) download the translated content
> 3) build the website and publish it
>
> But it seems that Crowdin doesn't fit well with Jekyll.
> Crowdin breaks content into multiple segments to translate according to its
> own syntax. This results in a broken layout.
> For example, the translated metric page is not rendered as expected:
> this is the original `metric.md`:
>
> https://user-images.githubusercontent.com/5378924/52795366-9a22f080-30ac-11e9-9cfd-4de82041aa77.png
> this is the file downloaded from crowdin:
>
> https://user-images.githubusercontent.com/5378924/52795379-9f803b00-30ac-11e9-9700-0a4077b5882d.png
> this is the page after rendered:
>
> https://user-images.githubusercontent.com/5378924/52795389-a5761c00-30ac-11e9-9821-35c705a8d65b.png
>
> So it seems that currently Crowdin works less well when the markdown
> contains HTML and Liquid code.
>
> # Conclusion (Docusaurus/crowdin or previous proposal)
> I'm leaning towards the previous proposal [5], not only because Crowdin
> currently works less well for Flink, but also because Docusaurus/Crowdin
> introduce many new things and tools for contributors to learn.
> Furthermore, the previously proposed translation process worked well when we
> experimented with it on the Flink website.
>
> What do you think?
>
> Regards,
> Jark
>
> [1]: https://docusaurus.io/en/
> [2]: https://crowdin.com/
> [3]: https://crowdin.com/project/apache-pulsar/zh-CN#
> [4]: https://crowdin.com/project/flink-test/zh-CN#
> [5]:
>
> https://docs.google.com/document/d/1R1-uDq-KawLB8afQYrczfcoQHjjIhq6tvUksxrfhBl0/edit#
>
>
> On Tue, 12 Feb 2019 at 23:48, Xingcan Cui  wrote:
>
> > Hi,
> >
> > I agree with the proposal, Gordon and Jark, and I think it's a good
> > solution for major doc changes. We also created separate JIRAs for
> English
> > documentation in the past.
> >
> > For minor doc changes, I think it’s better to encourage Chinese-speaking
> > contributors to participate in the reviewing process and help translate
> the
> > few lines in sync.
> >
> > Best,
> > Xingcan
> >
> > > On Feb 12, 2019, at 7:04 AM, Jark Wu  wrote:
> > >
> > > Hi @Sijie,
> > >
> > > Thank you for the valuable information. I will explore Docusaurus and
> > > give feedback here.
> > >
> > > Best,
> > > Jark
> > >
> > > On Tue, 12 Feb 2019 at 18:18, Fabian Hueske  wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> +1 to what Gordon and Jark proposed.
> > >> We should make use of the review bot to ensure that new features are
> > >> documented at least in English.
> > >> If the Chinese docs are not updated, a Jira issue must be created.
> > >>
> > >> @Sijie, thanks for the pointer to Docusaurus! IMO, this looks very
> > >> interesting and should be worth exploring.
> > >>
> > >> Thanks, Fabian
> > >>
> > >>
> > >>
> > >> On Tue, Feb 12, 2019 at 09:06, Sijie Guo <
> si...@apache.org
> > >:
> > >>
> > >>> Hi,
> > >>>
> > >>> Sorry for interrupting the thread. But this topic sounds interesting
> to
> > >> me.
> > >>> It might be worth checking 

Re: [DISCUSS] Enhance Operator API to Support Dynamically Selective Reading and EndOfInput Event

2019-02-14 Thread Haibo Sun


Which classes need to be changed or use new classes?


I'm working on a design for Flink runtime support of TwoInputSelectable,
and I'll provide an initial proposal document next Monday. In the proposal,
the core classes that need to be changed include UnionInputGate and
StreamTwoInputProcessor. I think the discussion that follows can be based
on this initial proposal.


Do we need other refactoring?


Based on the code with which we have implemented this functionality, no
other refactoring needs to be done beforehand.


Best,
Haibo
At 2019-02-14 18:54:57, "Stephan Ewen"  wrote:
>To move this forward, I would suggest the following:
>
>  - Let's quickly check which other classes need to change. I assume the
>TwoInputStreamTask and StreamTwoInputProcessor ?
>  - Can those changes be new classes that are used when the new operator is
>used? The current TwoInputStreamTask and StreamTwoInputProcessor remain
>until they are fully subsumed and are then removed.
>
>  - Do we need any other refactorings before, like some cleanup of the
>Operator Config or the Operator Chain?
>
>Best,
>Stephan
>
>
>On Sun, Feb 10, 2019 at 7:25 AM Guowei Ma  wrote:
>
>> 2019.2.10
>>
>>
>> Hi, Stephan
>>
>>
>> Thank you very much for such detailed and constructive comments.
>>
>>
>> *binary vs. n-ary* and *enum vs. integer*
>>
>>
>> Considering the N-ary, as you mentioned, using integers may be a better
>> choice.
>>
>>
>> *generic selectable interface*
>>
>>
>> You are right. This interface can be removed.
>>
>>
>> *end-input*
>>
>> It is true that the Operator does not need to store the end-input state;
>> it can be inferred by the system, which notifies the Operator at the right
>> time. We can consider using this mechanism when the system can checkpoint
>> the topology with the Finish Tasks.
>>
>>
>> *early-out*
>>
>> It is reasonable for me not to consider this situation at present.
>>
>>
>> *distributed stream deadlocks*
>>
>>
>> At present, there is no deadlock for streaming, but I think it might
>> still be necessary to do some validation (Warning or Reject) in JobGraph,
>> because once Flink introduces this TwoInputSelectable interface, streaming
>> users could also construct a diamond-style topology that may
>> deadlock.
>>
>>
>> *empty input / selection timeout*
>>
>> It is reasonable for me not to consider this situation at present.
>>
>>
>> *timers*
>>
>> When all the inputs are finished, TimeService will wait until all timers
>> are triggered, so there should be no problem. The other guys and I are
>> confirming the details to see if there are other considerations.
>>
>>
>> Best
>>
>> GuoWei
>>
Stephan Ewen  wrote on Fri, Feb 8, 2019 at 7:56 PM:
>>
>> > Nice design proposal, and +1 to the general idea.
>> >
>> > A few thoughts / suggestions:
>> >
>> > *binary vs. n-ary*
>> >
>> > I would plan ahead for N-ary operators. Not because we necessarily need
>> > n-ary inputs (one can probably build that purely in the API) but because
>> of
>> > future side inputs. The proposal should be able to handle that as well.
>> >
>> > *enum vs. integer*
>> >
>> > The above might be easier to realize when going directly with integers
>> > and having ANY, FIRST, SECOND, etc. as pre-defined constants.
>> > Performance-wise, there is probably no difference between using int or
>> enum.
>> >
>> > *generic selectable interface*
>> >
>> > From the proposal, I don't understand quite what that interface is for.
>> My
>> > understanding is that the input processor or task that calls the
>> > operator's functions would anyway work on the TwoInputStreamOperator
>> > interface, for efficiency.
>> >
>> > *end-input*
>> >
>> > I think we should not make storing the end-input the operator's
>> > responsibility.
>> > There is a simple way to handle this, which is also consistent with other
>> > aspects of handling finished tasks:
>> >
>> >   - If a task is finished, that should be stored in the checkpoint.
>> >  - Upon restoring a finished task, if it still has running successors, we
>> > deploy a "finished input channel", which immediately sends the "end of
>> > input" when the task is started.
>> >  - the operator will hence set the end of input immediately again upon
>> >
>> > *early-out*
>> >
>> > Letting nextSelection() return “NONE” or “FINISHED" may be relevant for
>> > early-out cases, but I would remove this from the scope of this proposal.
>> > There are most likely other big changes involved, like communicating this
>> > to the upstream operators.
>> >
>> > *distributed stream deadlocks*
>> >
>> > We had this issue in the DataSet API. Earlier versions of the DataSet API
>> > made an analysis of the flow, detecting dams and whether the pipeline-
>> > breaking behavior in the flow would cause deadlocks, and introduced
>> > artificial pipeline breakers in response.
>> >
>> > The logic was really complicated and it took a while to become stable. We
>> > had several issues that certain user functions (like mapPartition) could
>> > either be pipelined or have a full dam (not 
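
A minimal sketch of the integer-based input selection discussed in this
thread. All names here are hypothetical illustrations, not actual Flink API:

{code:java}
// Hypothetical sketch only -- these names do not exist in Flink as-is.
// Integer constants (rather than a binary enum) leave room for N-ary
// operators and future side inputs: an operator can simply return the
// index of the input it wants to read next.
public final class InputSelection {
    public static final int ANY = -1;   // read from whichever input has data
    public static final int FIRST = 0;  // read only from the first input
    public static final int SECOND = 1; // read only from the second input

    private InputSelection() {}
}

// A two-input operator opts into selective reading by telling the input
// processor which input it wants to consume next.
interface InputSelectable {
    /** Called by the input processor after each record. */
    int nextSelection();
}
{code}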

Would you please give me the permission as a contributor?

2019-02-14 Thread Fengchao Wang
Hi Guys,


I want to contribute to Apache Flink.
Would you please give me the permission as a contributor?
My JIRA ID is Fengchao Wang


2019-02-15


Fengchao Wang 

[jira] [Created] (FLINK-11619) Make ScheduleMode configurable via user code or configuration file

2019-02-14 Thread yuqi (JIRA)
yuqi created FLINK-11619:


 Summary: Make ScheduleMode configurable via user code or 
configuration file 
 Key: FLINK-11619
 URL: https://issues.apache.org/jira/browse/FLINK-11619
 Project: Flink
  Issue Type: Improvement
Reporter: yuqi
Assignee: yuqi


Currently, the schedule mode for streaming jobs is always EAGER;
see StreamingJobGraphGenerator#createJobGraph:

{code:java}
// make sure that all vertices start immediately
jobGraph.setScheduleMode(ScheduleMode.EAGER);
{code}

Given this, we can make ScheduleMode configurable by the user so as to adapt to
different environments. Users could set this option via env.setScheduleMode() in
code, or it could be exposed as an option in the configuration.

Anyone's help and suggestions are welcome.
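
A minimal sketch of the proposed user-facing option. Note that
setScheduleMode() is hypothetical and does not exist on
StreamExecutionEnvironment today; it is shown only to illustrate the idea:

{code:java}
import org.apache.flink.runtime.jobgraph.ScheduleMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ScheduleModeSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Hypothetical setter proposed by this ticket; today EAGER is
        // hard-coded in StreamingJobGraphGenerator#createJobGraph.
        env.setScheduleMode(ScheduleMode.LAZY_FROM_SOURCES);
    }
}
{code}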




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re:Re: Would you please give me the permission as a contributor?

2019-02-14 Thread 赵慧
Cool, thanks Gordon!




At 2019-02-15 11:44:08, "Tzu-Li (Gordon) Tai"  wrote:
>Hi,
>
>Welcome to the community, Hui! I gave you contributor permissions in JIRA.
>
>Cheers,
>Gordon
>
>On Fri, Feb 15, 2019 at 10:39 AM 赵慧  wrote:
>
>> Hi Guys,
>>
>> I want to contribute to Apache Flink.
>> Would you please give me the permission as a contributor?
>> My JIRA ID is Cynthia Zhao.


Would you please give me the permission as a contributor?

2019-02-14 Thread 蒋晓峰
Hi Guys,



I want to contribute to Apache Flink.
Would you please give me the permission as a contributor?
My JIRA ID is nicholasjiang




Apply for JIRA Contributor roles

2019-02-14 Thread Fengchao Wang
Hi Guys,

I want to contribute to Apache Flink.
Would you please give me the permission as a contributor?
My JIRA ID is wangfengchao.

2019-02-15


Fengchao Wang 

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread jincheng sun
Hi Stephan,

Thanks for the clarification! You are right, we have never initiated a
discussion about supporting OVER Window on DataStream; we can discuss it in
a separate thread. I agree with you to add the item after moving the
discussion forward.

+1 for putting the roadmap on the website.
+1 for periodically updating the roadmap; as mentioned by Fabian, we can
update it at every feature release.

Thanks,
Jincheng

Stephan Ewen  wrote on Thu, Feb 14, 2019 at 5:44 PM:

> Thanks Jincheng and Rong Rong!
>
> I am not deciding a roadmap and making a call on what features should be
> developed or not. I was only collecting broader issues that are already
> happening or have an active FLIP/design discussion plus committer support.
>
> Do we have that for the suggested issues as well? If yes, we can add them
> (can you point me to the issue/mail-thread), if not, let's try and move the
> discussion forward and add them to the roadmap overview then.
>
> Best,
> Stephan
>
>
> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong  wrote:
>
>> Thanks Stephan for the great proposal.
>>
>> This would not only be beneficial for new users but also for contributors
>> to keep track of all upcoming features.
>>
>> I think that better window operator support can also be separately grouped
>> into its own category, as it affects both the future DataStream API and
>> batch-stream unification.
>> can we also include:
>> - OVER aggregate for DataStream API separately as @jincheng suggested.
>> - Improving sliding window operator [1]
>>
>> One additional suggestion: can we also include the more extendable
>> security module [2,3] that @shuyi and I are currently working on?
>> This will significantly improve the usability for Flink in corporate
>> environments where proprietary or 3rd-party security integration is needed.
>>
>> Thanks,
>> Rong
>>
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>> [3]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>
>>
>>
>>
>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun 
>> wrote:
>>
>>> Very excited and thank you for launching such a great discussion,
>>> Stephan !
>>>
>>> Just one small suggestion: in the Batch Streaming Unification
>>> section, do we need to add an item:
>>>
>>> - Same window operators on bounded/unbounded Table API and DataStream
>>> API
>>> (currently OVER window only exists in SQL/TableAPI, DataStream API does
>>> not yet support it)
>>>
>>> Best,
>>> Jincheng
>>>
Stephan Ewen  wrote on Wed, Feb 13, 2019 at 7:21 PM:
>>>
 Hi all!

 Recently several contributors, committers, and users asked about making
 it more visible in which way the project is currently going.

 Users and developers can track the direction by following the
 discussion threads and JIRA, but due to the mass of discussions and open
 issues, it is very hard to get a good overall picture.
 Especially for new users and contributors, it is very hard to get a
 quick overview of the project direction.

 To fix this, I suggest to add a brief roadmap summary to the homepage.
 It is a bit of a commitment to keep that roadmap up to date, but I think
 the benefit for users justifies that.
 The Apache Beam project has added such a roadmap [1]
 , which was received very well by
 the community. I would suggest following a similar structure here.

 If the community is in favor of this, I would volunteer to write a
 first version of such a roadmap. The points I would include are below.

 Best,
 Stephan

 [1] https://beam.apache.org/roadmap/

 

 Disclaimer: Apache Flink is not governed or steered by any one single
 entity, but by its community and Project Management Committee (PMC). This
 is not an authoritative roadmap in the sense of a plan with a specific
 timeline. Instead, we share our vision for the future and major initiatives
 that are receiving attention and give users and contributors an
 understanding what they can look forward to.

 *Future Role of Table API and DataStream API*
   - Table API becomes first class citizen
   - Table API becomes primary API for analytics use cases
   * Declarative, automatic optimizations
   * No manual control over state and timers
   - DataStream API becomes primary API for applications and data
 pipeline use cases
   * Physical, user controls data types, no magic or optimizer
   * Explicit control over state and time

 *Batch Streaming Unification*
   - Table API unification (environments) (FLIP-32)
   - New unified source interface 

Re: [VOTE] Release Apache Flink 1.7.2, release candidate #1

2019-02-14 Thread jincheng sun
Thanks Gordon for preparing the RC1 for Flink 1.7.2 !

+1 (non-binding)

I checked a few things as follows:

   - checked that the source dist downloads successfully
   - checked the repositories folder for 1.7.2; jars can be downloaded
successfully
   - reviewed the release PR and left one comment, which does not block the
release (please update the website PR)
   - compiled from source successfully (downloaded from dist)
   - started a cluster and submitted the WordCount example via the WebUI; it
ran successfully.

Cheers,
Jincheng

Chesnay Schepler  wrote on Fri, Feb 15, 2019 at 12:52 AM:

> +1
>
> - no pom changes were made that require licensing changes
> - verified signatures
> - compiled from source
> - started a cluster; WebUI is accessible, can submit jobs, no suspicious
> log output
> - scanned release notes and made some modifications, *please update the
> website PR*
>
> On 11.02.2019 19:41, Tzu-Li (Gordon) Tai wrote:
> > Hi everyone,
> >
> > Please review and vote on the release candidate #1 for the version 1.7.2,
> > as follows:
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release and binary convenience releases to
> be
> > deployed to dist.apache.org [2], which are signed with the key with
> > fingerprint 1C1E2394D3194E1944613488F320986D35C33D6A [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag “release-1.7.2-rc1” [5],
> > * website pull request listing the new release and adding announcement
> blog
> > post [6].
> >
> > The vote will be open for at least 72 hours.
> > Please cast your votes before *Feb. 15th 2019, 12:00 PM CET*.
> >
> > It is adopted by majority approval, with at least 3 PMC affirmative
> votes.
> >
> > Thanks,
> > Gordon
> >
> > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12344632
> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.7.2-rc1/
> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1206
> > [5]
> >
> https://gitbox.apache.org/repos/asf?p=flink.git;a=commit;h=ceba8af39b28b91ae6f2d5cbdb1b99258a73e742
> > [6] https://github.com/apache/flink-web/pull/157
> >
>
>


Hello!

2019-02-14 Thread 炯炯
Hi, my name is James Guo. I am an Apache Flink lover and want to become a
Flink contributor.

My JIRA ID is James Guo. Hope to be allowed!


Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread zhijiang
Thanks Stephan for this proposal and I totally agree with it. 

It is very necessary to summarize the overall features/directions the community
is pursuing or planning to pursue. Although I check the mailing list almost
every day, it still seems difficult to trace everything. In addition, I think this
whole roadmap picture can also help expose the relationships among different
items, and even avoid similar/duplicated thoughts or work.

Just one small suggestion: if we could add an existing link
(JIRA/discussion/FLIP/Google doc) for each listed item, it would be easy
to keep track of the items of interest and follow their progress.

Best,
Zhijiang
--
From:Jeff Zhang 
Send Time: Thu, Feb 14, 2019, 18:03
To:Stephan Ewen 
Cc:dev ; user ; jincheng sun 
; Shuyi Chen ; Rong Rong 

Subject:Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

Hi Stephan,

Thanks for this proposal. It is a good idea to track the roadmap. One
suggestion is that it might be better to put it into a wiki page first,
because it is easier to update the roadmap on the wiki than on the Flink
website. And I guess we may need to update the roadmap very often at the
beginning, as there are so many discussions and proposals in the community
recently. We can move it to the Flink website later when we feel it can be
nailed down.

Stephan Ewen  wrote on Thu, Feb 14, 2019 at 5:44 PM:

> Thanks Jincheng and Rong Rong!
>
> I am not deciding a roadmap and making a call on what features should be
> developed or not. I was only collecting broader issues that are already
> happening or have an active FLIP/design discussion plus committer support.
>
> Do we have that for the suggested issues as well? If yes, we can add them
> (can you point me to the issue/mail-thread), if not, let's try and move the
> discussion forward and add them to the roadmap overview then.
>
> Best,
> Stephan
>
>
> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong  wrote:
>
>> Thanks Stephan for the great proposal.
>>
>> This would not only be beneficial for new users but also for contributors
>> to keep track of all upcoming features.
>>
>> I think that better window operator support can also be separately grouped
>> into its own category, as it affects both the future DataStream API and
>> batch-stream unification.
>> can we also include:
>> - OVER aggregate for DataStream API separately as @jincheng suggested.
>> - Improving sliding window operator [1]
>>
>> One additional suggestion: can we also include the more extendable
>> security module [2,3] that @shuyi and I are currently working on?
>> This will significantly improve the usability for Flink in corporate
>> environments where proprietary or 3rd-party security integration is needed.
>>
>> Thanks,
>> Rong
>>
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>> [3]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>
>>
>>
>>
>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun 
>> wrote:
>>
>>> Very excited and thank you for launching such a great discussion,
>>> Stephan !
>>>
>>> Just one small suggestion: in the Batch Streaming Unification
>>> section, do we need to add an item:
>>>
>>> - Same window operators on bounded/unbounded Table API and DataStream
>>> API
>>> (currently OVER window only exists in SQL/TableAPI, DataStream API does
>>> not yet support it)
>>>
>>> Best,
>>> Jincheng
>>>
>>> Stephan Ewen  wrote on Wed, Feb 13, 2019 at 7:21 PM:
>>>
 Hi all!

 Recently several contributors, committers, and users asked about making
 it more visible in which way the project is currently going.

 Users and developers can track the direction by following the
 discussion threads and JIRA, but due to the mass of discussions and open
 issues, it is very hard to get a good overall picture.
 Especially for new users and contributors, it is very hard to get a
 quick overview of the project direction.

 To fix this, I suggest to add a brief roadmap summary to the homepage.
 It is a bit of a commitment to keep that roadmap up to date, but I think
 the benefit for users justifies that.
 The Apache Beam project has added such a roadmap [1]
 , which was received very well by
 the community. I would suggest following a similar structure here.

 If the community is in favor of this, I would volunteer to write a
 first version of such a roadmap. The points I would include are below.

 Best,
 Stephan

 [1] https://beam.apache.org/roadmap/

 

 Disclaimer: Apache Flink is not governed or steered by any one single
 entity, but by 

Would you please give me the permission as a contributor?

2019-02-14 Thread 赵慧
Hi Guys,

I want to contribute to Apache Flink.
Would you please give me the permission as a contributor?
My JIRA ID is Cynthia Zhao.

[ANNOUNCEMENT] March 2019 Bay Area Apache Flink Meetup

2019-02-14 Thread Xuefu Zhang
Hi all,

I'm very excited to announce that the community is planning the next meetup
in the Bay Area on March 25, 2019. The event has just been announced on Meetup.com
[1].

To make the event successful, your participation and help will be needed.
Currently, we are looking for an organization that can host the event.
Please let me know if you have any leads.

Secondly, we encourage Flink users and developers to take this as an
opportunity to share experiences or development work. Thus, please let me know
if you would like to give a short talk.

I look forward to meeting you all at the meetup.

Regards,
Xuefu

[1] https://www.meetup.com/Bay-Area-Apache-Flink-Meetup/events/258975465


Re: [VOTE] Release Apache Flink 1.7.2, release candidate #1

2019-02-14 Thread Chesnay Schepler

+1

- no pom changes were made that require licensing changes
- verified signatures
- compiled from source
- started a cluster; WebUI is accessible, can submit jobs, no suspicious 
log output
- scanned release notes and made some modifications, *please update the 
website PR*


On 11.02.2019 19:41, Tzu-Li (Gordon) Tai wrote:

Hi everyone,

Please review and vote on the release candidate #1 for the version 1.7.2,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint 1C1E2394D3194E1944613488F320986D35C33D6A [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag “release-1.7.2-rc1” [5],
* website pull request listing the new release and adding announcement blog
post [6].

The vote will be open for at least 72 hours.
Please cast your votes before *Feb. 15th 2019, 12:00 PM CET*.

It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Gordon

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12344632
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.7.2-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1206
[5]
https://gitbox.apache.org/repos/asf?p=flink.git;a=commit;h=ceba8af39b28b91ae6f2d5cbdb1b99258a73e742
[6] https://github.com/apache/flink-web/pull/157





Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread Rong Rong
Hi Stephan,

Thanks for the clarification. Yes, I think these issues have already been
discussed in previous mailing list threads [1,2,3].

I also agree that updating the "official" roadmap every release is a very
good idea to avoid frequent updates.
One question I might've been a bit confused about is: are we suggesting
keeping one roadmap on the documentation site (e.g. [4]) per release, or
simply one most up-to-date roadmap on the main website [5]?
Just like the release notes in every release, the former would, I am
assuming, also provide a good way for users to look back at previous
roadmaps.

Thanks,
Rong

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html

[4] https://ci.apache.org/projects/flink/flink-docs-release-1.7/
[5] https://flink.apache.org/

On Thu, Feb 14, 2019 at 2:26 AM Stephan Ewen  wrote:

> I think the website is better as well.
>
> I agree with Fabian that the wiki is not so visible, and visibility is the
> main motivation.
> This type of roadmap overview would not be updated by everyone - letting
> committers update the roadmap means the listed threads are actually
> happening at the moment.
>
>
> On Thu, Feb 14, 2019 at 11:14 AM Fabian Hueske  wrote:
>
>> Hi,
>>
>> I like the idea of putting the roadmap on the website because it is much
>> more visible (and IMO more credible, obligatory) there.
>> However, I share the concerns about frequent updates.
>>
>> I think it would be great to update the "official" roadmap on the
>> website once per release (excluding bugfix releases), i.e., every three months.
>> We can use the wiki to collect and draft the roadmap for the next update.
>>
>> Best, Fabian
>>
>>
>> On Thu, Feb 14, 2019 at 11:03, Jeff Zhang wrote:
>>
>>> Hi Stephan,
>>>
>>> Thanks for this proposal. It is a good idea to track the roadmap. One
>>> suggestion is that it might be better to put it into a wiki page first,
>>> because it is easier to update the roadmap on the wiki than on the Flink
>>> website. And I guess we may need to update the roadmap very often at the
>>> beginning, as there are so many discussions and proposals in the community
>>> recently. We can move it to the Flink website later when we feel it can be
>>> nailed down.
>>>
>>> Stephan Ewen  wrote on Thu, Feb 14, 2019 at 5:44 PM:
>>>
 Thanks Jincheng and Rong Rong!

 I am not deciding a roadmap and making a call on what features should
 be developed or not. I was only collecting broader issues that are already
 happening or have an active FLIP/design discussion plus committer support.

 Do we have that for the suggested issues as well? If yes, we can add
 them (can you point me to the issue/mail-thread), if not, let's try and
 move the discussion forward and add them to the roadmap overview then.

 Best,
 Stephan


 On Wed, Feb 13, 2019 at 6:47 PM Rong Rong  wrote:

> Thanks Stephan for the great proposal.
>
> This would not only be beneficial for new users but also for
> contributors to keep track of all upcoming features.
>
> I think that better window operator support can also be separately
> grouped into its own category, as it affects both the future DataStream API
> and
> batch-stream unification.
> can we also include:
> - OVER aggregate for DataStream API separately as @jincheng suggested.
> - Improving sliding window operator [1]
>
> One additional suggestion: can we also include the more extendable
> security module [2,3] that @shuyi and I are currently working on?
> This will significantly improve the usability for Flink in corporate
> environments where proprietary or 3rd-party security integration is 
> needed.
>
> Thanks,
> Rong
>
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
> [2]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
> [3]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>
>
>
>
> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun 
> wrote:
>
>> Very excited and thank you for launching such a great discussion,
>> Stephan !
>>
>> Just one small suggestion: in the Batch Streaming Unification
>> section, do we need to add an item:
>>
>> - Same window operators on bounded/unbounded Table API and DataStream
>> API
>> (currently OVER window only exists in SQL/TableAPI, DataStream API
>> does not yet support it)
>>
>> 

Re: [DISCUSS] Flink Kerberos Improvement

2019-02-14 Thread Rong Rong
Hi Stephan,

This proposal is an extension of @shuyi's initial improvement, specifically
to tackle Kerberos-related issues.
However, in order for this extension to work, some of the original
components proposed are required (such as the service provider pattern for
security factories).

Thanks,
Rong

On Thu, Feb 14, 2019 at 1:35 AM Stephan Ewen  wrote:

> Hi all!
>
> A quick question: Is this a special case of the security improvements
> proposed in this thread [1], or a separate proposal all together?
>
> Stephan
>
> [1]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>
> On Tue, Dec 18, 2018 at 8:06 PM Rong Rong  wrote:
>
> > Hi Shuyi,
> >
> > Yes, I think the impersonation question is a very valid one! It can
> > actually be considered as 2 questions, as I stated in the doc.
> > 1. In the doc I stated that impersonation should be implemented in the
> > user-side code, which should only invoke the cluster client as the actual
> user
> > 'joe'.
> > 2. However, since currently the cluster client assumes no impersonation
> at
> > all, much of the code assumes that a fully authorized client can be
> > instantiated with the same authority that the actual Flink cluster has.
> > When impersonation is enabled, this might not be the case. For example,
> if
> > impersonation is in place, most likely the cluster client running on
> joe's
> > behalf will not, and should not have access to keytab file of 'joe'.
> > Instead, a delegation token is used. Thus the second part of the doc is
> > trying to address this issue.
> >
> > --
> > Rong
> >
> > On Mon, Dec 17, 2018 at 11:41 PM Shuyi Chen  wrote:
> >
> > > Hi Rong, thanks a lot for the proposal. Currently, Flink assumes the
> > keytab
> > > is located in a remote DFS. Pre-installing keytabs statically in YARN
> > node
> > > local filesystem is a common approach, so I think we should support
> this
> > > mode in Flink natively. As an optimization to reduce the KDC access
> > > frequency, we should also support method 3 (the DT approach) as
> discussed
> > > in [1]. A question is: why do we need to implement impersonation in
> > > Flink? I assume the superuser can do the impersonation for 'joe' and
> > 'joe'
> > > can then invoke the Flink client to deploy the job. Thanks a lot.
> > >
> > > Shuyi
> > >
> > > [1]
> > >
> > >
> >
> https://docs.google.com/document/d/10V7LiNlUJKeKZ58mkR7oVv1t6BrC6TZi3FGf2Dm6-i8/edit
> > >
> > > On Mon, Dec 17, 2018 at 5:49 PM Rong Rong  wrote:
> > >
> > > > Hi All,
> > > >
> > > > We have been experimenting with the integration of Kerberos with Flink
> > in our Corp
> > > > environment and found some limitations in the current
> > Flink-Kerberos
> > > > security mechanism running with Apache YARN.
> > > >
> > > > Based on the Hadoop Kerberos security guide [1], apparently there is
> > > only
> > > > a subset of the suggested long-running service security mechanisms
> > > > supported in Flink. Furthermore, the current model does not work well
> > > with
> > > > a superuser impersonating actual users [2] for deployment purposes,
> which
> > > is
> > > > a widely adopted way to launch applications in corp environments.
> > > >
> > > > We would like to propose an improvement [3] to introduce the other
> > > common
> > > > methods [1] for securing long-running applications on YARN and enable
> > > > impersonation mode. Any comments and suggestions are highly
> > appreciated.
> > > >
> > > > Many thanks,
> > > > Rong
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html#Securing_Long-lived_YARN_Services
> > > > [2]
> > > >
> > > >
> > >
> >
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html
> > > > [3]
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1rBLCpyQKg6Ld2P0DEgv4VIOMTwv4sitd7h7P5r202IE/edit?usp=sharing
> > > >
> > >
> > >
> > > --
> > > "So you have to trust that the dots will somehow connect in your
> future."
> > >
> >
>
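
A minimal sketch of the superuser-impersonation flow discussed in this
thread, assuming Hadoop's UserGroupInformation API; the proxy-user
configuration and the Flink client invocation are placeholders:

{code:java}
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;

public class ImpersonationSketch {
    public static void main(String[] args) throws Exception {
        // The authenticated superuser; it must be allowed to proxy 'joe'
        // via the hadoop.proxyuser.* settings in core-site.xml.
        UserGroupInformation superUser = UserGroupInformation.getLoginUser();

        // A proxy UGI for 'joe' -- no keytab for 'joe' is needed; the
        // cluster authorizes the call against the superuser's credentials.
        UserGroupInformation joe =
                UserGroupInformation.createProxyUser("joe", superUser);

        joe.doAs((PrivilegedExceptionAction<Void>) () -> {
            // Placeholder: invoke the cluster client here, acting as 'joe'.
            return null;
        });
    }
}
{code}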


Re: List of consumed kafka topics should not be restored from state

2019-02-14 Thread Aljoscha Krettek
I think these two Jira issues are relevant here:
 - https://issues.apache.org/jira/browse/FLINK-10342 

 - https://issues.apache.org/jira/browse/FLINK-9303 


The second one only because it’s slightly related. The first one is actually 
exactly this thread.

I was against changing this behaviour in the Jira but I can now see that this 
is quite likely an issue.

Aljoscha

> On 13. Feb 2019, at 18:55, Gyula Fóra  wrote:
> 
> Hi!
> 
> I agree that it’s very confusing if you explicitly specify the topics that
> are to be consumed and what happens is different.
> 
> I would almost consider this to be a bug; I can’t see any reasonable use
> case, just hard-to-debug problems.
> 
> Having an option would be a good start but I would rather treat this as a
> bug.
> 
> Gyula
> 
> On Wed, 13 Feb 2019 at 18:27, Feng LI  wrote:
> 
>> Hello there,
>> 
>> I’m just wondering if there are real-world use cases for maintaining this
>> default behavior. It’s a bit counter-intuitive and sometimes results in
>> serious production issues. (We had a similar issue when changing the topic
>> name, which resulted in reading every message twice - both from the old one
>> and from the new.)
>> 
>> Cheers,
>> Feng
>> On Wed, Feb 13, 2019 at 17:56, Tzu-Li (Gordon) Tai
>> wrote:
>> 
>>> Hi,
>>> 
>>> Partition offsets stored in state will always be respected when the
>>> consumer is restored from checkpoints / savepoints.
>>> AFAIK, this seems to have been the behaviour for quite some time now
>> (since
>>> FlinkKafkaConsumer08).
>>> 
>>> I think in the past there was some discussion to at least allow some way
>>> to ignore restored partition offsets.
>>> One way to enable this is to filter the restored partition offsets based
>> on
>>> the configured list of specified topics / topic regex pattern in the
>>> current execution. This should work, since this can only be modified when
>>> restoring from savepoints (i.e. manual restores).
>>> To avoid breaking the current behaviour, we can maybe add a
>>> `filterRestoredPartitionOffsetState()` configuration on the consumer,
>> which
>>> by default is disabled to match the current behaviour.
>>> 
>>> What do you think?
>>> 
>>> Cheers,
>>> Gordon
>>> 
>>> On Wed, Feb 13, 2019 at 11:59 PM Gyula Fóra 
>> wrote:
>>> 
 Hi!
 
 I have run into a weird issue which I could have sworn wouldn't
 happen :D
 I feel there was a discussion about this in the past but maybe I'm
>> wrong,
 but I hope someone can point me to a ticket.
 
 Let's say you create a Kafka consumer that consumes (t1,t2,t3), you
>> take a
 savepoint and deploy a new version that only consumes (t1).
 
 The restore logic now still starts to consume (t1,t2,t3) which feels
>> very
 unintuitive as those were explicitly removed from the list. It is also
>>> hard
 to debug as the topics causing the problem are not defined anywhere in
>>> your
 job, configs etc.
 
 Has anyone run into this issue? Should we change this default behaviour
>>> or
 at least have an option to not do this?
 
 Cheers,
 Gyula
 
>>> 
>> 
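
A minimal sketch of the restored-offset filtering idea described above. The
types and the opt-in switch name are hypothetical stand-ins for the
consumer's internals:

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for Flink's internal topic/partition descriptor.
class TopicPartition {
    final String topic;
    final int partition;

    TopicPartition(String topic, int partition) {
        this.topic = topic;
        this.partition = partition;
    }
}

class RestoredStateFilter {
    /**
     * Drops restored offsets whose topic is no longer in the currently
     * configured topic list -- the behavior the proposed, disabled-by-default
     * filterRestoredPartitionOffsetState() switch would enable.
     */
    static Map<TopicPartition, Long> filter(
            Map<TopicPartition, Long> restoredOffsets,
            Set<String> subscribedTopics) {
        Map<TopicPartition, Long> filtered = new HashMap<>();
        for (Map.Entry<TopicPartition, Long> e : restoredOffsets.entrySet()) {
            if (subscribedTopics.contains(e.getKey().topic)) {
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }
}
{code}

For a topic regex pattern, the contains() check would be replaced by a
pattern match against the topic name.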



[jira] [Created] (FLINK-11618) [state] Refactor operator state repartition mechanism

2019-02-14 Thread Yun Tang (JIRA)
Yun Tang created FLINK-11618:


 Summary: [state] Refactor operator state repartition mechanism
 Key: FLINK-11618
 URL: https://issues.apache.org/jira/browse/FLINK-11618
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Affects Versions: 1.7.0
Reporter: Yun Tang
Assignee: Yun Tang
 Fix For: 1.8.0


Currently we have the following state assignment strategy for operator state:
 * When parallelism is not changed:
 ** If we only have even-split redistributed state, state assignment tries 
to stay the same as previously (though actually not always the same).
 ** If we have union redistributed state, all the operator state is 
redistributed in the new state assignment.
 * When parallelism is changed:
 ** all the operator state is redistributed in the new state assignment.

There exist two problems *when parallelism is not changed*:
 # If we only have even-split redistributed state, the current implementation 
actually cannot ensure that the state assignment stays the same as previously. 
This is because the current {{StateAssignmentOperation#collectPartitionableStates}} 
repartitions {{managedOperatorStates}} without subtask-index information. 
For example, suppose we have an operator state with parallelism 2, and 
subtask-0's managed operator state is empty while subtask-1's is not. Although 
the new parallelism stays at 2, after 
{{StateAssignmentOperation#collectPartitionableStates}}, subtask-0 would be 
assigned the managed operator state but subtask-1 would get none.
 # We should only redistribute union state and not touch the even-split state. 
Redistributing even-split state would cause unexpected behavior once 
{{RestartPipelinedRegionStrategy}} supports restoring state.

We should fix the above two problems; this issue is a prerequisite of 
FLINK-10712 and FLINK-10713.
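
A conceptual sketch of the two redistribution modes described above
(hypothetical types; Flink's real state-handle classes differ):

{code:java}
import java.util.ArrayList;
import java.util.List;

class RedistributionSketch {
    // Even-split: state partitions are spread round-robin across subtasks.
    // Without subtask-index information, an empty subtask-0 can "steal"
    // partitions that used to belong to subtask-1 even when the parallelism
    // is unchanged -- problem 1 above.
    static <T> List<List<T>> evenSplit(List<T> partitions, int parallelism) {
        List<List<T>> assignment = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            assignment.add(new ArrayList<>());
        }
        for (int i = 0; i < partitions.size(); i++) {
            assignment.get(i % parallelism).add(partitions.get(i));
        }
        return assignment;
    }

    // Union: every subtask receives the complete state and filters it
    // itself; this is the only mode that genuinely needs redistribution
    // when the parallelism is unchanged -- problem 2 above.
    static <T> List<List<T>> union(List<T> partitions, int parallelism) {
        List<List<T>> assignment = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            assignment.add(new ArrayList<>(partitions));
        }
        return assignment;
    }
}
{code}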



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Enhance Operator API to Support Dynamically Selective Reading and EndOfInput Event

2019-02-14 Thread Aljoscha Krettek
While we’re on operators and tasks, I think it would also make sense in the 
long run to move the logic that is now in 
AbstractStreamOperator.setup()/initializeState()/snapshot()/snapshotState() (and 
the other snapshotState()…)/dispose() outside of the operator itself. This 
logic is the same for every operator but shouldn’t really be in there. We 
currently have a very complicated dance between the StreamTask and 
AbstractStreamOperator for initialising the state backends that doesn’t really 
seem necessary.

> On 14. Feb 2019, at 11:54, Stephan Ewen  wrote:
> 
> To move this forward, I would suggest the following:
> 
>  - Let's quickly check which other classes need to change. I assume the
> TwoInputStreamTask and StreamTwoInputProcessor ?
>  - Can those changes be new classes that are used when the new operator is
> used? The current TwoInputStreamTask and StreamTwoInputProcessor remain
> until they are fully subsumed and are then removed.
> 
>  - Do we need any other refactorings before, like some cleanup of the
> Operator Config or the Operator Chain?
> 
> Best,
> Stephan
> 
> 
> On Sun, Feb 10, 2019 at 7:25 AM Guowei Ma  wrote:
> 
>> 2019.2.10
>> 
>> 
>> Hi, Stephan
>> 
>> 
>> Thank you very much for such detailed and constructive comments.
>> 
>> 
>> *binary vs. n-ary* and *enum vs. integer*
>> 
>> 
>> Considering the N-ary, as you mentioned, using integers may be a better
>> choice.
>> 
>> 
>> *generic selectable interface*
>> 
>> 
>> You are right. This interface can be removed.
>> 
>> 
>> *end-input*
>> 
>> It is true that the Operator does not need to store the end-input state;
>> it can be inferred by the system, which notifies the Operator at the right
>> time. We can consider using this mechanism when the system can checkpoint
>> the topology with the Finish Tasks.
>> 
>> 
>> *early-out*
>> 
>> It is reasonable for me not to consider this situation at present.
>> 
>> 
>> *distributed stream deadlocks*
>> 
>> 
>> At present, there is no deadlock for streaming, but I think it might
>> still be necessary to do some validation (Warning or Reject) in JobGraph,
>> because once Flink introduces this TwoInputSelectable interface, streaming
>> users could also construct a diamond-style topology that may
>> deadlock.
>> 
>> 
>> *empty input / selection timeout*
>> 
>> It is reasonable for me not to consider this situation at present.
>> 
>> 
>> *timers*
>> 
>> When all the inputs are finished, TimeService will wait until all timers
>> are triggered, so there should be no problem. The other guys and I are
>> confirming the details to see if there are other considerations.
>> 
>> 
>> Best
>> 
>> GuoWei
>> 
Stephan Ewen  wrote on Fri, Feb 8, 2019 at 7:56 PM:
>> 
>>> Nice design proposal, and +1 to the general idea.
>>> 
>>> A few thoughts / suggestions:
>>> 
>>> *binary vs. n-ary*
>>> 
>>> I would plan ahead for N-ary operators. Not because we necessarily need
>>> n-ary inputs (one can probably build that purely in the API) but because
>> of
>>> future side inputs. The proposal should be able to handle that as well.
>>> 
>>> *enum vs. integer*
>>> 
>>> The above might be easier to realize when going directly with integers
>>> and having ANY, FIRST, SECOND, etc. as pre-defined constants.
>>> Performance-wise, there is probably no difference between using int or
>> enum.
>>> 
>>> *generic selectable interface*
>>> 
>>> From the proposal, I don't quite understand what that interface is for.
>> My
>>> understanding is that the input processor or task that calls the
>>> operator's functions would anyway work on the TwoInputStreamOperator
>>> interface, for efficiency.
>>> 
>>> *end-input*
>>> 
>>> I think we should not make storing the end-input the operator's
>>> responsibility.
>>> There is a simple way to handle this, which is also consistent with other
>>> aspects of handling finished tasks:
>>> 
>>>  - If a task is finished, that should be stored in the checkpoint.
>>> - Upon restoring a finished task, if it has still running successors, we
>>> deploy a "finished input channel", which immediately send the "end of
>>> input" when task is started.
>>> - the operator will hence set the end of input immediately again upon
>>> 
>>> *early-out*
>>> 
>>> Letting nextSelection() return “NONE” or “FINISHED" may be relevant for
>>> early-out cases, but I would remove this from the scope of this proposal.
>>> There are most likely other big changes involved, like communicating this
>>> to the upstream operators.
>>> 
>>> *distributed stream deadlocks*
>>> 
>>> We had this issue in the DataSet API. Earlier versions of the DataSet API
>>> made an analysis of the flow, detecting dams and whether the pipeline-
>>> breaking behavior in the flow would cause deadlocks, and introduced
>>> artificial pipeline breakers in response.
>>> 
>>> The logic was really complicated and it took a while to become stable. We
>>> had several issues that certain user functions (like mapPartition) could
>>> either be pipelined or have a full 

Re: [DISCUSS] Contributing Chinese website and docs to Apache Flink

2019-02-14 Thread Jark Wu
Hi all,

Qianjing Xu and I have been exploring Docusaurus and Crowdin in recent days
to see how Flink can fit into them to support multiple languages.

Here is the summary from our side:
(we didn't dig deeply, so we may have missed something or misunderstood.
Please correct us if we are wrong.)

# What are Docusaurus and Crowdin?
Docusaurus [1] is a documentation framework which supports document
versioning and localization (via Crowdin). IMO, Docusaurus is something
similar to Jekyll.
Crowdin [2] is a localization management platform. Users can upload content
(e.g. markdown source files) to Crowdin and then translate, collaborate, and
manage the process there.
The English content is kept in the original repository, and the translated
content for other languages is kept in Crowdin. We need to download the
translated content from Crowdin and build it into the localized website.
Apache Pulsar is using them for its website and documentation, as @Sijie
mentioned above.
Here is the Pulsar project on crowdin [3].
And here is a test project for Flink I created [4].

# How can Flink fit into them?
I'm afraid that Flink is hard to fit into Docusaurus unless we rework our
website/docs from Jekyll to Docusaurus.

How about Jekyll + Crowdin?
We need a build job to make it work. The build job is triggered when a
commit is merged into master.
The build job does the following things:
1) upload the latest content (English markdown source files) from git to
Crowdin.
- If the source content is changed, the corresponding translation will be
lost and need re-translation.
2) download the translated content
3) build the website and publish it

But it seems that Crowdin doesn't fit well with Jekyll.
Crowdin breaks content into multiple segments to translate according to its
own syntax. This results in a broken layout.
For example, the translated metric page is not rendered as expected:
this is the original `metric.md`:
https://user-images.githubusercontent.com/5378924/52795366-9a22f080-30ac-11e9-9cfd-4de82041aa77.png
this is the file downloaded from crowdin:
https://user-images.githubusercontent.com/5378924/52795379-9f803b00-30ac-11e9-9700-0a4077b5882d.png
this is the page after rendered:
https://user-images.githubusercontent.com/5378924/52795389-a5761c00-30ac-11e9-9821-35c705a8d65b.png

So it seems that currently Crowdin works less well when the markdown
contains HTML and Liquid code.

# Conclusion (Docusaurus/crowdin or previous proposal)
I'm leaning towards the previous proposal [5], not only because Crowdin
currently works less well for Flink, but also because Docusaurus/Crowdin
introduce many new things and tools for contributors to learn.
Furthermore, the previously proposed translation process worked well when we
experimented with it on the Flink website.

What do you think?

Regards,
Jark

[1]: https://docusaurus.io/en/
[2]: https://crowdin.com/
[3]: https://crowdin.com/project/apache-pulsar/zh-CN#
[4]: https://crowdin.com/project/flink-test/zh-CN#
[5]:
https://docs.google.com/document/d/1R1-uDq-KawLB8afQYrczfcoQHjjIhq6tvUksxrfhBl0/edit#


On Tue, 12 Feb 2019 at 23:48, Xingcan Cui  wrote:

> Hi,
>
> I agree with the proposal, Gordon and Jark, and I think it's a good
> solution for major doc changes. We also created separate JIRAs for English
> documentation in the past.
>
> For minor doc changes, I think it's better to encourage Chinese-speaking
> contributors to participate in the review process and help translate the
> few changed lines in sync.
>
> Best,
> Xingcan
>
> > On Feb 12, 2019, at 7:04 AM, Jark Wu  wrote:
> >
> > Hi @Sijie,
> >
> > Thank you for the valuable information. I will explore Docusaurus and
> > feedback here.
> >
> > Best,
> > Jark
> >
> > On Tue, 12 Feb 2019 at 18:18, Fabian Hueske  wrote:
> >
> >> Hi everyone,
> >>
> >> +1 to what Gordon and Jark proposed.
> >> We should make use of the review bot to ensure that new features are
> >> documented at least in English.
> >> If the Chinese docs are not updated, a Jira issue must be created.
> >>
> >> @Sijie, thanks for the pointer to Docusaurus! IMO, this looks very
> >> interesting and should be worth exploring.
> >>
> >> Thanks, Fabian
> >>
> >>
> >>
> >> On Tue., Feb 12, 2019 at 09:06, Sijie Guo  > wrote:
> >>
> >>> Hi,
> >>>
> >>> Sorry for interrupting the thread. But this topic sounds interesting to
> >> me.
> >>> It might be worth checking out docusaurus
> >>> https://docusaurus.io/docs/en/translation . Docusaurus is a
> >> documentation
> >>> framework open sourced by Facebook. It handles versioning and
> >>> localization (via crowdin) pretty well. Apache Pulsar is using it for
> its
> >>> website and documentation.
> >>>
> >>> - Sijie
> >>>
> >>> On Mon, Jan 28, 2019 at 7:59 PM Jark Wu  wrote:
> >>>
>  Hi all,
> 
>  In the past year, the Chinese community has been working on building a
> >> Chinese
>  translated Flink website (http://flink.apache.org) and documents (
>  http://ci.apache.org/projects/flink/flink-docs-master/) in order to
> >> help
>  Chinese speaking users. 


[jira] [Created] (FLINK-11617) CLONE - Handle AmazonKinesisException gracefully in Kinesis Streaming Connector

2019-02-14 Thread Jamie Grier (JIRA)
Jamie Grier created FLINK-11617:
---

 Summary: CLONE - Handle AmazonKinesisException gracefully in 
Kinesis Streaming Connector
 Key: FLINK-11617
 URL: https://issues.apache.org/jira/browse/FLINK-11617
 Project: Flink
  Issue Type: Improvement
  Components: Kinesis Connector
Reporter: Jamie Grier
Assignee: Scott Kidder


My Flink job that consumes from a Kinesis stream must be restarted at least 
once daily due to an uncaught AmazonKinesisException when reading from Kinesis. 
The complete stacktrace looks like:

{noformat}
com.amazonaws.services.kinesis.model.AmazonKinesisException: null (Service: 
AmazonKinesis; Status Code: 500; Error Code: InternalFailure; Request ID: 
dc1b7a1a-1b97-1a32-8cd5-79a896a55223)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1545)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1183)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:964)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:676)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:650)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:633)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:601)
at 
com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:583)
at 
com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:447)
at 
com.amazonaws.services.kinesis.AmazonKinesisClient.doInvoke(AmazonKinesisClient.java:1747)
at 
com.amazonaws.services.kinesis.AmazonKinesisClient.invoke(AmazonKinesisClient.java:1723)
at 
com.amazonaws.services.kinesis.AmazonKinesisClient.getRecords(AmazonKinesisClient.java:858)
at 
org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.getRecords(KinesisProxy.java:193)
at 
org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.getRecords(ShardConsumer.java:268)
at 
org.apache.flink.streaming.connectors.kinesis.internals.ShardConsumer.run(ShardConsumer.java:176)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}

It's interesting that the Kinesis endpoint returned a 500 status code, but 
that's outside the scope of this issue.

I think we can handle this exception in the same manner as a 
ProvisionedThroughputExceededException: performing an exponential backoff and 
retrying a finite number of times before throwing an exception.
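
A minimal sketch of that retry strategy (illustrative only: the class, 
constants, and client field here are assumptions, not the actual connector 
code):

{code:java}
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.model.AmazonKinesisException;
import com.amazonaws.services.kinesis.model.GetRecordsRequest;
import com.amazonaws.services.kinesis.model.GetRecordsResult;

public class GetRecordsBackoffSketch {

    private static final int MAX_RETRIES = 5;
    private static final long BASE_BACKOFF_MILLIS = 100L;

    private final AmazonKinesis kinesisClient;

    public GetRecordsBackoffSketch(AmazonKinesis kinesisClient) {
        this.kinesisClient = kinesisClient;
    }

    GetRecordsResult getRecordsWithRetry(GetRecordsRequest request) throws InterruptedException {
        AmazonKinesisException lastException = null;
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            try {
                return kinesisClient.getRecords(request);
            } catch (AmazonKinesisException e) {
                // Retry only on server-side (5xx) failures; rethrow client errors.
                if (e.getStatusCode() / 100 != 5) {
                    throw e;
                }
                lastException = e;
                // Exponential backoff: 100 ms, 200 ms, 400 ms, ...
                Thread.sleep(BASE_BACKOFF_MILLIS << attempt);
            }
        }
        throw lastException;
    }
}
{code}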



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11616) Flink official document has an error

2019-02-14 Thread xulinjie (JIRA)
xulinjie created FLINK-11616:


 Summary: Flink official document has an error
 Key: FLINK-11616
 URL: https://issues.apache.org/jira/browse/FLINK-11616
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: xulinjie
Assignee: xulinjie


The page URL is 
[https://ci.apache.org/projects/flink/flink-docs-master/tutorials/flink_on_windows.html]

The mistake is in the paragraph “Installing Flink from Git”.

“The solution is to adjust the Cygwin settings to deal with the correct line 
endings by following these three steps:”,

The sequence of steps you wrote was "1, 2, 1". But I think you might want to 
write "1, 2, 3".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Flink client api enhancement for downstream project

2019-02-14 Thread Stephan Ewen
Nice that this discussion is happening.

In the FLIP, we could also revisit the entire role of the environments
again.

Initially, the idea was:
  - the environments take care of the specific setup for standalone (no
setup needed), yarn, mesos, etc.
  - the session ones have control over the session. The environment holds
the session client.
  - running a job gives a "control" object for that job. That behavior is
the same in all environments.

The actual implementation diverged quite a bit from that. Happy to see a
discussion about straightening this out a bit more.
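
As an illustration of that separation of concerns, a minimal hypothetical
sketch (these interfaces do not exist in Flink as written; all names are
made up):

import java.util.concurrent.CompletableFuture;

/** Hypothetical handle for a running job; behaves the same in all environments. */
interface JobControl {
    CompletableFuture<Void> cancel();
    CompletableFuture<String> triggerSavepoint(String targetDirectory);
}

/** Hypothetical environment: hides the standalone/YARN/Mesos specific setup. */
interface Environment {
    JobControl execute(String jobName) throws Exception;
}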


On Tue, Feb 12, 2019 at 4:58 AM Jeff Zhang  wrote:

> Hi folks,
>
> Sorry for the late response. It seems we have reached consensus on this; I
> will create a FLIP for this with a more detailed design.
>
>
> Thomas Weise  wrote on Fri, Dec 21, 2018 at 11:43 AM:
>
> > Great to see this discussion seeded! The problems you face with the
> > Zeppelin integration are also affecting other downstream projects, like
> > Beam.
> >
> > We just enabled the savepoint restore option in RemoteStreamEnvironment
> [1]
> > and that was more difficult than it should have been. The main issue is that
> > the environment and cluster client aren't decoupled. Ideally it should be
> > possible to just get the matching cluster client from the environment and
> > then control the job through it (environment as factory for cluster
> > client). But note that the environment classes are part of the public
> API,
> > and it is not straightforward to make larger changes without breaking
> > backward compatibility.
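
(As an illustration of the decoupling described above, a hypothetical sketch
of restoring a remote job from a savepoint, roughly what [1] enables; the
exact constructor shape is an assumption, not the final API.)

import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.jobgraph.SavepointRestoreSettings;
import org.apache.flink.streaming.api.environment.RemoteStreamEnvironment;

public class SavepointRestoreSketch {
    public static void main(String[] args) throws Exception {
        RemoteStreamEnvironment env = new RemoteStreamEnvironment(
                "jobmanager-host", 8081,
                new Configuration(),
                new String[] { "/path/to/job.jar" },
                null, // no extra classpaths
                SavepointRestoreSettings.forPath("hdfs:///savepoints/savepoint-abc123"));
        // ... build the streaming topology on env, then:
        env.execute("restored-job");
    }
}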
> >
> > ClusterClient currently exposes internal classes like JobGraph and
> > StreamGraph. But it should be possible to wrap this with a new public API
> > that brings the required job control capabilities for downstream
> projects.
> > Perhaps it is helpful to look at some of the interfaces in Beam while
> > thinking about this: [2] for the portable job API and [3] for the old
> > asynchronous job control from the Beam Java SDK.
> >
> > The backward compatibility discussion [4] is also relevant here. A new
> API
> > should shield downstream projects from internals and allow them to
> > interoperate with multiple future Flink versions in the same release line
> > without forced upgrades.
> >
> > Thanks,
> > Thomas
> >
> > [1] https://github.com/apache/flink/pull/7249
> > [2]
> >
> >
> https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto
> > [3]
> >
> >
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/PipelineResult.java
> > [4]
> >
> >
> https://lists.apache.org/thread.html/064c75c5d10f0806095b14f6d76942598917a14429c1acbddd151fe2@%3Cdev.flink.apache.org%3E
> >
> >
> > On Thu, Dec 20, 2018 at 6:15 PM Jeff Zhang  wrote:
> >
> > > >>> I'm not so sure whether the user should be able to define where the
> > job
> > > runs (in your example Yarn). This is actually independent of the job
> > > development and is something which is decided at deployment time.
> > >
> > > Users don't need to specify the execution mode programmatically. They can
> also
> > > pass the execution mode as an argument to the flink run command, e.g.
> > >
> > > bin/flink run -m yarn-cluster 
> > > bin/flink run -m local ...
> > > bin/flink run -m host:port ...
> > >
> > > Does this make sense to you ?
> > >
> > > >>> To me it makes sense that the ExecutionEnvironment is not directly
> > > initialized by the user and instead context sensitive how you want to
> > > execute your job (Flink CLI vs. IDE, for example).
> > >
> > > Right, currently I notice that Flink creates different
> > > ContextExecutionEnvironments based on different submission scenarios
> > (Flink
> > > CLI vs IDE). To me this is kind of a hacky approach, not very
> straightforward.
> > > What I suggested above is that Flink should always create the
> > same
> > > ExecutionEnvironment but with different configurations, and based on the
> > > configuration it would create the proper ClusterClient for different
> > > behaviors.
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > Till Rohrmann  wrote on Thu, Dec 20, 2018 at 11:18 PM:
> > >
> > > > You are probably right that we have code duplication when it comes to
> > the
> > > > creation of the ClusterClient. This should be reduced in the future.
> > > >
> > > > I'm not so sure whether the user should be able to define where the
> job
> > > > runs (in your example Yarn). This is actually independent of the job
> > > > development and is something which is decided at deployment time. To
> me
> > > it
> > > > makes sense that the ExecutionEnvironment is not directly initialized
> > by
> > > > the user and instead context sensitive how you want to execute your
> job
> > > > (Flink CLI vs. IDE, for example). However, I agree that the
> > > > ExecutionEnvironment should give you access to the ClusterClient and
> to
> > > the
> > > > job (maybe in the form of the JobGraph or a job plan).
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Dec 

Re: [DISCUSS] Releasing Flink 1.6.4

2019-02-14 Thread jincheng sun
Thanks for the feedback, everyone, and thanks for your help, Robert!
The release of 1.6.4 is very welcome in the community, so I'll proceed to
create the first release candidate for 1.6.4 now.

Best,
Jincheng

Robert Metzger  wrote on Wed, Feb 13, 2019 at 6:33 PM:

> Can we start creating the release candidate for the 1.6.4 version, or are
> there still commits in the "release-1.6" branch missing?
>
> On Wed, Feb 13, 2019 at 9:41 AM jing  wrote:
>
> > +1
> >
> >
> >
> > --
> > Sent from:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
> >
>


Re: Problems with local build

2019-02-14 Thread Александр
response*, sorry

On Thu, Feb 14, 2019 at 14:42, Александр  wrote:

> Hello. Thanks for your responsible.
> I use maven version 3.5.2, but I also tried 3.2.5 (as the guide
> recommended) and it had no effect.
> I am trying to build 5e8e00b46369cfcbe0ccd8eb664da07ce3c9d1bf - it's the
> latest commit in the master branch.
>
> On Thu, Feb 14, 2019 at 13:32, Robert Metzger  wrote:
>
>> Hey Aleksandr,
>>
>> let's first try to resolve issue 1.
>>
>> Which maven version are you using?
>> What's the commit sha you are trying to build?
>>
>


Re: Problems with local build

2019-02-14 Thread Александр
Hello. Thanks for your responsible.
I use maven version 3.5.2, but I also tried 3.2.5 (as the guide recommended)
and it had no effect.
I am trying to build 5e8e00b46369cfcbe0ccd8eb664da07ce3c9d1bf - it's the
latest commit in the master branch.

On Thu, Feb 14, 2019 at 13:32, Robert Metzger  wrote:

> Hey Aleksandr,
>
> let's first try to resolve issue 1.
>
> Which maven version are you using?
> What's the commit sha you are trying to build?
>


Re: [DISCUSS] Enhance Operator API to Support Dynamically Selective Reading and EndOfInput Event

2019-02-14 Thread Stephan Ewen
To move this forward, I would suggest the following:

  - Let's quickly check which other classes need to change. I assume the
TwoInputStreamTask and StreamTwoInputProcessor?
  - Can those changes be new classes that are used when the new operator is
used? The current TwoInputStreamTask and StreamTwoInputProcessor would remain
until they are fully subsumed and are then removed.

  - Do we need any other refactorings before that, like some cleanup of the
operator config or the operator chain?

Best,
Stephan
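
A hypothetical sketch of the integer-constant selection idea discussed in
this thread (not Flink's actual API; all names here are illustrative):

/** Hypothetical input-selection hook for a multi-input operator. */
interface InputSelectable {
    int ANY = -1;   // read from whichever input has data available
    int FIRST = 0;
    int SECOND = 1; // higher indices would cover future n-ary/side inputs

    /** Returns the index of the input to read next, or ANY. */
    int nextSelection();
}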


On Sun, Feb 10, 2019 at 7:25 AM Guowei Ma  wrote:

> 2019.2.10
>
>
> Hi, Stephan
>
>
> Thank you very much for such detailed and constructive comments.
>
>
> *binary vs. n-ary* and *enum vs. integer*
>
>
> Considering the N-ary, as you mentioned, using integers may be a better
> choice.
>
>
> *generic selectable interface*
>
>
> You are right. This interface can be removed.
>
>
> *end-input*
>
> It is true that the Operator does not need to store the end-input state;
> it can be inferred by the system, which then notifies the Operator at the
> right time. We can consider using this mechanism once the system can
> checkpoint a topology with finished tasks.
>
>
> *early-out*
>
> It is reasonable for me not to consider this situation at present.
>
>
> *distributed stream deadlocks*
>
>
> At present, there is no deadlock for streaming, but I think it might
> still be necessary to do some validation (warn or reject) on the JobGraph,
> because once Flink introduces this TwoInputSelectable interface, streaming
> users could also construct a diamond-style topology that may deadlock.
>
>
> *empty input / selection timeout*
>
> It is reasonable for me not to consider this situation at present.
>
>
> *timers*
>
> When all the inputs are finished, TimeService will wait until all timers
> are triggered, so there should be no problem. The other guys and I are
> confirming the details to see if there are other considerations.
>
>
> Best
>
> GuoWei
>
> Stephan Ewen  wrote on Fri, Feb 8, 2019 at 7:56 PM:
>
> > Nice design proposal, and +1 to the general idea.
> >
> > A few thoughts / suggestions:
> >
> > *binary vs. n-ary*
> >
> > I would plan ahead for N-ary operators. Not because we necessarily need
> > n-ary inputs (one can probably build that purely in the API) but because
> of
> > future side inputs. The proposal should be able to handle that as well.
> >
> > *enum vs. integer*
> >
> > The above might be easier to realize when going directly with integers
> > and having ANY, FIRST, SECOND, etc. as pre-defined constants.
> > Performance-wise, it probably makes no difference whether we use int or
> enum.
> >
> > *generic selectable interface*
> >
> > From the proposal, I don't understand quite what that interface is for.
> My
> > understanding is that the input processor or task that calls the
> > operators's functions would anyways work on the TwoInputStreamOperator
> > interface, for efficiency.
> >
> > *end-input*
> >
> > I think we should not make storing the end-input state the operator's
> > responsibility.
> > There is a simple way to handle this, which is also consistent with other
> > aspects of handling finished tasks:
> >
> >   - If a task is finished, that should be stored in the checkpoint.
> >  - Upon restoring a finished task, if it still has running successors, we
> > deploy a "finished input channel", which immediately sends the "end of
> > input" when the task is started.
> >  - the operator will hence set the end of input immediately again upon
> > restore.
> >
> > *early-out*
> >
> > Letting nextSelection() return “NONE” or “FINISHED" may be relevant for
> > early-out cases, but I would remove this from the scope of this proposal.
> > There are most likely other big changes involved, like communicating this
> > to the upstream operators.
> >
> > *distributed stream deadlocks*
> >
> > We had this issue in the DataSet API. Earlier versions of the DataSet API
> > analyzed the flow, detecting dams and whether the pipeline-breaking
> > behavior in the flow would cause deadlocks, and introduced artificial
> > pipeline breakers in response.
> >
> > The logic was really complicated and it took a while to become stable. We
> > had several issues where certain user functions (like mapPartition) could
> > either be pipelined or have a full dam (the system could not tell which),
> > so we had to insert artificial pipeline breakers in all paths.
> >
> > In the end we simply decided that in the case of a diamond-style flow, we
> > make the point where the flow first forks a blocking shuffle. That was
> > super simple, solved all issues, and has the additional nice property
> that
> > it is a great point to materialize data for recovery, because it helps both
> > paths of the diamond upon failure.
> >
> > My suggestion:
> > ==> For streaming, no problem so far, nothing to do
> > ==> For batch, I would suggest going with the simple solution described
> above
> > first, and improving when we see cases where this impacts performance
> > 

Re: Problems with local build

2019-02-14 Thread Robert Metzger
Hey Aleksandr,

let's first try to resolve issue 1.

Which maven version are you using?
What's the commit sha you are trying to build?


Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread Stephan Ewen
I think the website is better as well.

I agree with Fabian that the wiki is not so visible, and visibility is the
main motivation.
This type of roadmap overview would not be updated by everyone - letting
committers update the roadmap means the listed threads are actually
happening at the moment.


On Thu, Feb 14, 2019 at 11:14 AM Fabian Hueske  wrote:

> Hi,
>
> I like the idea of putting the roadmap on the website because it is much
> more visible (and IMO more credible, obligatory) there.
> However, I share the concerns about frequent updates.
>
> I think it would be great to update the "official" roadmap on the website
> once per release (excluding bugfix releases), i.e., every three months.
> We can use the wiki to collect and draft the roadmap for the next update.
>
> Best, Fabian
>
>
> On Thu., Feb 14, 2019 at 11:03, Jeff Zhang  wrote:
>
>> Hi Stephan,
>>
>> Thanks for this proposal. It is a good idea to track the roadmap. One
>> suggestion is that it might be better to put it into a wiki page first,
>> because it is easier to update the roadmap on the wiki than on the Flink
>> website. And I guess we may need to update the roadmap very often at the
>> beginning, as there are so many discussions and proposals in the community
>> recently. We can move it to the Flink website later, when we feel it has
>> been nailed down.
>>
>> Stephan Ewen  wrote on Thu, Feb 14, 2019 at 5:44 PM:
>>
>>> Thanks Jincheng and Rong Rong!
>>>
>>> I am not deciding a roadmap and making a call on what features should be
>>> developed or not. I was only collecting broader issues that are already
>>> happening or have an active FLIP/design discussion plus committer support.
>>>
>>> Do we have that for the suggested issues as well? If yes , we can add
>>> them (can you point me to the issue/mail-thread), if not, let's try and
>>> move the discussion forward and add them to the roadmap overview then.
>>>
>>> Best,
>>> Stephan
>>>
>>>
>>> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong  wrote:
>>>
 Thanks Stephan for the great proposal.

 This would not only be beneficial for new users but also for
 contributors to keep track of all upcoming features.

 I think that better window operator support can also be separately
 grouped into its own category, as it affects both the future DataStream API
 and batch-stream unification.
 can we also include:
 - OVER aggregate for DataStream API separately as @jincheng suggested.
 - Improving sliding window operator [1]

 One more additional suggestion, can we also include a more extendable
 security module [2,3] @shuyi and I are currently working on?
 This will significantly improve the usability for Flink in corporate
 environments where proprietary or 3rd-party security integration is needed.

 Thanks,
 Rong


 [1]
 http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
 [2]
 http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
 [3]
 http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html




 On Wed, Feb 13, 2019 at 3:39 AM jincheng sun 
 wrote:

> Very excited and thank you for launching such a great discussion,
> Stephan !
>
> Here only a little suggestion that in the Batch Streaming Unification
> section, do we need to add an item:
>
> - Same window operators on bounded/unbounded Table API and DataStream
> API
> (currently OVER window only exists in SQL/TableAPI, DataStream API
> does not yet support)
>
> Best,
> Jincheng
>
> Stephan Ewen  wrote on Wed, Feb 13, 2019 at 7:21 PM:
>
>> Hi all!
>>
>> Recently several contributors, committers, and users asked about
>> making it more visible in which way the project is currently going.
>>
>> Users and developers can track the direction by following the
>> discussion threads and JIRA, but due to the mass of discussions and open
>> issues, it is very hard to get a good overall picture.
 >> Especially for new users and contributors, it is very hard to get a
>> quick overview of the project direction.
>>
>> To fix this, I suggest to add a brief roadmap summary to the
>> homepage. It is a bit of a commitment to keep that roadmap up to date, 
>> but
>> I think the benefit for users justifies that.
>> The Apache Beam project has added such a roadmap [1]
>> , which was received very well by
>> the community, I would suggest to follow a similar structure here.
>>
>> If the community is in favor of this, I would volunteer to write a
>> first version of such a roadmap. The points I would include are below.
>>
>> Best,
>> Stephan
>>
>> [1] https://beam.apache.org/roadmap/
>>
>> 

want to be a Flink contributor

2019-02-14 Thread 金 钟镛
Hey Guys,

I want to be a contributor to Flink. Could you give me the permission?
My JIRA ID is AT-Fieldless


Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread Fabian Hueske
Hi,

I like the idea of putting the roadmap on the website because it is much
more visible (and IMO more credible, obligatory) there.
However, I share the concerns about frequent updates.

I think it would be great to update the "official" roadmap on the website
once per release (excluding bugfix releases), i.e., every three months.
We can use the wiki to collect and draft the roadmap for the next update.

Best, Fabian


On Thu., Feb 14, 2019 at 11:03, Jeff Zhang  wrote:

> Hi Stephan,
>
> Thanks for this proposal. It is a good idea to track the roadmap. One
> suggestion is that it might be better to put it into a wiki page first,
> because it is easier to update the roadmap on the wiki than on the Flink
> website. And I guess we may need to update the roadmap very often at the
> beginning, as there are so many discussions and proposals in the community
> recently. We can move it to the Flink website later, when we feel it has
> been nailed down.
>
> Stephan Ewen  wrote on Thu, Feb 14, 2019 at 5:44 PM:
>
>> Thanks Jincheng and Rong Rong!
>>
>> I am not deciding a roadmap and making a call on what features should be
>> developed or not. I was only collecting broader issues that are already
>> happening or have an active FLIP/design discussion plus committer support.
>>
>> Do we have that for the suggested issues as well? If yes , we can add
>> them (can you point me to the issue/mail-thread), if not, let's try and
>> move the discussion forward and add them to the roadmap overview then.
>>
>> Best,
>> Stephan
>>
>>
>> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong  wrote:
>>
>>> Thanks Stephan for the great proposal.
>>>
>>> This would not only be beneficial for new users but also for
>>> contributors to keep track of all upcoming features.
>>>
>>> I think that better window operator support can also be separately grouped
>>> into its own category, as it affects both the future DataStream API and
>>> batch-stream unification.
>>> can we also include:
>>> - OVER aggregate for DataStream API separately as @jincheng suggested.
>>> - Improving sliding window operator [1]
>>>
>>> One more additional suggestion, can we also include a more extendable
>>> security module [2,3] @shuyi and I are currently working on?
>>> This will significantly improve the usability for Flink in corporate
>>> environments where proprietary or 3rd-party security integration is needed.
>>>
>>> Thanks,
>>> Rong
>>>
>>>
>>> [1]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>>> [2]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>>> [3]
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>>
>>>
>>>
>>>
>>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun 
>>> wrote:
>>>
 Very excited and thank you for launching such a great discussion,
 Stephan !

 Here only a little suggestion that in the Batch Streaming Unification
 section, do we need to add an item:

 - Same window operators on bounded/unbounded Table API and DataStream
 API
 (currently OVER window only exists in SQL/TableAPI, DataStream API does
 not yet support)

 Best,
 Jincheng

 Stephan Ewen  wrote on Wed, Feb 13, 2019 at 7:21 PM:

> Hi all!
>
> Recently several contributors, committers, and users asked about
> making it more visible in which way the project is currently going.
>
> Users and developers can track the direction by following the
> discussion threads and JIRA, but due to the mass of discussions and open
> issues, it is very hard to get a good overall picture.
> Especially for new users and contributors, it is very hard to get a
> quick overview of the project direction.
>
> To fix this, I suggest to add a brief roadmap summary to the homepage.
> It is a bit of a commitment to keep that roadmap up to date, but I think
> the benefit for users justifies that.
> The Apache Beam project has added such a roadmap [1]
> , which was received very well by
> the community, I would suggest to follow a similar structure here.
>
> If the community is in favor of this, I would volunteer to write a
> first version of such a roadmap. The points I would include are below.
>
> Best,
> Stephan
>
> [1] https://beam.apache.org/roadmap/
>
> 
>
> Disclaimer: Apache Flink is not governed or steered by any one single
> entity, but by its community and Project Management Committee (PMC). This
> is not an authoritative roadmap in the sense of a plan with a specific
> timeline. Instead, we share our vision for the future and major 
> initiatives
> that are receiving attention and give users and contributors an
> understanding what they can look 

[jira] [Created] (FLINK-11615) Webfrontend is not accessible in 1.7.x

2019-02-14 Thread Khanh Nguyen (JIRA)
Khanh Nguyen created FLINK-11615:


 Summary: Webfrontend is not accessible in 1.7.x
 Key: FLINK-11615
 URL: https://issues.apache.org/jira/browse/FLINK-11615
 Project: Flink
  Issue Type: Bug
  Components: Webfrontend
Affects Versions: 1.7.1, 1.7.0
Reporter: Khanh Nguyen


I have been using the web monitor dashboard with FlinkMiniCluster on versions 
1.4.x, 1.5.x, and 1.6.x without any problem. But trying to access the web 
frontend at port 8081 on 1.7.x returns 
{code:java}
{"errors":["Not found."]}
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread Jeff Zhang
Hi Stephan,

Thanks for this proposal. It is a good idea to track the roadmap. One
suggestion is that it might be better to put it into a wiki page first,
because it is easier to update the roadmap on the wiki than on the Flink
website. And I guess we may need to update the roadmap very often at the
beginning, as there are so many discussions and proposals in the community
recently. We can move it to the Flink website later, when we feel it has
been nailed down.

Stephan Ewen  wrote on Thu, Feb 14, 2019 at 5:44 PM:

> Thanks Jincheng and Rong Rong!
>
> I am not deciding a roadmap and making a call on what features should be
> developed or not. I was only collecting broader issues that are already
> happening or have an active FLIP/design discussion plus committer support.
>
> Do we have that for the suggested issues as well? If yes , we can add them
> (can you point me to the issue/mail-thread), if not, let's try and move the
> discussion forward and add them to the roadmap overview then.
>
> Best,
> Stephan
>
>
> On Wed, Feb 13, 2019 at 6:47 PM Rong Rong  wrote:
>
>> Thanks Stephan for the great proposal.
>>
>> This would not only be beneficial for new users but also for contributors
>> to keep track of all upcoming features.
>>
>> I think that better window operator support can also be separately grouped
>> into its own category, as it affects both the future DataStream API and
>> batch-stream unification.
>> can we also include:
>> - OVER aggregate for DataStream API separately as @jincheng suggested.
>> - Improving sliding window operator [1]
>>
>> One more additional suggestion, can we also include a more extendable
>> security module [2,3] @shuyi and I are currently working on?
>> This will significantly improve the usability for Flink in corporate
>> environments where proprietary or 3rd-party security integration is needed.
>>
>> Thanks,
>> Rong
>>
>>
>> [1]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
>> [3]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>>
>>
>>
>>
>> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun 
>> wrote:
>>
>>> Very excited and thank you for launching such a great discussion,
>>> Stephan !
>>>
>>> Here only a little suggestion that in the Batch Streaming Unification
>>> section, do we need to add an item:
>>>
>>> - Same window operators on bounded/unbounded Table API and DataStream
>>> API
>>> (currently OVER window only exists in SQL/TableAPI, DataStream API does
>>> not yet support)
>>>
>>> Best,
>>> Jincheng
>>>
>>> Stephan Ewen  wrote on Wed, Feb 13, 2019 at 7:21 PM:
>>>
 Hi all!

 Recently several contributors, committers, and users asked about making
 it more visible in which way the project is currently going.

 Users and developers can track the direction by following the
 discussion threads and JIRA, but due to the mass of discussions and open
 issues, it is very hard to get a good overall picture.
 Especially for new users and contributors, it is very hard to get a
 quick overview of the project direction.

 To fix this, I suggest to add a brief roadmap summary to the homepage.
 It is a bit of a commitment to keep that roadmap up to date, but I think
 the benefit for users justifies that.
 The Apache Beam project has added such a roadmap [1]
 , which was received very well by
 the community, I would suggest to follow a similar structure here.

 If the community is in favor of this, I would volunteer to write a
 first version of such a roadmap. The points I would include are below.

 Best,
 Stephan

 [1] https://beam.apache.org/roadmap/

 

 Disclaimer: Apache Flink is not governed or steered by any one single
 entity, but by its community and Project Management Committee (PMC). This
 is not an authoritative roadmap in the sense of a plan with a specific
 timeline. Instead, we share our vision for the future and major initiatives
 that are receiving attention and give users and contributors an
 understanding what they can look forward to.

 *Future Role of Table API and DataStream API*
   - Table API becomes first class citizen
   - Table API becomes primary API for analytics use cases
   * Declarative, automatic optimizations
   * No manual control over state and timers
   - DataStream API becomes primary API for applications and data
 pipeline use cases
   * Physical, user controls data types, no magic or optimizer
   * Explicit control over state and time

 *Batch Streaming Unification*
   - Table API unification (environments) (FLIP-32)
   - 

Re: [DISCUSS] Adding a mid-term roadmap to the Flink website

2019-02-14 Thread Stephan Ewen
Thanks Jincheng and Rong Rong!

I am not deciding a roadmap and making a call on what features should be
developed or not. I was only collecting broader issues that are already
happening or have an active FLIP/design discussion plus committer support.

Do we have that for the suggested issues as well? If yes , we can add them
(can you point me to the issue/mail-thread), if not, let's try and move the
discussion forward and add them to the roadmap overview then.

Best,
Stephan


On Wed, Feb 13, 2019 at 6:47 PM Rong Rong  wrote:

> Thanks Stephan for the great proposal.
>
> This would not only be beneficial for new users but also for contributors
> to keep track of all upcoming features.
>
> I think that better window operator support can also be separately grouped
> into its own category, as it affects both the future DataStream API and
> batch-stream unification.
> can we also include:
> - OVER aggregate for DataStream API separately as @jincheng suggested.
> - Improving sliding window operator [1]
>
> One more additional suggestion, can we also include a more extendable
> security module [2,3] @shuyi and I are currently working on?
> This will significantly improve the usability for Flink in corporate
> environments where proprietary or 3rd-party security integration is needed.
>
> Thanks,
> Rong
>
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improvement-to-Flink-Window-Operator-with-Slicing-td25750.html
> [2]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html
> [3]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Kerberos-Improvement-td25983.html
>
>
>
>
> On Wed, Feb 13, 2019 at 3:39 AM jincheng sun 
> wrote:
>
>> Very excited and thank you for launching such a great discussion, Stephan
>> !
>>
>> Here only a little suggestion that in the Batch Streaming Unification
>> section, do we need to add an item:
>>
>> - Same window operators on bounded/unbounded Table API and DataStream API
>> (currently OVER window only exists in SQL/TableAPI, DataStream API does
>> not yet support)
>>
>> Best,
>> Jincheng
>>
>> Stephan Ewen  wrote on Wed, Feb 13, 2019 at 7:21 PM:
>>
>>> Hi all!
>>>
>>> Recently several contributors, committers, and users asked about making
>>> it more visible in which way the project is currently going.
>>>
>>> Users and developers can track the direction by following the discussion
>>> threads and JIRA, but due to the mass of discussions and open issues, it is
>>> very hard to get a good overall picture.
>>> Especially for new users and contributors, it is very hard to get a
>>> quick overview of the project direction.
>>>
>>> To fix this, I suggest to add a brief roadmap summary to the homepage.
>>> It is a bit of a commitment to keep that roadmap up to date, but I think
>>> the benefit for users justifies that.
>>> The Apache Beam project has added such a roadmap [1]
>>> , which was received very well by the
>>> community, I would suggest to follow a similar structure here.
>>>
>>> If the community is in favor of this, I would volunteer to write a first
>>> version of such a roadmap. The points I would include are below.
>>>
>>> Best,
>>> Stephan
>>>
>>> [1] https://beam.apache.org/roadmap/
>>>
>>> 
>>>
>>> Disclaimer: Apache Flink is not governed or steered by any one single
>>> entity, but by its community and Project Management Committee (PMC). This
>>> is not an authoritative roadmap in the sense of a plan with a specific
>>> timeline. Instead, we share our vision for the future and major initiatives
>>> that are receiving attention and give users and contributors an
>>> understanding what they can look forward to.
>>>
>>> *Future Role of Table API and DataStream API*
>>>   - Table API becomes first class citizen
>>>   - Table API becomes primary API for analytics use cases
>>>   * Declarative, automatic optimizations
>>>   * No manual control over state and timers
>>>   - DataStream API becomes primary API for applications and data
>>> pipeline use cases
>>>   * Physical, user controls data types, no magic or optimizer
>>>   * Explicit control over state and time
>>>
>>> *Batch Streaming Unification*
>>>   - Table API unification (environments) (FLIP-32)
>>>   - New unified source interface (FLIP-27)
>>>   - Runtime operator unification & code reuse between DataStream / Table
>>>   - Extending Table API to make it convenient API for all analytical use
>>> cases (easier mix in of UDFs)
>>>   - Same join operators on bounded/unbounded Table API and DataStream API
>>>
>>> *Faster Batch (Bounded Streams)*
>>>   - Much of this comes via Blink contribution/merging
>>>   - Fine-grained Fault Tolerance on bounded data (Table API)
>>>   - Batch Scheduling on bounded data (Table API)
>>>   - External Shuffle Services Support on bounded streams
>>>   - Caching of intermediate results 

Re: [DISCUSS] Flink Kerberos Improvement

2019-02-14 Thread Stephan Ewen
Hi all!

A quick question: Is this a special case of the security improvements
proposed in this thread [1], or a separate proposal all together?

Stephan

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html

On Tue, Dec 18, 2018 at 8:06 PM Rong Rong  wrote:

> Hi Shuyi,
>
> Yes. I think the impersonation question is very much valid! It can
> actually be considered as two questions, as I stated in the doc.
> 1. In the doc I stated that impersonation should be implemented in the
> user-side code, which should only invoke the cluster client as the actual
> user 'joe'.
> 2. However, since the cluster client currently assumes no impersonation at
> all, much of the code assumes that a fully authorized client can be
> instantiated with the same authority that the actual Flink cluster has.
> When impersonation is enabled, this might not be the case. For example, if
> impersonation is in place, the cluster client running on joe's behalf most
> likely will not, and should not, have access to 'joe''s keytab file.
> Instead, a delegation token is used. Thus the second part of the doc is
> trying to address this issue.
>
> --
> Rong
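
(A minimal sketch of the proxy-user pattern described above, using Hadoop's
UserGroupInformation API; deployFlinkJob() is a placeholder, not an actual
Flink client method, and the superuser's Kerberos login is assumed to be in
place already.)

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;

public class ImpersonatedDeploySketch {
    public static void main(String[] args) throws Exception {
        UserGroupInformation superUser = UserGroupInformation.getLoginUser();
        UserGroupInformation joe = UserGroupInformation.createProxyUser("joe", superUser);
        joe.doAs((PrivilegedExceptionAction<Void>) () -> {
            deployFlinkJob(); // placeholder: whatever client call submits the job as 'joe'
            return null;
        });
    }

    private static void deployFlinkJob() {
        // The actual job submission would go here.
    }
}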
>
> On Mon, Dec 17, 2018 at 11:41 PM Shuyi Chen  wrote:
>
> > Hi Rong, thanks a lot for the proposal. Currently, Flink assumes the
> keytab
> > is located in a remote DFS. Pre-installing keytabs statically in the YARN
> node
> > local filesystem is a common approach, so I think we should support this
> > mode in Flink natively. As an optimization to reduce the KDC access
> > frequency, we should also support method 3 (the DT approach) as discussed
> > in [1]. A question is that why do we need to implement impersonation in
> > Flink? I assume the superuser can do the impersonation for 'joe' and
> 'joe'
> > can then invoke Flink client to deploy the job. Thanks a lot.
> >
> > Shuyi
> >
> > [1]
> >
> >
> https://docs.google.com/document/d/10V7LiNlUJKeKZ58mkR7oVv1t6BrC6TZi3FGf2Dm6-i8/edit
> >
> > On Mon, Dec 17, 2018 at 5:49 PM Rong Rong  wrote:
> >
> > > Hi All,
> > >
> > > We have been experimenting with integrating Kerberos with Flink in our
> corporate
> > > environment and found some limitations in the current
> Flink-Kerberos
> > > security mechanism running with Apache YARN.
> > >
> > > Based on the Hadoop Kerberos security guide [1], apparently there is
> > only
> > > a subset of the suggested long-running service security mechanisms
> > > supported in Flink. Furthermore, the current model does not work well
> > with
> > > a superuser impersonating actual users [2] for deployment purposes, which
> > is
> > > a widely adopted way to launch applications in corporate environments.
> > >
> > > We would like to propose an improvement [3] to introduce the other
> > common
> > > methods [1] for securing long-running applications on YARN and to enable
> > > an impersonation mode. Any comments and suggestions are highly
> appreciated.
> > >
> > > Many thanks,
> > > Rong
> > >
> > > [1]
> > >
> > >
> >
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html#Securing_Long-lived_YARN_Services
> > > [2]
> > >
> > >
> >
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html
> > > [3]
> > >
> > >
> >
> https://docs.google.com/document/d/1rBLCpyQKg6Ld2P0DEgv4VIOMTwv4sitd7h7P5r202IE/edit?usp=sharing
> > >
> >
> >
> > --
> > "So you have to trust that the dots will somehow connect in your future."
> >
>


Re: Apply for permission as a contributor

2019-02-14 Thread Fabian Hueske
Hi,

Welcome to the Flink community!
I've given you contributor permissions for Jira.

Best, Fabian

On Thu., Feb 14, 2019 at 07:51,  wrote:

> Hi Guys,
>
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor?
> My JIRA ID is danny0405.


Re: Would you please give me the permission as a contributor?

2019-02-14 Thread Fabian Hueske
Hi,

Welcome to the Flink community!
I've given you contributor permissions for Jira.

Best, Fabian

On Thu., Feb 14, 2019 at 08:58, linjie  wrote:

> Hi Guys,
>
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor?
> My JIRA username is linjie,full name is xulinjie.
>
>


Re: Hi

2019-02-14 Thread Fabian Hueske
Hi,

Welcome to the Flink community!
I've given you contributor permissions for Jira.

Best, Fabian

On Thu., Feb 14, 2019 at 04:00, 黄子健  wrote:

> Hi Guys,
>
>
> I want to contribute to Apache Flink.
>
> Would you please give me the permission as a contributor?
>
> My JIRA ID is huangzijian888.
>


Re: Hi!

2019-02-14 Thread Fabian Hueske
Hi,

Welcome to the Flink community!
I've given you contributor permissions for Jira.

Best, Fabian

On Thu., Feb 14, 2019 at 03:57, 郭海帅  wrote:

> Hi, I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor?
> My JIRA ID is JJ GUO.


Re: Approve Contributor Permission

2019-02-14 Thread Fabian Hueske
Hi Maodou,

Welcome to the Flink community!
I've given you contributor permissions for Jira.

Best,
Fabian

On Thu., Feb 14, 2019 at 02:58, dou mao  wrote:

> Hi all :
>
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor ?
> My JIRA account is Maodou, and email is current.
>
>
> Hope your reply :)
>
> Maodou
>


Re: Apply JIRA Contributor

2019-02-14 Thread Fabian Hueske
Hi,

Welcome to the Flink community!
I've given you contributor permissions for Jira.

Best, Fabian

On Thu., Feb 14, 2019 at 02:17, Ma, Yan  wrote:

>
> Hi guys,
>
> I would like to be a contributor of Apache flink, can someone give me the
> JIRA access to this project? My JIRA id is yma.
>
> Thanks very much in advance.
>
> Yan
>