About the blob client and blob server authentication
Hi: I found Flink pull request 2425 (https://github.com/apache/flink/pull/2425). This change does authentication between the blob client and the blob server using a security cookie. In my opinion, using SASL DIGEST-MD5 would be a stronger approach. What do you think? By the way, when will this change be merged to master? Thanks in advance!
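To illustrate why a digest-style challenge-response (the idea behind DIGEST-MD5) is stronger than sending a static cookie, here is a minimal Python sketch: the shared secret never crosses the wire, only a per-connection proof of knowledge does. This is a conceptual sketch, not Flink's or SASL's actual implementation; all names are illustrative.

```python
import hashlib
import hmac
import secrets

# Shared secret, analogous to the security cookie; it is never sent on the wire.
SECRET = b"cluster-shared-secret"

def server_challenge():
    # Server sends a fresh random nonce for each connection attempt,
    # so a captured response cannot be replayed.
    return secrets.token_bytes(16)

def client_response(secret, nonce):
    # Client proves knowledge of the secret without revealing it.
    return hmac.new(secret, nonce, hashlib.sha256).digest()

def server_verify(secret, nonce, response):
    expected = hmac.new(secret, nonce, hashlib.sha256).digest()
    # Constant-time comparison to avoid timing attacks.
    return hmac.compare_digest(expected, response)

nonce = server_challenge()
resp = client_response(SECRET, nonce)
print(server_verify(SECRET, nonce, resp))           # True: correct secret accepted
print(server_verify(b"wrong-secret", nonce, resp))  # False: wrong secret rejected
```

By contrast, a plain cookie scheme sends the cookie itself, so anyone who can observe the connection learns the credential.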
re: Does Flink cluster security works in Flink 1.1.4 release?
Hi Stephan: Thanks for your reply. I think in standalone mode, the Kerberos authentication should not depend on Hadoop. What is your opinion? By the way, when will Flink 1.2.0 come out? Thanks in advance!

From: Stephan Ewen [mailto:se...@apache.org]
Sent: January 6, 2017 18:04
To: user@flink.apache.org
Cc: Zhangrucong; Robert Metzger
Subject: Re: Does Flink cluster security works in Flink 1.1.4 release?

I think you can also use Kerberos in the standalone mode in 1.1.x, but it is more tricky - you need to do a "kinit" on every host where you launch a Flink process. Flink 1.2 has better Kerberos support.

On Fri, Jan 6, 2017 at 4:19 AM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hi Stephan: Thanks for your reply. Do you mean the CLI, JM, TM and WebUI support Kerberos authentication only in YARN cluster mode in the 1.1.x release?

From: Stephan Ewen [mailto:se...@apache.org]
Sent: January 4, 2017 17:23
To: user@flink.apache.org
Subject: Re: Does Flink cluster security works in Flink 1.1.4 release?

Hi! Flink 1.1.x supports Kerberos for Hadoop (HDFS, YARN, HBase) via Hadoop's ticket system. It should work via kinit, in the same way as when submitting a secure MapReduce job. Kerberos for ZooKeeper, Kafka, etc. is only part of the 1.2 release. Greetings, Stephan

On Wed, Jan 4, 2017 at 7:25 AM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hi: I currently use the Flink 1.1.4 release in standalone cluster mode. I want to do Kerberos authentication between the Flink CLI and the JobManager, but in flink-conf.yaml there is no Flink cluster security configuration. Does Kerberos authentication work in the Flink 1.1.4 release? Thanks in advance!
re: Does Flink cluster security works in Flink 1.1.4 release?
Hi Stephan: Thanks for your reply. Do you mean the CLI, JM, TM and WebUI support Kerberos authentication only in YARN cluster mode in the 1.1.x release?

From: Stephan Ewen [mailto:se...@apache.org]
Sent: January 4, 2017 17:23
To: user@flink.apache.org
Subject: Re: Does Flink cluster security works in Flink 1.1.4 release?

Hi! Flink 1.1.x supports Kerberos for Hadoop (HDFS, YARN, HBase) via Hadoop's ticket system. It should work via kinit, in the same way as when submitting a secure MapReduce job. Kerberos for ZooKeeper, Kafka, etc. is only part of the 1.2 release. Greetings, Stephan

On Wed, Jan 4, 2017 at 7:25 AM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hi: I currently use the Flink 1.1.4 release in standalone cluster mode. I want to do Kerberos authentication between the Flink CLI and the JobManager, but in flink-conf.yaml there is no Flink cluster security configuration. Does Kerberos authentication work in the Flink 1.1.4 release? Thanks in advance!
Does Flink cluster security works in Flink 1.1.4 release?
Hi: I currently use the Flink 1.1.4 release in standalone cluster mode. I want to do Kerberos authentication between the Flink CLI and the JobManager, but in flink-conf.yaml there is no Flink cluster security configuration. Does Kerberos authentication work in the Flink 1.1.4 release? Thanks in advance!
re: About Sliding window
Hi Kostas: Thanks for your answer.

"So in your previous figure (yesterday) when e3 arrives, also e2 should be included in the result, right?"
--zhangrucong: In the Oct 11 email, e2 comes at 9:02, e3 comes at 9:07, and the aging time is 5 minutes. So when e3 comes, e2 is aged; e2 is not in the result!

In your mail, you said there is an ongoing discussion. Can you send me the link? I want to take part in it. Best wishes!

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 12, 2016 22:32
To: Zhangrucong
Cc: user@flink.apache.org; Aljoscha Krettek
Subject: Re: About Sliding window

Hello, So in your previous figure (yesterday) when e3 arrives, also e2 should be included in the result, right? In this case, I think that what you need is a session window with a gap equal to your event aging duration, and an evictor that evicts the elements that lag behind more than the gap duration. The latter, the evictor that I am describing, is not currently supported in Flink, but there is an ongoing discussion in the dev mailing list about it. So it is worth having a look there and participating in the discussion. I am also looping Aljoscha into the discussion, in case he has another solution that you can deploy right away. Thanks, Kostas

On Oct 12, 2016, at 3:36 PM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hi Kostas: It doesn't matter. Can you see the picture? My use case is:
1. The events come in the following order:
At 9:01 e1 comes
At 9:02 e2 comes
At 9:06 e3 comes
At 9:08 e4 comes
The time is system time.
2. The event aging time is 5 minutes.
3. At 9:01 e1 comes, nothing is aged, e1 is stored; we count e1 and send the result.
At 9:02 e2 comes, nothing is aged, e2 is stored; we count e1 and e2 and send the result.
At 9:06 e3 comes, e1 is aged, e3 is stored; we count e2 and e3 and send the result.
At 9:08 e4 comes, e2 is aged, e4 is stored; we count e3 and e4 and send the result.
I think I need a window of a certain duration. Thank you very much!

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 12, 2016 21:11
To: Zhangrucong
Cc: user@flink.apache.org
Subject: Re: About Sliding window

Hello again, Sorry for the delay, but I cannot really understand your use case. Could you explain a bit more what you mean by an "out-of-date" event and "aging" an event? Also, are your windows of a certain duration, or global? Thanks, Kostas

On Oct 11, 2016, at 3:04 PM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hi Kostas: Thank you for your rapid response! My use case is: for every incoming event, we want to age the out-of-date events, count the events in the window, and send the result. For example: the events come as follows, and we want the following result. By the way, in the Stream SQL API, FLIP-11 will realize row windows. It seems that the Slide event-time row-window suits my use case. Does the DataStream API support row windows? Thanks!

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 11, 2016 19:38
To: user@flink.apache.org
Subject: Re: About Sliding window

Hi Zhangrucong, Sliding windows only support a time-based slide, so your use case is not supported out of the box. But if you describe a bit more what you want to do, we may be able to find a way together to do your job using the currently offered functionality. Kostas

On Oct 11, 2016, at 1:20 PM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hello everyone: I want to use the DataStream sliding window API. Looking at the API I have a question: does the sliding time window support sliding by every incoming event? Thanks in advance!
re: About Sliding window
Hi Kostas: It doesn't matter. Can you see the picture? My use case is:
1. The events come in the following order (event timeline attached as an inline image):
At 9:01 e1 comes
At 9:02 e2 comes
At 9:06 e3 comes
At 9:08 e4 comes
The time is system time.
2. The event aging time is 5 minutes.
3. At 9:01 e1 comes, nothing is aged, e1 is stored; we count e1 and send the result.
At 9:02 e2 comes, nothing is aged, e2 is stored; we count e1 and e2 and send the result.
At 9:06 e3 comes, e1 is aged, e3 is stored; we count e2 and e3 and send the result.
At 9:08 e4 comes, e2 is aged, e4 is stored; we count e3 and e4 and send the result.
I think I need a window of a certain duration. Thank you very much!

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 12, 2016 21:11
To: Zhangrucong
Cc: user@flink.apache.org
Subject: Re: About Sliding window

Hello again, Sorry for the delay, but I cannot really understand your use case. Could you explain a bit more what you mean by an "out-of-date" event and "aging" an event? Also, are your windows of a certain duration, or global? Thanks, Kostas

On Oct 11, 2016, at 3:04 PM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hi Kostas: Thank you for your rapid response! My use case is: for every incoming event, we want to age the out-of-date events, count the events in the window, and send the result. For example: the events come as follows, and we want the following result. By the way, in the Stream SQL API, FLIP-11 will realize row windows. It seems that the Slide event-time row-window suits my use case. Does the DataStream API support row windows? Thanks!

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 11, 2016 19:38
To: user@flink.apache.org
Subject: Re: About Sliding window

Hi Zhangrucong, Sliding windows only support a time-based slide, so your use case is not supported out of the box. But if you describe a bit more what you want to do, we may be able to find a way together to do your job using the currently offered functionality. Kostas

On Oct 11, 2016, at 1:20 PM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hello everyone: I want to use the DataStream sliding window API. Looking at the API I have a question: does the sliding time window support sliding by every incoming event? Thanks in advance!
re: About Sliding window
Hi Kostas: Thank you for your rapid response! My use case is: for every incoming event, we want to age the out-of-date events, count the events in the window, and send the result. For example: the events come as follows (inline image attached), and we want the following result (inline image attached). By the way, in the Stream SQL API, FLIP-11 will realize row windows. It seems that the Slide event-time row-window suits my use case. Does the DataStream API support row windows? Thanks!

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 11, 2016 19:38
To: user@flink.apache.org
Subject: Re: About Sliding window

Hi Zhangrucong, Sliding windows only support a time-based slide, so your use case is not supported out of the box. But if you describe a bit more what you want to do, we may be able to find a way together to do your job using the currently offered functionality. Kostas

On Oct 11, 2016, at 1:20 PM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hello everyone: I want to use the DataStream sliding window API. Looking at the API I have a question: does the sliding time window support sliding by every incoming event? Thanks in advance!
About Sliding window
Hello everyone: I want to use the DataStream sliding window API. Looking at the API I have a question: does the sliding time window support sliding by every incoming event? Thanks in advance!
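The per-event aging behaviour described in this thread (slide on every incoming event, evict events older than 5 minutes) can be sketched in a few lines of plain Python. This is only a model of the requested semantics, not Flink code; the function name and data layout are illustrative.

```python
from collections import deque

AGE = 300  # aging duration in seconds (5 minutes, as in the example)

def per_event_window(events):
    """events: iterable of (timestamp_seconds, value) in arrival order.
    Emits the window contents once per incoming event."""
    window = deque()
    results = []
    for ts, value in events:
        # Age out events that are AGE or more older than the new event.
        while window and ts - window[0][0] >= AGE:
            window.popleft()
        window.append((ts, value))
        results.append([v for _, v in window])
    return results

# Timestamps from the thread's example: 9:01, 9:02, 9:06, 9:08.
events = [(9*3600 + 60, "e1"), (9*3600 + 120, "e2"),
          (9*3600 + 360, "e3"), (9*3600 + 480, "e4")]
print(per_event_window(events))
# → [['e1'], ['e1', 'e2'], ['e2', 'e3'], ['e3', 'e4']]
```

This reproduces the result claimed in the thread: when e3 arrives at 9:06, e1 (from 9:01, exactly 5 minutes earlier) is aged out, and when e4 arrives at 9:08, e2 is aged out.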
About flink stream table API
Hello everybody: I want to learn the Flink streaming Table API. Is the stream SQL the same as Calcite's? In the following link, the Table API examples use DataSet; where can I see a detailed introduction to the streaming Table API? https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/libs/table.html Can somebody help me? Thanks in advance!
About flink stream table API
Hello: I want to learn the Flink streaming Table API. Is the stream SQL the same as Calcite's? In the following link, the Table API examples use DataSet; where can I see a detailed introduction to the streaming Table API? https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/libs/table.html Thanks in advance!
re: Flink HA mode
In order to discover a new JM, I think we must use ZK. ZK can detect a new node or a change in a node's content. First, the JM must create a node in ZK and write its IP and port into the node. The TMs watch this node. When the TMs see the node's content change, they reconnect to the new JM. Thanks.

From: Emmanuel [mailto:ele...@msn.com]
Sent: September 9, 2015 7:59
To: user@flink.apache.org
Subject: Flink HA mode

Looking at Flink HA mode. Why do you need to have the list of masters in the config if ZooKeeper is used to keep track of them? In an environment like Google Cloud or Container Engine, the JM may come back up but will likely have another IP address. Is the masters config file only for bootstrapping, or is it effectively used to keep track of JM nodes? Is it what TM nodes use to know their JobManager? Or do they query ZooKeeper? In the scenario above (2 JMs, one fails and comes back up as another instance with a different IP), how does the JM join the cluster and how do the TMs learn about the new JM? Thanks, E
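The watch-based discovery pattern described above can be sketched as follows. This is a minimal in-memory stand-in for a ZooKeeper node (real deployments would use a ZK client such as Apache Curator); class and method names are illustrative, and it models ZooKeeper's one-shot watch semantics.

```python
class FakeZkNode:
    """In-memory stand-in for a ZooKeeper node with one-shot watches."""
    def __init__(self):
        self.data = None
        self.watchers = []

    def write(self, data):
        self.data = data
        # Watches in ZooKeeper are one-shot: fire them once, then clear.
        watchers, self.watchers = self.watchers, []
        for callback in watchers:
            callback(data)

    def watch(self, callback):
        self.watchers.append(callback)

class TaskManager:
    """TM that learns the leading JM's address from the ZK node."""
    def __init__(self, node):
        self.node = node
        self.leader = node.data
        node.watch(self.on_change)

    def on_change(self, data):
        self.leader = data               # reconnect to the new JobManager
        self.node.watch(self.on_change)  # re-register the one-shot watch

node = FakeZkNode()
node.write("10.0.0.1:6123")   # first JobManager registers its ip:port
tm = TaskManager(node)
node.write("10.0.0.2:6123")   # JM fails over; the new JM writes its address
print(tm.leader)
# → 10.0.0.2:6123
```

The point of the sketch is that the TM never needs a static masters list to follow a failover: it only needs the well-known ZK path.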
Question about exactly-once
Dear Sir: I am a beginner with Flink and very interested in the "exactly-once" recovery mechanism. I have a question about the processing sequence of tuples. For example, in Fig 1, processing unit A runs a JOIN, and the size of the sliding window is 4. At the beginning, the state of the sliding windows is shown in Fig 2. Before A failed, tuples came in the order 1, 2, 3, 4, and the join results were (1,1),(2,2),(2,2),(3,3),(4,4). But after A failed and its state was reset to Snap(x), the tuples came in the order 3, 4, 1, 2. This time the join results are (3,3),(3,3),(4,4),(2,2),(2,2). (Fig 1 and Fig 2 were attached as inline images.) I wonder how Flink's mechanism guarantees the consistency of the results, or the consistency of the tuples' sequence? Thank you very much.
About exactly once question?
Hi: The documentation says Flink can guarantee processing each tuple exactly once, but I cannot understand how it works. For example, in Fig 1, C is running between snapshot n-1 and snapshot n (snapshot n hasn't been generated yet). After snapshot n-1, C has processed tuples x1, x2, x3 and already output them to the user; then C fails and recovers from snapshot n-1. In my opinion, x1, x2, x3 will be processed and output to the user again. My question is: how does Flink guarantee that x1, x2, x3 are processed and output to the user only once? (Fig 1 was attached as an inline image.) Thanks for answering.
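The duplication concern raised above can be made concrete with a small sketch: replaying from snapshot n-1 re-delivers x1..x3 to the sink, and a sink that tracks the highest sequence number it has committed can discard the replayed copies. This is only an illustration of the idea; Flink's actual mechanism uses checkpoint barriers, and end-to-end exactly-once output additionally needs an idempotent or transactional sink like the one modeled here. All names are hypothetical.

```python
class DedupSink:
    """Sink that drops records whose sequence number was already emitted,
    making replayed input idempotent (exactly-once output)."""
    def __init__(self):
        self.last_seq = -1   # highest sequence number committed so far
        self.output = []

    def emit(self, seq, record):
        if seq <= self.last_seq:
            return           # duplicate produced by replay; drop it
        self.last_seq = seq
        self.output.append(record)

sink = DedupSink()
# Normal processing after snapshot n-1: x1, x2, x3 reach the sink.
for seq, rec in [(1, "x1"), (2, "x2"), (3, "x3")]:
    sink.emit(seq, rec)
# Failure; recovery from snapshot n-1 replays x1..x3, then continues with x4.
for seq, rec in [(1, "x1"), (2, "x2"), (3, "x3"), (4, "x4")]:
    sink.emit(seq, rec)
print(sink.output)
# → ['x1', 'x2', 'x3', 'x4']
```

Without the sequence-number check, the output would contain x1..x3 twice, which is exactly the duplication the question describes.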
How to understand slot?
While reading the scheduling code in the JobManager, I have the following questions:
1. How is it decided which job vertices are deployed in a shared slot? What is the benefit of deploying vertices in a shared slot?
2. How is the number of slots of a TaskManager decided?
3. If there are many TaskManagers, when allocating a new slot, how is it decided which slot in which TaskManager to use?
4. Are there detailed documents about scheduling?
Thank you for any suggestions in advance!
re: How to understand slot?
Hi Stephan, Thanks a lot for answering.

"3) For sources, Flink picks a random TaskManager (splits are then assigned locality-aware to the sources). For all tasks after sources, Flink tries to co-locate them with their input(s), unless they have so many inputs that co-location makes no difference (each parallel reducer task has all mapper tasks as inputs)."

If Flink picks a random TaskManager for sources, then in a distributed setting some TaskManagers may run many tasks while others run few. Isn't that unbalanced? Thanks!

From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] on behalf of Stephan Ewen
Sent: August 18, 2015 16:23
To: user@flink.apache.org
Subject: Re: How to understand slot?

Hi! There is a little bit of documentation about the scheduling and the slots:
- In the config reference: https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-taskmanager-processing-slots
- In the internals docs: https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html
For your other questions, here are some brief answers:
1) Shared slots are very useful for pipelined execution. Shared slots mean that (in the example of MapReduce) one slot can hold one mapper and one reducer. Mappers and reducers run at the same time when using pipelined execution.
2) A good choice for the number of slots is the number of CPU cores in the processor.
3) For sources, Flink picks a random TaskManager (splits are then assigned locality-aware to the sources). For all tasks after sources, Flink tries to co-locate them with their input(s), unless they have so many inputs that co-location makes no difference (each parallel reducer task has all mapper tasks as inputs).
Greetings, Stephan

On Tue, Aug 18, 2015 at 9:29 AM, Zhangrucong <zhangruc...@huawei.com> wrote:

While reading the scheduling code in the JobManager, I have the following questions:
1. How is it decided which job vertices are deployed in a shared slot? What is the benefit of deploying vertices in a shared slot?
2. How is the number of slots of a TaskManager decided?
3. If there are many TaskManagers, when allocating a new slot, how is it decided which slot in which TaskManager to use?
4. Are there detailed documents about scheduling?
Thank you for any suggestions in advance!
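The slot-sharing point in answer 1) above has a simple arithmetic consequence that can be sketched as follows: when one slot can hold one parallel subtask of every operator in the pipeline, the job needs only as many slots as its highest operator parallelism, rather than one slot per subtask. This is an illustrative model, not Flink's scheduler code; the function and operator names are made up for the example.

```python
def required_slots(parallelism_by_operator, slot_sharing=True):
    """parallelism_by_operator: dict mapping operator name -> parallelism."""
    if slot_sharing:
        # One slot hosts one subtask of every operator in the sharing group,
        # so the maximum operator parallelism bounds the slot count.
        return max(parallelism_by_operator.values())
    # Without sharing, every subtask occupies its own slot.
    return sum(parallelism_by_operator.values())

job = {"source": 2, "map": 4, "reduce": 4}
print(required_slots(job))                      # with slot sharing
# → 4
print(required_slots(job, slot_sharing=False))  # without slot sharing
# → 10
```

This also shows one benefit of shared slots: the pipeline (source, map and reduce subtasks running concurrently) fits in 4 slots instead of 10.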