About the blob client and blob server authentication

2017-02-22 Thread Zhangrucong
Hi:
I have found Flink pull request 2425: https://github.com/apache/flink/pull/2425

This pull request adds authentication between the blob client and blob server
using a security cookie.

In my opinion, using SASL DIGEST-MD5 would be more authoritative. What do you
think?

BTW, when will this pull request be merged to master? Thanks in advance!
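For reference, the DIGEST-MD5 mechanism suggested here is available out of the box in the JDK via javax.security.sasl; the sketch below runs a client/server challenge-response handshake in-process. The user name, protocol name ("flink-blob"), and shared secret are illustrative, not anything from the Flink PR:

```java
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.RealmCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslServer;
import java.util.HashMap;
import java.util.Map;

public class DigestMd5Demo {

    // Shared secret known to both sides; user name and secret are illustrative.
    static final String USER = "blob-client";
    static final char[] SECRET = "change-me".toCharArray();

    static CallbackHandler clientHandler() {
        return callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) ((NameCallback) cb).setName(USER);
                else if (cb instanceof PasswordCallback) ((PasswordCallback) cb).setPassword(SECRET);
                else if (cb instanceof RealmCallback) {
                    RealmCallback rc = (RealmCallback) cb;
                    rc.setText(rc.getDefaultText()); // accept the realm the server offers
                }
            }
        };
    }

    static CallbackHandler serverHandler() {
        return callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) {
                    NameCallback nc = (NameCallback) cb;
                    nc.setName(nc.getDefaultName()); // user name sent by the client
                } else if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(SECRET); // server-side secret lookup
                } else if (cb instanceof RealmCallback) {
                    RealmCallback rc = (RealmCallback) cb;
                    rc.setText(rc.getDefaultText());
                } else if (cb instanceof AuthorizeCallback) {
                    AuthorizeCallback ac = (AuthorizeCallback) cb;
                    ac.setAuthorized(ac.getAuthenticationID().equals(ac.getAuthorizationID()));
                }
            }
        };
    }

    /** Runs the DIGEST-MD5 challenge/response loop; true when both sides complete. */
    public static boolean handshake() {
        try {
            Map<String, String> props = new HashMap<>();
            props.put(Sasl.QOP, "auth"); // authentication only, no message protection
            SaslServer server = Sasl.createSaslServer(
                    "DIGEST-MD5", "flink-blob", "localhost", props, serverHandler());
            SaslClient client = Sasl.createSaslClient(
                    new String[]{"DIGEST-MD5"}, null, "flink-blob", "localhost",
                    props, clientHandler());

            byte[] challenge = server.evaluateResponse(new byte[0]); // server sends first
            while (!client.isComplete()) {
                byte[] response = client.evaluateChallenge(challenge);
                if (response != null && !server.isComplete()) {
                    challenge = server.evaluateResponse(response);
                }
            }
            return client.isComplete() && server.isComplete();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("authenticated: " + handshake());
    }
}
```

In a real deployment the challenge and response bytes would travel over the blob client/server socket rather than an in-process loop.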


re: Does Flink cluster security work in the Flink 1.1.4 release?

2017-01-08 Thread Zhangrucong
Hi Stephan:
Thanks for your reply. I think that in standalone mode, Kerberos
authentication should not depend on Hadoop. What is your opinion?
By the way, when will Flink 1.2.0 come out?

Thanks in advance!

From: Stephan Ewen [mailto:se...@apache.org]
Sent: January 6, 2017 18:04
To: user@flink.apache.org
Cc: Zhangrucong; Robert Metzger
Subject: Re: Does Flink cluster security work in the Flink 1.1.4 release?

I think you can also use Kerberos in standalone mode in 1.1.x, but it is
more tricky - you need to do a "kinit" on every host where you launch a Flink
process.

Flink 1.2 has better Kerberos support.
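For reference, in Flink 1.2 the Kerberos credentials can be configured centrally in flink-conf.yaml instead of relying on a per-host kinit; the keytab path and principal below are placeholders:

```yaml
# flink-conf.yaml (Flink 1.2+): Kerberos via a keytab, used by all Flink processes
security.kerberos.login.keytab: /path/to/flink.keytab      # placeholder path
security.kerberos.login.principal: flink-user@EXAMPLE.COM  # placeholder principal
# Login contexts the credentials are used for (e.g. ZooKeeper "Client", Kafka)
security.kerberos.login.contexts: Client,KafkaClient
```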

On Fri, Jan 6, 2017 at 4:19 AM, Zhangrucong <zhangruc...@huawei.com> wrote:
Hi Stephan:
Thanks for your reply.
You mean the CLI, JM, TM, and WebUI support Kerberos authentication only in
YARN cluster mode in the 1.1.x release?

From: Stephan Ewen [mailto:se...@apache.org]
Sent: January 4, 2017 17:23
To: user@flink.apache.org
Subject: Re: Does Flink cluster security work in the Flink 1.1.4 release?

Hi!

Flink 1.1.x supports Kerberos for Hadoop (HDFS, YARN, HBase) via Hadoop's
ticket system. It should work via kinit, in the same way as when submitting a
secure MapReduce job.

Kerberos for ZooKeeper, Kafka, etc., is only part of the 1.2 release.

Greetings,
Stephan


On Wed, Jan 4, 2017 at 7:25 AM, Zhangrucong <zhangruc...@huawei.com> wrote:
Hi:
Now I am using the Flink 1.1.4 release in standalone cluster mode. I want to do
Kerberos authentication between the Flink CLI and the JobManager. But in
flink-conf.yaml, there is no Flink cluster security configuration.
Does Kerberos authentication work in the Flink 1.1.4 release?
Thanks in advance!





re: Does Flink cluster security work in the Flink 1.1.4 release?

2017-01-05 Thread Zhangrucong
Hi Stephan:
Thanks for your reply.
You mean the CLI, JM, TM, and WebUI support Kerberos authentication only in
YARN cluster mode in the 1.1.x release?

From: Stephan Ewen [mailto:se...@apache.org]
Sent: January 4, 2017 17:23
To: user@flink.apache.org
Subject: Re: Does Flink cluster security work in the Flink 1.1.4 release?

Hi!

Flink 1.1.x supports Kerberos for Hadoop (HDFS, YARN, HBase) via Hadoop's
ticket system. It should work via kinit, in the same way as when submitting a
secure MapReduce job.

Kerberos for ZooKeeper, Kafka, etc., is only part of the 1.2 release.

Greetings,
Stephan


On Wed, Jan 4, 2017 at 7:25 AM, Zhangrucong <zhangruc...@huawei.com> wrote:
Hi:
Now I am using the Flink 1.1.4 release in standalone cluster mode. I want to do
Kerberos authentication between the Flink CLI and the JobManager. But in
flink-conf.yaml, there is no Flink cluster security configuration.
Does Kerberos authentication work in the Flink 1.1.4 release?
Thanks in advance!




Does Flink cluster security work in the Flink 1.1.4 release?

2017-01-03 Thread Zhangrucong
Hi:
Now I am using the Flink 1.1.4 release in standalone cluster mode. I want to do
Kerberos authentication between the Flink CLI and the JobManager. But in
flink-conf.yaml, there is no Flink cluster security configuration.
Does Kerberos authentication work in the Flink 1.1.4 release?
Thanks in advance!



re: About Sliding window

2016-10-12 Thread Zhangrucong
Hi Kostas:
Thanks for your answer.

So in your previous figure (yesterday) when e3 arrives, also e2 should be
included in the result, right?
--zhangrucong: In the Oct 11 email, e2 arrives at 9:02, e3 arrives at 9:07,
and the aging time is 5 minutes. So when e3 arrives, e2 has already been aged
out; e2 is not in the result!

In your mail, you mention an ongoing discussion. Can you send me the link? I
want to take part in it.

Best wishes!

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 12, 2016 22:32
To: Zhangrucong
Cc: user@flink.apache.org; Aljoscha Krettek
Subject: Re: About Sliding window

Hello,

So in your previous figure (yesterday) when e3 arrives, also e2 should be 
included in the result, right?

In this case, I think that what you need is a session window with a gap equal
to your event aging duration, and an evictor that evicts the elements that lag
behind by more than the gap duration.

The latter - the evictor that I am describing - is not currently supported in
Flink, but there is an ongoing discussion about it on the dev mailing list. So
it is worth having a look there and participating in the discussion.

I also loop in Aljoscha in the discussion, in case he has another solution that 
you can deploy right-away.

Thanks,
Kostas
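The per-event "aging" semantics discussed in this thread can be sketched without Flink at all; the plain-Java class below (names illustrative, not a Flink API) reproduces the worked example: e1@9:01, e2@9:02, e3@9:06, e4@9:08 with a 5-minute aging time yields counts 1, 2, 2, 2.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;

/** Per-event "aging" window: on each arrival, drop events whose age has reached
 *  the aging time, keep the new event, and emit the current count. */
class AgingWindowCounter {
    private final long maxAgeMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>(); // arrivals, oldest first

    AgingWindowCounter(Duration maxAge) { this.maxAgeMillis = maxAge.toMillis(); }

    /** Returns the number of live events after the new arrival is processed. */
    int onEvent(long arrivalMillis) {
        // age out everything that arrived maxAge or more before the new event
        while (!timestamps.isEmpty()
                && arrivalMillis - timestamps.peekFirst() >= maxAgeMillis) {
            timestamps.pollFirst();
        }
        timestamps.addLast(arrivalMillis);
        return timestamps.size();
    }
}

class AgingDemo {
    public static void main(String[] args) {
        AgingWindowCounter w = new AgingWindowCounter(Duration.ofMinutes(5));
        long m = 60_000L; // one minute in milliseconds, offsets from 9:00
        System.out.println(w.onEvent(1 * m)); // e1 -> count {e1} = 1
        System.out.println(w.onEvent(2 * m)); // e2 -> count {e1, e2} = 2
        System.out.println(w.onEvent(6 * m)); // e3 -> e1 aged out, {e2, e3} = 2
        System.out.println(w.onEvent(8 * m)); // e4 -> e2 aged out, {e3, e4} = 2
    }
}
```

Inside Flink, the same eviction test (arrival gap >= aging time) is what the proposed evictor on a session window would perform.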

On Oct 12, 2016, at 3:36 PM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hi Kostas:
It doesn't matter. Can you see the picture? My use case is:

1. The events arrive in the following order:

At 9:01 e1 arrives
At 9:02 e2 arrives
At 9:06 e3 arrives
At 9:08 e4 arrives

The time is system time.

2. The event aging time is 5 minutes.

3.
At 9:01 e1 arrives: nothing is aged, store e1; we count e1 and send the result.
At 9:02 e2 arrives: nothing is aged, store e2; we count e1 and e2 and send the result.
At 9:06 e3 arrives: e1 is aged out, store e3; we count e2 and e3 and send the result.
At 9:08 e4 arrives: e2 is aged out, store e4; we count e3 and e4 and send the result.

I think I need a window of a certain duration.

Thank you very much!
From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 12, 2016 21:11
To: Zhangrucong
Cc: user@flink.apache.org
Subject: Re: About Sliding window

Hello again,

Sorry for the delay, but I cannot really understand your use case.
Could you explain a bit more what you mean by an "out-of-date" event and
"aging" an event?

Also, are your windows of a certain duration, or global?

Thanks,
Kostas

On Oct 11, 2016, at 3:04 PM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hi Kostas:
Thank you for your rapid response!

My use case is:
for every incoming event, we want to age out the out-of-date events, count the
events in the window, and send the result.

For example:
The events arrive as follows:
[inline image not preserved]

We want the following result:
[inline image not preserved]

By the way, in the Stream SQL API, FLIP-11 will implement row windows. It seems
that the Slide Event-time row-window function suits my use case. Does the
DataStream API support row windows?

Thanks!

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 11, 2016 19:38
To: user@flink.apache.org
Subject: Re: About Sliding window

Hi Zhangrucong,

Sliding windows only support time-based slide.
So your use-case is not supported out-of-the-box.

But, if you describe a bit more what you want to do,
we may be able to find a way together to do your job using
the currently offered functionality.

Kostas

On Oct 11, 2016, at 1:20 PM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hello everyone:
Now I want to use the DataStream sliding window API. I have looked at the API
and I have a question: does the sliding time window support sliding on every
incoming event?

Thanks in advance!







re: About Sliding window

2016-10-12 Thread Zhangrucong
Hi Kostas:
It doesn't matter. Can you see the picture? My use case is:

1. The events arrive in the following order:
[inline image not preserved]
At 9:01 e1 arrives
At 9:02 e2 arrives
At 9:06 e3 arrives
At 9:08 e4 arrives

The time is system time.

2. The event aging time is 5 minutes.

3.
At 9:01 e1 arrives: nothing is aged, store e1; we count e1 and send the result.
At 9:02 e2 arrives: nothing is aged, store e2; we count e1 and e2 and send the result.
At 9:06 e3 arrives: e1 is aged out, store e3; we count e2 and e3 and send the result.
At 9:08 e4 arrives: e2 is aged out, store e4; we count e3 and e4 and send the result.

I think I need a window of a certain duration.

Thank you very much!
From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 12, 2016 21:11
To: Zhangrucong
Cc: user@flink.apache.org
Subject: Re: About Sliding window

Hello again,

Sorry for the delay, but I cannot really understand your use case.
Could you explain a bit more what you mean by an "out-of-date" event and
"aging" an event?

Also, are your windows of a certain duration, or global?

Thanks,
Kostas

On Oct 11, 2016, at 3:04 PM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hi Kostas:
Thank you for your rapid response!

My use case is:
for every incoming event, we want to age out the out-of-date events, count the
events in the window, and send the result.

For example:
The events arrive as follows:
[inline image not preserved]

We want the following result:
[inline image not preserved]

By the way, in the Stream SQL API, FLIP-11 will implement row windows. It seems
that the Slide Event-time row-window function suits my use case. Does the
DataStream API support row windows?

Thanks!

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 11, 2016 19:38
To: user@flink.apache.org
Subject: Re: About Sliding window

Hi Zhangrucong,

Sliding windows only support time-based slide.
So your use-case is not supported out-of-the-box.

But, if you describe a bit more what you want to do,
we may be able to find a way together to do your job using
the currently offered functionality.

Kostas

On Oct 11, 2016, at 1:20 PM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hello everyone:
Now I want to use the DataStream sliding window API. I have looked at the API
and I have a question: does the sliding time window support sliding on every
incoming event?

Thanks in advance!







re: About Sliding window

2016-10-11 Thread Zhangrucong
Hi Kostas:
Thank you for your rapid response!

My use case is:
for every incoming event, we want to age out the out-of-date events, count the
events in the window, and send the result.

For example:
The events arrive as follows:
[inline image not preserved]

We want the following result:
[inline image not preserved]

By the way, in the Stream SQL API, FLIP-11 will implement row windows. It seems
that the Slide Event-time row-window function suits my use case. Does the
DataStream API support row windows?

Thanks!

From: Kostas Kloudas [mailto:k.klou...@data-artisans.com]
Sent: October 11, 2016 19:38
To: user@flink.apache.org
Subject: Re: About Sliding window

Hi Zhangrucong,

Sliding windows only support time-based slide.
So your use-case is not supported out-of-the-box.

But, if you describe a bit more what you want to do,
we may be able to find a way together to do your job using
the currently offered functionality.

Kostas

On Oct 11, 2016, at 1:20 PM, Zhangrucong <zhangruc...@huawei.com> wrote:

Hello everyone:
Now I want to use the DataStream sliding window API. I have looked at the API
and I have a question: does the sliding time window support sliding on every
incoming event?

Thanks in advance!





About Sliding window

2016-10-11 Thread Zhangrucong
Hello everyone:
Now I want to use the DataStream sliding window API. I have looked at the API
and I have a question: does the sliding time window support sliding on every
incoming event?

Thanks in advance!



About flink stream table API

2016-04-27 Thread Zhangrucong
Hello everybody:
I want to learn the Flink streaming Table API. Is the stream SQL the same as
Calcite's SQL?
In the following link, the Table API examples use DataSet; where can I find a
detailed introduction to the streaming Table API?
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/libs/table.html

Can somebody help me?
Thanks in advance!




About flink stream table API

2016-04-26 Thread Zhangrucong
Hello:
I want to learn the Flink streaming Table API. Is the stream SQL the same as
Calcite's SQL?
In the following link, the Table API examples use DataSet; where can I find a
detailed introduction to the streaming Table API?
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/libs/table.html

Thanks in advance!




re: Flink HA mode

2015-09-08 Thread Zhangrucong
In order to discover the new JM, I think you must use ZK. ZK can detect a new
node, or a change in a node's content.
First, the JM must create a node in ZK and write its IP and port into it. The
TMs watch this node. When the TMs see the node's content change, they reconnect
to the new JM.

Thanks.

From: Emmanuel [mailto:ele...@msn.com]
Sent: September 9, 2015 7:59
To: user@flink.apache.org
Subject: Flink HA mode

Looking at Flink HA mode.

Why do you need to have the list of masters in the config if zookeeper is used 
to keep track of them?
In an environment like Google Cloud or Container Engine, the JM may come back 
up but will likely have another IP address.

Is the masters config file only for bootstrapping, or is it effectively used to
keep track of JM nodes?
Is it what TM nodes use to know their JobManager? Or do they query Zookeeper?

In the scenario above (2 JMs, one fails, and comes back up as another instance, 
with different IP) how does the JM join the cluster and how do TMs know about 
the new JM?

Thanks
E
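The watch-based discovery described in the reply above can be sketched with a minimal in-memory stand-in for the ZooKeeper leader znode. LeaderNode and the addresses below are illustrative, not Flink's actual classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** In-memory stand-in for a ZooKeeper leader znode: the elected JM writes its
 *  address into the node; TaskManagers watch it and are notified on changes. */
class LeaderNode {
    private String leaderAddress;                       // current JM "IP:port"
    private final List<Consumer<String>> watchers = new ArrayList<>();

    /** A TM registers a watch; a known leader is delivered immediately. */
    void watch(Consumer<String> watcher) {
        watchers.add(watcher);
        if (leaderAddress != null) watcher.accept(leaderAddress);
    }

    /** Called by a newly elected JM, e.g. after restarting with a different IP. */
    void publishLeader(String address) {
        leaderAddress = address;
        for (Consumer<String> w : watchers) w.accept(address);
    }
}

class HaDemo {
    public static void main(String[] args) {
        LeaderNode node = new LeaderNode();
        node.watch(addr -> System.out.println("TM reconnects to JM at " + addr));
        node.publishLeader("10.0.0.5:6123"); // first JM
        node.publishLeader("10.0.0.9:6123"); // replacement JM with a new IP
    }
}
```

A TM that registers `node.watch(addr -> reconnect(addr))` keeps following the JM across restarts, which suggests the masters file is only needed to bootstrap the processes, not to track the current leader.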



Question about exactly-once

2015-09-07 Thread Zhangrucong
Dear Sir:
I am a beginner with Flink and very interested in the "exactly-once" recovery
mechanism. I have a question about the processing order of tuples. For
example, in Fig 1, processing unit A runs a JOIN, and the size of the sliding
window is 4. At the beginning, the state of the sliding windows is as shown in
Fig 2. Before A failed, the tuples came in the order 1, 2, 3, 4, and the join
results were (1,1),(2,2),(2,2),(3,3),(4,4). But after A failed and the state
was reset to Snap(x), the tuples came in the order 3, 4, 1, 2. This time the
join results were (3,3),(3,3),(4,4),(2,2),(2,2).
[Fig 1: inline image not preserved]
Fig 1
[Fig 2: inline image not preserved]
Fig 2
I wonder how Flink's mechanism guarantees the consistency of the results, or
the consistency of the tuples' order?


Thank you very much.


A question about exactly-once

2015-08-27 Thread Zhangrucong
Hi:
The documentation says Flink can guarantee processing each tuple exactly once,
but I cannot understand how it works.
For example, in Fig 1, C is running between snapshot n-1 and snapshot n
(snapshot n hasn't been generated yet). After snapshot n-1, C has processed
tuples x1, x2, x3 and already output them to the user; then C failed, and it
recovers from snapshot n-1. In my opinion, x1, x2, x3 will be processed and
output to the user again. My question is: how does Flink guarantee that x1,
x2, x3 are processed and output to the user only once?


[Fig 1: inline image not preserved]
Fig 1.
Thanks for answering.
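One common answer to this question, sketched here in plain Java: after recovery the operator does replay x1..x3, and exactly-once *effects* are obtained by making the sink idempotent (or transactional) so that replayed duplicates are discarded. This is an illustration of the idea under the assumption of a sequence-numbered sink, not Flink's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of recovery by replay: the operator restarts from the last snapshot
 *  and re-emits already-processed tuples; an idempotent sink keyed by sequence
 *  number drops the replayed duplicates, so each tuple takes effect only once. */
class DedupSink {
    private long lastCommittedSeq = 0;          // highest sequence number committed
    final List<String> committed = new ArrayList<>();

    void receive(long seq, String tuple) {
        if (seq <= lastCommittedSeq) return;    // replayed duplicate: discard
        lastCommittedSeq = seq;
        committed.add(tuple);
    }
}

class ReplayDemo {
    public static void main(String[] args) {
        DedupSink sink = new DedupSink();
        // before the failure: x1..x3 are processed and reach the sink
        sink.receive(1, "x1"); sink.receive(2, "x2"); sink.receive(3, "x3");
        // C fails, restores snapshot n-1, so x1..x3 are replayed, then x4 follows
        sink.receive(1, "x1"); sink.receive(2, "x2"); sink.receive(3, "x3");
        sink.receive(4, "x4");
        System.out.println(sink.committed);     // each tuple appears exactly once
    }
}
```

Flink's checkpoint barriers give exactly-once *state* inside the job; for the output visible to the user, a dedup/transactional sink like the one sketched above is what turns replay into effectively-once delivery.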


How to understand slots?

2015-08-18 Thread Zhangrucong
While reading the scheduling code in the JobManager, I had the following
questions:

1. How is it decided which job vertices are deployed in a shared slot? What is
the benefit of deploying vertices in a shared slot?

2. How is it decided how many slots a TaskManager has?

3. If there are many TaskManagers, when allocating a new slot, how is it
decided which slot in which TaskManager to use?

4. Are there detailed documents about scheduling?


Thank you for any suggestions in advance!






re: How to understand slots?

2015-08-18 Thread Zhangrucong
Hi Stephan, thanks a lot for answering.

3) For sources, Flink picks a random TaskManager (splits are then assigned
locality aware to the sources). For all tasks after sources, Flink tries to
co-locate them with their input(s), unless they have so many inputs that
co-location makes no difference (each parallel reducer task has all mapper
tasks as inputs).

If Flink picks a random TaskManager for the sources, then in a distributed
setting some TaskManagers may run many tasks while others run few. Isn't that
unbalanced?

Thanks!


From: ewenstep...@gmail.com [mailto:ewenstep...@gmail.com] on behalf of Stephan Ewen
Sent: August 18, 2015 16:23
To: user@flink.apache.org
Subject: Re: How to understand slots?

Hi!

There is a little bit of documentation about the scheduling and the slots

  - In the config reference: 
https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-taskmanager-processing-slots

  - In the internals docs: 
https://ci.apache.org/projects/flink/flink-docs-master/internals/job_scheduling.html


For your other questions, here are some brief answers:

1) Shared slots are very useful for pipelined execution. Shared slots mean that
(in the example of MapReduce) one slot can hold one mapper and one reducer.
Mappers and reducers run at the same time when using pipelined execution.

2) A good choice for the number of slots is the number of CPU cores in the 
processor.

3) For sources, Flink picks a random TaskManager (splits are then assigned 
locality aware to the sources). For all tasks after sources, Flink tries to 
co-locate them with their input(s), unless they have so many inputs that 
co-location makes no difference (each parallel reducer task has all mapper 
tasks as inputs).


Greetings,
Stephan



On Tue, Aug 18, 2015 at 9:29 AM, Zhangrucong <zhangruc...@huawei.com> wrote:
While reading the scheduling code in the JobManager, I had the following
questions:

1. How is it decided which job vertices are deployed in a shared slot? What is
the benefit of deploying vertices in a shared slot?

2. How is it decided how many slots a TaskManager has?

3. If there are many TaskManagers, when allocating a new slot, how is it
decided which slot in which TaskManager to use?

4. Are there detailed documents about scheduling?



Thank you for any suggestions in advance!
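Stephan's answer (2) above, one slot per CPU core, maps to a single setting; a hedged flink-conf.yaml sketch (the core count of 8 is illustrative):

```yaml
# flink-conf.yaml: one processing slot per CPU core on each TaskManager host
taskmanager.numberOfTaskSlots: 8   # e.g. on an 8-core machine
parallelism.default: 8             # default job parallelism, often matched to slots
```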