[jira] [Created] (FLINK-6420) Cleaner CEP API to specify conditions between events

2017-04-28 Thread Elias Levy (JIRA)
Elias Levy created FLINK-6420:
-

 Summary: Cleaner CEP API to specify conditions between events
 Key: FLINK-6420
 URL: https://issues.apache.org/jira/browse/FLINK-6420
 Project: Flink
  Issue Type: Improvement
  Components: CEP
Affects Versions: 1.3.0
Reporter: Elias Levy
Priority: Minor


Flink 1.3 will introduce so-called iterative conditions, which allow the 
predicate to look up events already matched by conditions in the pattern.  This 
permits specifying conditions between matched events, similar to a conditional 
join between tables in SQL.  Alas, the API could be simplified to specify such 
conditions more declaratively.

At the moment you have to do something like
{code}
Pattern.begin[Foo]("first")
  .where(first => first.baz == 1)
  .followedBy("next")
  .where { (next, ctx) =>
    val first = ctx.getEventsForPattern("first").next
    first.bar == next.bar && next.boo == "x"
  }
{code}
which is not very clean.  It would be friendlier if you could do something like:
{code}
Pattern.begin[Foo]("first")
  .where(first => first.baz == 1)
  .followedBy("next")
  .relatedTo("first", (first, next) => first.bar == next.bar)
  .where(next => next.boo == "x")
{code}
Something along these lines would work well when the condition being tested 
against matches a single event (single quantifier).  

If the condition being tested can accept multiple events (e.g. a times 
quantifier), two other methods could be used, {{relatedToAny}} and 
{{relatedToAll}}, each of which takes a predicate function.  In both cases each 
previously accepted element of the requested condition is evaluated against the 
predicate.  In the former case the condition is satisfied if any evaluation 
returns true.  In the latter case all evaluations must return true for the 
condition to be satisfied.
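A plain-Java sketch of the intended semantics (the method names and shapes here are hypothetical, mirroring this proposal rather than any existing Flink API): {{relatedToAny}} is an existential check over the previously accepted events, {{relatedToAll}} a universal one.

```java
import java.util.List;
import java.util.function.BiPredicate;

public class RelatedToSemantics {
    /** Satisfied if any previously matched event passes the predicate. */
    static <T> boolean relatedToAny(List<T> matched, T next, BiPredicate<T, T> p) {
        return matched.stream().anyMatch(prev -> p.test(prev, next));
    }

    /** Satisfied only if every previously matched event passes the predicate. */
    static <T> boolean relatedToAll(List<T> matched, T next, BiPredicate<T, T> p) {
        return matched.stream().allMatch(prev -> p.test(prev, next));
    }

    public static void main(String[] args) {
        // Stand-ins for the "bar" values of events already matched by "first".
        List<Integer> matchedBars = List.of(1, 2);
        int nextBar = 1;
        System.out.println(relatedToAny(matchedBars, nextBar, Integer::equals)); // true
        System.out.println(relatedToAll(matchedBars, nextBar, Integer::equals)); // false
    }
}
```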




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-6419) Better support for CEP quantified conditions in PatternSelect.select

2017-04-28 Thread Elias Levy (JIRA)
Elias Levy created FLINK-6419:
-

 Summary: Better support for CEP quantified conditions in 
PatternSelect.select
 Key: FLINK-6419
 URL: https://issues.apache.org/jira/browse/FLINK-6419
 Project: Flink
  Issue Type: Improvement
  Components: CEP
Affects Versions: 1.3.0
Reporter: Elias Levy
Priority: Minor


Flink 1.3 introduces quantifier methods to the API, which allow one to 
declaratively specify how many times a condition must be matched before there 
is a state change.

The pre-existing {{PatternSelect.select}} method does not account for this 
change very well.  The selection function passed to {{select}} receives a 
{{Map[String,T]}} as an argument that permits the function to look up the 
matched events by the condition's name.  

To support the new functionality that permits a condition to match multiple 
elements, when a quantifier is greater than one, the matched events are stored 
in the map by appending the condition's name with an underscore and an index 
value.

While functional, this is less than ideal.  It would be better if conditions 
with a quantifier greater than one returned the matched events in an array, 
accessible via the condition's name, without having to construct keys from the 
condition's name and an index and iterate, querying the map until no more 
events are found.
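The workaround the current API forces can be sketched as follows; the helper is hypothetical, and the exact key scheme (1-based {{_index}} suffixes) is an assumption based on the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QuantifiedMatchLookup {
    /**
     * Collect all events stored under "name_1", "name_2", ... until a key is
     * missing. The underscore-index key scheme and its starting index are
     * assumptions, not a confirmed Flink contract.
     */
    static <T> List<T> eventsFor(Map<String, T> match, String name) {
        List<T> events = new ArrayList<>();
        for (int i = 1; ; i++) {
            T event = match.get(name + "_" + i);
            if (event == null) {
                break;  // no more indexed entries for this condition
            }
            events.add(event);
        }
        return events;
    }

    public static void main(String[] args) {
        Map<String, String> match = new HashMap<>();
        match.put("first", "a");
        match.put("next_1", "b");
        match.put("next_2", "c");
        System.out.println(eventsFor(match, "next"));  // [b, c]
    }
}
```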





Re: Idempotent Job Submission

2017-04-28 Thread Till Rohrmann
Hi James,

I think if you keep the JobGraph around and don’t generate it by
re-executing the user code, then you will have the same JobID when
re-submitting this JobGraph. This should allow you to do idempotent job
submissions.
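As a sketch of that idea (the client interface below is hypothetical and far simpler than Flink's actual client API): keeping the serialized JobGraph, and therefore its JobID, fixed turns submission into a check-then-submit that is safe to retry, provided the receiving side deduplicates by job id.

```java
import java.util.Set;

public class IdempotentSubmission {
    /** Hypothetical minimal client view; not Flink's actual API. */
    interface JobClient {
        Set<String> runningJobIds();
        void submit(String jobId, byte[] serializedJobGraph);
    }

    /**
     * Re-submitting the same kept JobGraph is idempotent because the job id does
     * not change between attempts; skip the submit when the id is already known.
     */
    static boolean submitIfAbsent(JobClient client, String jobId, byte[] graph) {
        if (client.runningJobIds().contains(jobId)) {
            return false;  // already running, nothing to do
        }
        client.submit(jobId, graph);
        return true;
    }
}
```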

However, if you use Flink’s high availability mode, then you should not
have to re-submit a job, because the JobManager will automatically retrieve
all submitted jobs (given a successful job submission before).

Concerning the Flip-6 work, you’re right. It is an ongoing effort and we
try to replace more and more parts with the new code base. The way you
describe the interaction with the JobMasterRunner and the JobMaster is
right.

I think in the future the Dispatcher component will be responsible for
storing a submitted job persistently such that it can be retrieved in case
of a failure. It will also be the responsibility of the Dispatcher to
monitor the running JobMasters and to restart them in case of a failure.
Probably the job upgrade path will also go through the Dispatcher or the
Client, because they know how to stop a job and how to spawn a new JobMaster
with the new job. The JobMaster should only be responsible for running a
single version of a job.

How the upgrade story works with Kubernetes, I cannot tell in detail at the
moment. If it is not possible to talk to Kubernetes from within a
containerized application in order to start new containers, then I assume
that an external layer (e.g. the client) has to do the job upgrade
operation. Otherwise it could work like in the Yarn and Mesos scenario.

At the moment we have the following bigger threads wrt Flip-6 going on:

   - Client side integration with Flip-6 which will most likely mean to
   implement a REST based client
   - Dispatcher component for multi-tenancy
   - Proper fencing for split brain situations
   - Port Mesos to Flip-6 (being worked on)
   - Make Yarn implementation elastic (allowing to allocate and de-allocate
   containers)
   - Testing

We are happy about every helping hand because Flip-6 is a lot of work. Once
the feature freeze for Flink 1.3 is reached (probably next week) I will
spend some time refining these bigger steps and creating JIRA issues for
them.

Cheers,
Till

On Thu, Apr 20, 2017 at 6:20 PM, James Bucher  wrote:

> Hey all,
>
> I have been doing some digging to see if there is a good way to do an
> idempotent job submission. I was hoping to write a job submission agent
> that does the following:
>
>   1.  Checks to see if the cluster is running yet (can contact a
> JobManager)
>   2.  Checks to see if the job it is watching is running.
>   3.  Submits the job if it is not yet running.
>   4.  Retry if there are any issues.
>
> Specifically, at the moment there doesn’t seem to be any functionality for
> submitting a job only if it doesn’t already exist. The current interface
> creates a situation where a race condition is possible (as far as I can tell):
>
> For example if the following sequence of events occurs:
>
>   1.  JobManager fails and a new Leader is re-elected:
>  *   JobManager Asynchronously starts restoring jobs: here<
> https://github.com/apache/flink/blob/release-1.2/
> flink-runtime/src/main/scala/org/apache/flink/runtime/
> jobmanager/JobManager.scala#L300>
>   2.  Client Calls to list currently running jobs (before jobs are
> restored) and gets back an incomplete list of running jobs<
> https://github.com/apache/flink/blob/release-1.2/
> flink-runtime/src/main/scala/org/apache/flink/runtime/
> jobmanager/JobManager.scala#L1009> because SubmitJob registers jobs in
> currentJobs here<
> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/
> scala/org/apache/flink/runtime/jobmanager/JobManager.scala#L1300>
>   3.  Client Assumes Job is no longer running so uses HTTP/CLI/Whatever to
> restore job.
>   4.  Current interfaces don’t pass in the same JobID (a new one is
> generated for each submit) so a new Job is submitted with a new JobID
>   5.  JobManager restores previous instance of the running Job
>   6.  Now there are 2 instances of the job running in the cluster.
>
> While the above state is pretty unlikely to hit when one is submitting
> jobs manually, it seems to me that an agent like the above might end up
> hitting it if the cluster was having trouble with JobManagers failing.
>
> I can see that FLIP-6
> is rewriting the whole JobManager itself. From my reading of the current
> code base this work is about halfway done in master.
>
> From my reading of the code/docs it seems that from the submission side
> the expectation for Docker/Kubernetes is that you will create two sets of
> containers:
>
>   1.  A JobMaster/ResourceManager container that contains the user’s job
> in some form (jar or as a serialized JobGraph).
>   2.  A TaskManager container which is either generic or potentially has
> user libs (up to the implementer/cluster maintainer)
>
> As I currently understand the code the 

Re: [DISCUSS] Feature Freeze

2017-04-28 Thread Chesnay Schepler

FLINK-5892 has been merged.

For FLINK-4545 (replacing numNetworkBuffer parameter) a PR is also still 
open and could use a second pair of eyes.


On 28.04.2017 17:03, Kurt Young wrote:

Hi Flavio,

I have also fixed the issue in the 1.2 branch, but the next release will be 1.2.2.

Best,
Kurt

On Fri, Apr 28, 2017 at 11:01 PM, Ted Yu  wrote:


Flavio:
Have you seen this (w.r.t. 1.2.1) ?

http://search-hadoop.com/m/Flink/VkLeQejxLg24Lk0D1?subj=+
RESULT+VOTE+Release+Apache+Flink+1+2+1+RC2+

On Fri, Apr 28, 2017 at 5:07 AM, Flavio Pompermaier 
wrote:


Any chance to cherry-pick this also into 1.2.1? We're using Flink 1.2.0 in
production, and maybe an upgrade to 1.2.1 would be a safer option in the
short term.

Best,
Flavio

On Fri, Apr 28, 2017 at 2:00 PM, Aljoscha Krettek 
wrote:


Ah, I see. The fix for that has been merged into master, so it will be
released in Flink 1.3.


On 28. Apr 2017, at 13:50, Flavio Pompermaier 

wrote:

Sorry, you're right Aljoscha: the issue number is correct, the link is
wrong! The correct one is https://issues.apache.org/

jira/browse/FLINK-6398

On Fri, Apr 28, 2017 at 11:48 AM, Aljoscha Krettek <

aljos...@apache.org>

wrote:


I think there might be a typo. We haven’t yet reached issue number 6389,
if I’m not mistaken. The latest as I’m writing this is 6410.


On 28. Apr 2017, at 10:00, Flavio Pompermaier <

pomperma...@okkam.it>

wrote:

If it's not a problem it will be great for us to include also FLINK-6398,
if it's not a big deal.

Best,
Flavio

On Fri, Apr 28, 2017 at 3:32 AM, Zhuoluo Yang <

zhuoluo@alibaba-inc.com>

wrote:


Hi Devs,

Thanks for the release plan.

Could you also please add the feature FLINK-6196 (Support dynamic schema
in Table Function)?
I’d like to update the code per the comments left on the PR today.
I will try to make sure the code is updated before Apr 30th.


Thanks,

Zhuoluo 





On Apr 28, 2017, at 8:48 AM, Haohui Mai wrote:

Hello,

Thanks for starting this thread. It would be great to see the

following

features available in Flink 1.3:

* Support for complex schema: FLINK-6033, FLINK-6377
* Various improvements on SQL over group windows: FLINK-6335, FLINK-6373

* StreamTableSink for JDBC and Cassandra: FLINK-6281, FLINK-6225
* Decoupling Flink and Hadoop: FLINK-5998

All of them have gone through at least one round of review so I'm
optimistic that they can make it to 1.3 in a day or two.

Additionally it would be great to see FLINK-6232 go in, but it depends on
FLINK-5884 so it might be a little bit tough.

Regards,
Haohui

On Thu, Apr 27, 2017 at 12:22 PM Chesnay Schepler <

ches...@apache.org

wrote:

Hello,

FLINK-5892 (Restoring state by operator) is also nearing completion, but
with only 1 day left before the weekend we're cutting it really short.

Since this eliminates a major pain point when updating jobs, as it
allows the modification of chains, another day or 2 would be good I
think.

Regards,
Chesnay

On 27.04.2017 18:55, Bowen Li wrote:

Hi Ufuk,
I'd like to get FLINK-6013 (Adding Datadog Http metrics reporter) into
release 1.3. It's in the final state of code review in
https://github.com/apache/flink/pull/3736

Thanks,
Bowen

On Thu, Apr 27, 2017 at 8:38 AM, Zhijiang(wangzhijiang999) <
wangzhijiang...@aliyun.com> wrote:

Hi Ufuk,
Thank you for launching this topic!
I wish my latest refinement of buffer provider
(https://issues.apache.org/jira/browse/FLINK-6337) to be included in 1.3 and
most of the jobs can get benefit from it. And I think it can be completed with
the help of your reviews this week.

Cheers,
Zhijiang

----- Original message -----
From: Ufuk Celebi
Sent: Thursday, April 27, 2017, 22:25
To: dev <dev@flink.apache.org>
Cc: Robert Metzger
Subject: [DISCUSS] Feature Freeze
Hey devs! :-)

We decided to follow a time-based release model with the upcoming

1.3

release and the planned feature freeze is on Monday, May 1st.

I wanted to start a discussion to get a quick overview of the

current

state of things.

- Is everyone on track and aware of the feature freeze? ;)
- Are there any major features we want in 1.3 that
have not been merged yet?
- Do we need to extend the feature freeze, because of an
important feature?

Would be great to gather a list of features/PRs that we want in the
1.3 release. This could be a good starting point for the release
manager (@Robert?).

Best,

Ufuk











[jira] [Created] (FLINK-6417) Wildcard support for read text file

2017-04-28 Thread Artiom Darie (JIRA)
Artiom Darie created FLINK-6417:
---

 Summary: Wildcard support for read text file
 Key: FLINK-6417
 URL: https://issues.apache.org/jira/browse/FLINK-6417
 Project: Flink
  Issue Type: New Feature
  Components: Core
Reporter: Artiom Darie
Priority: Minor


Add wildcard support while reading from s3://, hdfs://, file://, etc.

h6. Examples:
# {code} s3://bucket-name/*.gz {code}
# {code} hdfs://path/*file-name*.csv {code}
# {code} file:///tmp/**/*.* {code}

h6. Proposal
# Use the existing method: {code}environment.readFile(...){code}
# List all the files in the directories
# Read files using existing: {code}ContinuousFileReaderOperator{code}

h6. Concerns (Open for discussion)
# Have a DataSource created for each file and then join them into a single 
DataSource
# Have all the files read into the same DataSource
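For illustration, the {{java.nio.file}} glob syntax is one possible interpretation of such patterns (a sketch only; how Flink would expand them against s3://, hdfs://, etc. through its own FileSystem abstraction is exactly what is open for discussion):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDemo {
    public static void main(String[] args) {
        // "*" matches within a single path segment...
        PathMatcher gz = FileSystems.getDefault().getPathMatcher("glob:*.gz");
        System.out.println(gz.matches(Paths.get("events-2017-04-28.gz")));  // true
        System.out.println(gz.matches(Paths.get("events.csv")));            // false

        // ...while "**" also crosses directory boundaries.
        PathMatcher deep = FileSystems.getDefault().getPathMatcher("glob:**/*.*");
        System.out.println(deep.matches(Paths.get("tmp/nested/dir/file.csv")));  // true
    }
}
```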








Re: [DISCUSS] Feature Freeze

2017-04-28 Thread Kurt Young
Hi Flavio,

I have also fixed the issue in the 1.2 branch, but the next release will be 1.2.2.

Best,
Kurt


Re: [DISCUSS] Feature Freeze

2017-04-28 Thread Ted Yu
Flavio:
Have you seen this (w.r.t. 1.2.1) ?

http://search-hadoop.com/m/Flink/VkLeQejxLg24Lk0D1?subj=+RESULT+VOTE+Release+Apache+Flink+1+2+1+RC2+


[jira] [Created] (FLINK-6415) Make sure Flink artifacts have no specific logger dependency

2017-04-28 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-6415:
---

 Summary: Make sure Flink artifacts have no specific logger 
dependency
 Key: FLINK-6415
 URL: https://issues.apache.org/jira/browse/FLINK-6415
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.2.1, 1.2.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 1.3.0


Flink's code is written against {{slf4j}}

To make sure users can use their custom logging framework we need to have

  - no direct compile-scope logger dependency in any core module
  - a dependency in {{flink-dist}} that is not in the fat jar
  - an explicit dependency in {{examples}} (to see logs when running in the IDE)
  - an explicit test dependency (for logs of test execution)

All except point (1) are already fixed.





[jira] [Created] (FLINK-6414) Use scala.binary.version in place of change-scala-version.sh

2017-04-28 Thread Greg Hogan (JIRA)
Greg Hogan created FLINK-6414:
-

 Summary: Use scala.binary.version in place of 
change-scala-version.sh
 Key: FLINK-6414
 URL: https://issues.apache.org/jira/browse/FLINK-6414
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 1.3.0
Reporter: Greg Hogan
Assignee: Greg Hogan


Recent commits have failed to modify {{change-scala-version.sh}} resulting in 
broken builds for {{scala-2.11}}. It looks like we can remove the need for this 
script by replacing hard-coded references to the Scala version with Flink's 
maven variable {{scala.binary.version}}.

I had initially realized that the change script is [only used for 
building|https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/building.html#scala-versions]
 and not for switching the IDE environment.






[jira] [Created] (FLINK-6413) Add stream operator callback to notify about consumed network buffer

2017-04-28 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-6413:
---

 Summary: Add stream operator callback to notify about consumed 
network buffer 
 Key: FLINK-6413
 URL: https://issues.apache.org/jira/browse/FLINK-6413
 Project: Flink
  Issue Type: Improvement
  Components: DataStream API
Reporter: Aljoscha Krettek


This is originally motivated by BEAM-1612. Beam has the notion of bundles and 
allows users to do work at the start/end of each bundle. This could be used for 
setting up some expensive connection or for batching accesses to some external 
system. There is also internal optimisation potential because accesses/updates 
to state could be kept in-memory per bundle/buffer and only afterwards be 
written to fault-tolerant state.

The bundling induced by the Flink network stack (which depends on the network 
buffer size and the buffer timeout) seems like a natural fit for this. I 
propose to add an _experimental_ interface {{BufferConsumedListener}} (or some 
such name):

{code}
interface BufferConsumedListener {
  void notifyBufferConsumed();
}
{code}

that is invoked in the input processor whenever a network buffer is exhausted: 
https://github.com/apache/flink/blob/922352ac35f3753334e834632e3e361fbd36336e/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/StreamInputProcessor.java#L178-L178

The change is very simple, three lines of code would be added:

{code}
if (result.isBufferConsumed()) {
  currentRecordDeserializer.getCurrentBuffer().recycle();
  currentRecordDeserializer = null;
  if (streamOperator instanceof BufferConsumedListener) {
    ((BufferConsumedListener) streamOperator).notifyBufferConsumed();
  }
}
{code}
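For illustration, an operator opting into the proposed interface might batch per-buffer work like this (a self-contained sketch; {{BatchingOperator}} is hypothetical and the interface is re-declared here only so the example compiles on its own):

```java
import java.util.ArrayList;
import java.util.List;

public class BufferBundleSketch {
    // Re-declaration of the proposed (not yet existing) callback interface.
    interface BufferConsumedListener {
        void notifyBufferConsumed();
    }

    // Hypothetical operator batching work per network buffer ("bundle").
    static class BatchingOperator implements BufferConsumedListener {
        final List<String> pending = new ArrayList<>();
        int flushes = 0;

        void processElement(String value) {
            pending.add(value);  // keep state updates in memory for this bundle
        }

        @Override
        public void notifyBufferConsumed() {
            // End of the bundle: this is where batched accesses to an external
            // system or to fault-tolerant state would be written out.
            flushes++;
            pending.clear();
        }
    }

    public static void main(String[] args) {
        BatchingOperator op = new BatchingOperator();
        op.processElement("a");
        op.processElement("b");
        op.notifyBufferConsumed();  // driven by the input processor in the proposal
        System.out.println(op.flushes + " " + op.pending.size());  // 1 0
    }
}
```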







Re: [DISCUSS] Feature Freeze

2017-04-28 Thread Flavio Pompermaier
Any chance to cherry-pick this also into 1.2.1? We're using Flink 1.2.0 in
production, and maybe an upgrade to 1.2.1 would be a safer option in the
short term.

Best,
Flavio



Re: [DISCUSS] Feature Freeze

2017-04-28 Thread Aljoscha Krettek
Ah, I see. The fix for that has been merged into master, so it will be released
in Flink 1.3.

> On 28. Apr 2017, at 13:50, Flavio Pompermaier  wrote:
> 
> Sorry, you're right Aljosha..the issue number is correct, the link is
> wrong! The correct one is https://issues.apache.org/jira/browse/FLINK-6398
> 
> On Fri, Apr 28, 2017 at 11:48 AM, Aljoscha Krettek 
> wrote:
> 
>> I think there might be a typo. We haven’t yet reached issue number 6389,
>> if I’m not mistaken. The latest as I’m writing this is 6410.
>> 
>>> On 28. Apr 2017, at 10:00, Flavio Pompermaier 
>> wrote:
>>> 
>>> If it's not a problem it will be great for us to include also FLINK-6398
>>>  if it's not a big
>> deal
>>> 
>>> Best,
>>> Flavio
>>> 
>>> On Fri, Apr 28, 2017 at 3:32 AM, Zhuoluo Yang <
>> zhuoluo@alibaba-inc.com>
>>> wrote:
>>> 
 Hi Devs,
 
 Thanks for the release plan.
 
 Could you also please add the feature FLINK-6196
  Support dynamic
>> schema
 in Table Function?
 I’d like to update the code as comments left on PR today.
 I will try to make sure the code is updated before the Apr 30th.
 
 
 Thanks,
 
 Zhuoluo 
 
 
 
 
 
 在 2017年4月28日,上午8:48,Haohui Mai  写道:
 
 Hello,
 
 Thanks for starting this thread. It would be great to see the following
 features available in Flink 1.3:
 
 * Support for complex schema: FLINK-6033, FLINK-6377
 * Various improvements on SQL over group windows: FLINK-6335, FLINK-6373
 * StreamTableSink for JDBC and Cassandra: FLINK-6281, FLINK-6225
 * Decoupling Flink and Hadoop: FLINK-5998
 
 All of them have gone through at least one round of review so I'm
 optimistic that they can make it to 1.3 in a day or two.
 
 Additionally it would be great to see FLINK-6232 go in, but it depends
>> on
 FLINK-5884 so it might be a little bit tough.
 
 Regards,
 Haohui
 
 On Thu, Apr 27, 2017 at 12:22 PM Chesnay Schepler 
 wrote:
 
 Hello,
 
 FLINK-5892 (Restoring state by operator) is also nearing completion, but
 with only 1 day left before the weekend we're cutting it really short.
 
 Since this eliminates a major pain point when updating jobs, as it
 allows the modification of chains, another day or two would be good, I
>> think.
 
 Regards,
 Chesnay
 
 On 27.04.2017 18:55, Bowen Li wrote:
 
 Hi Ufuk,
I'd like to get FLINK-6013 (Adding Datadog Http metrics reporter)
 
 into
 
 release 1.3. It's in the final state of code review in
 https://github.com/apache/flink/pull/3736
 
 Thanks,
 Bowen
 
 On Thu, Apr 27, 2017 at 8:38 AM, Zhijiang(wangzhijiang999) <
 wangzhijiang...@aliyun.com> wrote:
 
 Hi Ufuk,
 Thank you for launching this topic!
 I wish my latest refinement of buffer provider (
 
 https://issues.apache.org/
 
 jira/browse/FLINK-6337)  to be included in 1.3 and most of the jobs can
 get benefit from it. And I think it can be completed with the help of
 
 your
 
 reviews this week.
 
 Cheers,
 Zhijiang
 --
 From: Ufuk Celebi
 Sent: Thursday, April 27, 2017, 22:25
 To: dev <dev@flink.apache.org>
 Cc: Robert Metzger
 Subject: [DISCUSS] Feature Freeze
 Hey devs! :-)
 
 We decided to follow a time-based release model with the upcoming 1.3
 release and the planned feature freeze is on Monday, May 1st.
 
 I wanted to start a discussion to get a quick overview of the current
 state of things.
 
 - Is everyone on track and aware of the feature freeze? ;)
 - Are there any major features we want in 1.3 that
 have not been merged yet?
 - Do we need to extend the feature freeze, because of an
 important feature?
 
 Would be great to gather a list of features/PRs that we want in the
 1.3 release. This could be a good starting point for the release
 manager (@Robert?).
 
 Best,
 
 Ufuk
 
 
 
 
 
>> 
>> 



Re: [DISCUSS] Feature Freeze

2017-04-28 Thread Flavio Pompermaier
Sorry, you're right Aljoscha... the issue number is correct, the link is
wrong! The correct one is https://issues.apache.org/jira/browse/FLINK-6398

On Fri, Apr 28, 2017 at 11:48 AM, Aljoscha Krettek 
wrote:

> I think there might be a typo. We haven’t yet reached issue number 6389,
> if I’m not mistaken. The latest as I’m writing this is 6410.
>
> > On 28. Apr 2017, at 10:00, Flavio Pompermaier 
> wrote:
> >
> > If it's not a problem, it would be great for us to also include FLINK-6398, if it's not a big deal
> >
> > Best,
> > Flavio
> >
> > On Fri, Apr 28, 2017 at 3:32 AM, Zhuoluo Yang <
> zhuoluo@alibaba-inc.com>
> > wrote:
> >
> >> Hi Devs,
> >>
> >> Thanks for the release plan.
> >>
> >> Could you also please add the feature FLINK-6196
> >>  Support dynamic
> schema
> >> in Table Function?
> >> I’d like to update the code per the comments left on the PR today.
> >> I will try to make sure the code is updated before Apr 30th.
> >>
> >>
> >> Thanks,
> >>
> >> Zhuoluo 
> >>
> >>
> >>
> >>
> >>
> >> On Apr 28, 2017, at 8:48 AM, Haohui Mai wrote:
> >>
> >> Hello,
> >>
> >> Thanks for starting this thread. It would be great to see the following
> >> features available in Flink 1.3:
> >>
> >> * Support for complex schema: FLINK-6033, FLINK-6377
> >> * Various improvements on SQL over group windows: FLINK-6335, FLINK-6373
> >> * StreamTableSink for JDBC and Cassandra: FLINK-6281, FLINK-6225
> >> * Decoupling Flink and Hadoop: FLINK-5998
> >>
> >> All of them have gone through at least one round of review so I'm
> >> optimistic that they can make it to 1.3 in a day or two.
> >>
> >> Additionally it would be great to see FLINK-6232 go in, but it depends
> on
> >> FLINK-5884 so it might be a little bit tough.
> >>
> >> Regards,
> >> Haohui
> >>
> >> On Thu, Apr 27, 2017 at 12:22 PM Chesnay Schepler 
> >> wrote:
> >>
> >> Hello,
> >>
> >> FLINK-5892 (Restoring state by operator) is also nearing completion, but
> >> with only 1 day left before the weekend we're cutting it really short.
> >>
> >> Since this eliminates a major pain point when updating jobs, as it
> >> allows the modification of chains, another day or two would be good, I
> think.
> >>
> >> Regards,
> >> Chesnay
> >>
> >> On 27.04.2017 18:55, Bowen Li wrote:
> >>
> >> Hi Ufuk,
> >> I'd like to get FLINK-6013 (Adding Datadog Http metrics reporter)
> >>
> >> into
> >>
> >> release 1.3. It's in the final state of code review in
> >> https://github.com/apache/flink/pull/3736
> >>
> >> Thanks,
> >> Bowen
> >>
> >> On Thu, Apr 27, 2017 at 8:38 AM, Zhijiang(wangzhijiang999) <
> >> wangzhijiang...@aliyun.com> wrote:
> >>
> >> Hi Ufuk,
> >> Thank you for launching this topic!
> >> I wish my latest refinement of buffer provider (
> >>
> >> https://issues.apache.org/
> >>
> >> jira/browse/FLINK-6337)  to be included in 1.3 and most of the jobs can
> >> get benefit from it. And I think it can be completed with the help of
> >>
> >> your
> >>
> >> reviews this week.
> >>
> >> Cheers,
> >> Zhijiang
> >> --
> >> From: Ufuk Celebi
> >> Sent: Thursday, April 27, 2017, 22:25
> >> To: dev <dev@flink.apache.org>
> >> Cc: Robert Metzger
> >> Subject: [DISCUSS] Feature Freeze
> >> Hey devs! :-)
> >>
> >> We decided to follow a time-based release model with the upcoming 1.3
> >> release and the planned feature freeze is on Monday, May 1st.
> >>
> >> I wanted to start a discussion to get a quick overview of the current
> >> state of things.
> >>
> >> - Is everyone on track and aware of the feature freeze? ;)
> >> - Are there any major features we want in 1.3 that
> >> have not been merged yet?
> >> - Do we need to extend the feature freeze, because of an
> >> important feature?
> >>
> >> Would be great to gather a list of features/PRs that we want in the
> >> 1.3 release. This could be a good starting point for the release
> >> manager (@Robert?).
> >>
> >> Best,
> >>
> >> Ufuk
> >>
> >>
> >>
> >>
> >>
>
>


[jira] [Created] (FLINK-6412) Stream has already been closed during job cancel

2017-04-28 Thread Andrey (JIRA)
Andrey created FLINK-6412:
-

 Summary: Stream has already been closed during job cancel
 Key: FLINK-6412
 URL: https://issues.apache.org/jira/browse/FLINK-6412
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Andrey


Steps to reproduce:
* create job with RocksDBStateBackend: env.setStateBackend(new 
RocksDBStateBackend(checkpointRoot));
* run job
* cancel job from the Web UI.

In logs:
{code}
2017-04-28 10:25:57,475 INFO  org.apache.flink.runtime.taskmanager.Task 
- Source: Custom Source (1/2) (05a3888ed2a232f234a10775826395a0) 
switched from DEPLOYING to RUNNING. [Source: Custom Source (1/2)]
2017-04-28 10:25:57,476 INFO  
org.apache.flink.streaming.runtime.tasks.StreamTask   - Using 
user-defined state backend: RocksDB State Backend {isInitialized=false, 
configuredDbBasePaths=null, initializedDbBasePaths=null, 
checkpointStreamBackend=File State Backend @ 
file:/flink/checkpoints/flinktests}. [Map (2/2)]
2017-04-28 10:25:57,476 INFO  
org.apache.flink.streaming.runtime.tasks.StreamTask   - Using 
user-defined state backend: RocksDB State Backend {isInitialized=false, 
configuredDbBasePaths=null, initializedDbBasePaths=null, 
checkpointStreamBackend=File State Backend @ 
file:/flink/checkpoints/flinktests}. [Source: Custom Source (1/2)]
...
2017-04-28 10:26:29,793 INFO  org.apache.flink.runtime.taskmanager.Task 
- Triggering cancellation of task code Source: Custom Source (1/2) 
(05a3888ed2a232f234a10775826395a0). [flink-akka.actor.default-dispatcher-2]
2017-04-28 10:26:29,794 INFO  org.apache.flink.runtime.taskmanager.Task 
- Attempting to cancel task Map (2/2) 
(bdb982b6ef47fe79b6ff5b96153c921e). [flink-akka.actor.default-dispatcher-2]
2017-04-28 10:26:29,794 INFO  org.apache.flink.runtime.taskmanager.Task 
- Map (2/2) (bdb982b6ef47fe79b6ff5b96153c921e) switched from 
RUNNING to CANCELING. [flink-akka.actor.default-dispatcher-2]
2017-04-28 10:26:29,796 INFO  org.apache.flink.runtime.taskmanager.Task 
- Triggering cancellation of task code Map (2/2) 
(bdb982b6ef47fe79b6ff5b96153c921e). [flink-akka.actor.default-dispatcher-2]
2017-04-28 10:26:29,796 INFO  org.apache.flink.runtime.taskmanager.Task 
- Source: Custom Source (1/2) (05a3888ed2a232f234a10775826395a0) 
switched from CANCELING to CANCELED. [Source: Custom Source (1/2)]
2017-04-28 10:26:29,797 INFO  org.apache.flink.runtime.taskmanager.Task 
- Freeing task resources for Source: Custom Source (1/2) 
(05a3888ed2a232f234a10775826395a0). [Source: Custom Source (1/2)]
2017-04-28 10:26:29,798 INFO  org.apache.flink.core.fs.FileSystem   
- Ensuring all FileSystem streams are closed for Source: Custom 
Source (1/2) [Source: Custom Source (1/2)]
2017-04-28 10:26:29,803 INFO  org.apache.flink.runtime.taskmanager.TaskManager  
- Un-registering task and sending final execution state CANCELED to 
JobManager for task Source: Custom Source (05a3888ed2a232f234a10775826395a0) 
[flink-akka.actor.default-dispatcher-2]
2017-04-28 10:26:39,608 INFO  org.apache.flink.runtime.taskmanager.Task 
- Attempting to fail task externally Map (2/2) 
(bdb982b6ef47fe79b6ff5b96153c921e). [pool-14-thread-1]
2017-04-28 10:26:39,608 WARN  
org.apache.flink.streaming.runtime.tasks.StreamTask   - Could not 
properly clean up the async checkpoint runnable. [Canceler for Map (2/2) 
(bdb982b6ef47fe79b6ff5b96153c921e).]
java.lang.Exception: Could not properly cancel managed keyed state future.
at 
org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:91)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1010)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.close(StreamTask.java:995)
at org.apache.flink.util.IOUtils.closeQuietly(IOUtils.java:262)
at org.apache.flink.util.IOUtils.closeAllQuietly(IOUtils.java:251)
at 
org.apache.flink.util.AbstractCloseableRegistry.close(AbstractCloseableRegistry.java:91)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.cancel(StreamTask.java:364)
at 
org.apache.flink.runtime.taskmanager.Task$TaskCanceler.run(Task.java:1390)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Stream 
has already been closed and discarded.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40)
at 
org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:79)
at 

[jira] [Created] (FLINK-6411) YarnApplicationMasterRunner should not interfere with RunningJobsRegistry

2017-04-28 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-6411:


 Summary: YarnApplicationMasterRunner should not interfere with 
RunningJobsRegistry
 Key: FLINK-6411
 URL: https://issues.apache.org/jira/browse/FLINK-6411
 Project: Flink
  Issue Type: Bug
  Components: Distributed Coordination
Affects Versions: 1.3.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.3.0


The {{YarnApplicationMasterRunner}} removes the running job from the 
{{RunningJobsRegistry}} when it is shut down. This should not be its 
responsibility and rather be delegated to the {{JobManagerRunner}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: question about rowtime processfunction - are watermarks needed?

2017-04-28 Thread Fabian Hueske
I wouldn't say that the batch / stream unification no longer holds.
If you make the watermark lag large enough (i.e., you do not have any late data)
you get exact results, though not very timely ones.

Watermarks basically trade completeness of the result for result latency.
With support for late updates, you can get early (possibly incomplete)
results and correct them when you get additional (late) data.

However, you need to discard state at some point in time.
If you receive late data for which the required state was discarded, this
is the point where batch and streaming results start to diverge.
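The tradeoff Fabian describes can be made concrete with a small sketch. The class below is illustrative only (it has no Flink dependency and is not Flink's actual watermark API): a bounded-out-of-orderness generator whose configured lag is exactly the knob trading completeness (fewer dropped late records) against result latency.

```java
// A minimal sketch of bounded-out-of-orderness watermarking.
// Larger maxLagMs => fewer records arrive "late", but results are emitted later.
public class BoundedLatenessWatermarks {
    private final long maxLagMs;
    private long maxTimestamp = Long.MIN_VALUE;

    public BoundedLatenessWatermarks(long maxLagMs) {
        this.maxLagMs = maxLagMs;
    }

    /** Called per record; advances the internal event-time clock. */
    public void onEvent(long eventTimestampMs) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestampMs);
    }

    /** The watermark trails the highest timestamp seen by the configured lag. */
    public long currentWatermark() {
        return maxTimestamp - maxLagMs;
    }

    /** A record is "late" if its timestamp is at or below the watermark. */
    public boolean isLate(long eventTimestampMs) {
        return eventTimestampMs <= currentWatermark();
    }

    public static void main(String[] args) {
        BoundedLatenessWatermarks wm = new BoundedLatenessWatermarks(2000);
        for (long ts : new long[] {1500, 1600, 5000}) {
            wm.onEvent(ts);
        }
        System.out.println(wm.currentWatermark()); // 3000
        System.out.println(wm.isLate(1000));       // true: completeness lost
        System.out.println(wm.isLate(4000));       // false: still within the lag
    }
}
```

With a larger lag the 1000 ms record would still be accepted, at the cost of everything downstream waiting longer, which is the completeness-versus-latency trade described above.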

Cheers, Fabian

2017-04-28 11:39 GMT+02:00 Radu Tudoran :

> Hi,
> Thanks again Fabian for the explanation.
> Considering what you said, is there still a duality with the batch case?
> Since the streaming case is non-deterministic, I would say the duality (in
> the sense that a query on the stream should return the same result as the
> same query on the batched data) no longer holds?
> I am just trying to get a deeper understanding of this, which I think will
> apply also to the other functions and SQL operators...sorry for bothering
> you with this.
>
> -Original Message-
> From: Fabian Hueske [mailto:fhue...@gmail.com]
> Sent: Friday, April 28, 2017 9:56 AM
> To: dev@flink.apache.org
> Subject: Re: question about rowtime processfunction - are watermarks
> needed?
>
> Hi Radu,
>
> yes that might happen in a parallel setup and depends on the "speed" of
> the parallel threads.
> An operator only increments its own event-time clock to the minimum of
> the last watermark received from each input channel.
> If one input channel is "slow", the event-time of an operator lags behind
> and "late" events of the other threads are correctly processed because the
> operator's event-time was not incremented yet.
>
> So, event-time is not deterministic when it comes to which records are
> dropped.
> The watermark documentation might be helpful as well [1].
>
> Cheers,
> Fabian
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-
> release-1.2/dev/event_time.html#watermarks-in-parallel-streams
>
> 2017-04-27 22:09 GMT+02:00 Radu Tudoran :
>
> > Re-hi,
> >
> > I debugged the test for the event rowtime a bit
> >
> > I tested the testBoundNonPartitionedEventTimeWindowWithRange from
> > SQLITCase class
> >
> > I would expect that once a watermark is triggered: 1) onTimer will be
> > called to process the events that arrived so far, and 2) later events
> > with timestamps behind the watermark will be dropped. However, it seems
> > that almost the entire input can arrive in the processElement function
> > before onTimer is triggered.
> >
> > Moreover, if you modify the input to add an out-of-order event (see the
> > dataset below, where I added an event with timestamp 1000 after watermark
> > 14000). As far as I understand, this event should be dropped. However, in
> > different runs it can happen that it is not dropped: the onTimer may never
> > have been triggered before this event arrives, so it gets registered. Is
> > this correct? Am I missing something?
> >
> >
> >@Test
> >   def testBoundNonPartitionedEventTimeWindowWithRangeUnOrder(): Unit = {
> > val data = Seq(
> >   Left((1500L, (1L, 15, "Hello"))),
> >   Left((1600L, (1L, 16, "Hello"))),
> >   Left((1000L, (1L, 1, "Hello"))),
> >   Left((2000L, (2L, 2, "Hello"))),
> >   Right(1000L),
> >   Left((2000L, (2L, 2, "Hello"))),
> >   Left((2000L, (2L, 3, "Hello"))),
> >   Left((3000L, (3L, 3, "Hello"))),
> >   Right(2000L),
> >   Left((4000L, (4L, 4, "Hello"))),
> >   Right(3000L),
> >   Left((5000L, (5L, 5, "Hello"))),
> >   Right(5000L),
> >   Left((6000L, (6L, 6, "Hello"))),
> >   Left((6500L, (6L, 65, "Hello"))),
> >   Right(7000L),
> >   Left((9000L, (6L, 9, "Hello"))),
> >   Left((9500L, (6L, 18, "Hello"))),
> >   Left((9000L, (6L, 9, "Hello"))),
> >   Right(10000L),
> >   Left((10000L, (7L, 7, "Hello World"))),
> >   Left((11000L, (7L, 17, "Hello World"))),
> >   Left((11000L, (7L, 77, "Hello World"))),
> >   Right(12000L),
> >   Left((14000L, (7L, 18, "Hello World"))),
> >   Right(14000L),
> >   Left((15000L, (8L, 8, "Hello World"))),
> >   Left((1000L, (8L, 8, "Too late - Hello World"))),   // event is out
> > of order and should be dropped
> >   Right(17000L),
> >   Left((20000L, (20L, 20, "Hello World"))),
> >   Right(19000L))
> >
> >
> >
> >
> > -Original Message-
> > From: Fabian Hueske [mailto:fhue...@gmail.com]
> > Sent: Thursday, April 27, 2017 3:17 PM
> > To: dev@flink.apache.org
> > Subject: Re: question about rowtime processfunction - are watermarks
> > needed?
> >
> > Hi Radu,
> >
> > event-time processing requires watermarks. Operators use watermarks to
> > compute the current event-time.
> > The ProcessFunctions for over range windows use the TimerServices 

Re: [DISCUSS] Feature Freeze

2017-04-28 Thread Aljoscha Krettek
I think there might be a typo. We haven’t yet reached issue number 6389, if I’m 
not mistaken. The latest as I’m writing this is 6410.

> On 28. Apr 2017, at 10:00, Flavio Pompermaier  wrote:
> 
> If it's not a problem, it would be great for us to also include FLINK-6398, if it's not a big deal
> 
> Best,
> Flavio
> 
> On Fri, Apr 28, 2017 at 3:32 AM, Zhuoluo Yang 
> wrote:
> 
>> Hi Devs,
>> 
>> Thanks for the release plan.
>> 
>> Could you also please add the feature FLINK-6196
>>  Support dynamic schema
>> in Table Function?
>> I’d like to update the code per the comments left on the PR today.
>> I will try to make sure the code is updated before Apr 30th.
>> 
>> 
>> Thanks,
>> 
>> Zhuoluo 
>> 
>> 
>> 
>> 
>> 
>> On Apr 28, 2017, at 8:48 AM, Haohui Mai wrote:
>> 
>> Hello,
>> 
>> Thanks for starting this thread. It would be great to see the following
>> features available in Flink 1.3:
>> 
>> * Support for complex schema: FLINK-6033, FLINK-6377
>> * Various improvements on SQL over group windows: FLINK-6335, FLINK-6373
>> * StreamTableSink for JDBC and Cassandra: FLINK-6281, FLINK-6225
>> * Decoupling Flink and Hadoop: FLINK-5998
>> 
>> All of them have gone through at least one round of review so I'm
>> optimistic that they can make it to 1.3 in a day or two.
>> 
>> Additionally it would be great to see FLINK-6232 go in, but it depends on
>> FLINK-5884 so it might be a little bit tough.
>> 
>> Regards,
>> Haohui
>> 
>> On Thu, Apr 27, 2017 at 12:22 PM Chesnay Schepler 
>> wrote:
>> 
>> Hello,
>> 
>> FLINK-5892 (Restoring state by operator) is also nearing completion, but
>> with only 1 day left before the weekend we're cutting it really short.
>> 
>> Since this eliminates a major pain point when updating jobs, as it
>> allows the modification of chains, another day or two would be good, I think.
>> 
>> Regards,
>> Chesnay
>> 
>> On 27.04.2017 18:55, Bowen Li wrote:
>> 
>> Hi Ufuk,
>> I'd like to get FLINK-6013 (Adding Datadog Http metrics reporter)
>> 
>> into
>> 
>> release 1.3. It's in the final state of code review in
>> https://github.com/apache/flink/pull/3736
>> 
>> Thanks,
>> Bowen
>> 
>> On Thu, Apr 27, 2017 at 8:38 AM, Zhijiang(wangzhijiang999) <
>> wangzhijiang...@aliyun.com> wrote:
>> 
>> Hi Ufuk,
>> Thank you for launching this topic!
>> I wish my latest refinement of buffer provider (
>> 
>> https://issues.apache.org/
>> 
>> jira/browse/FLINK-6337)  to be included in 1.3 and most of the jobs can
>> get benefit from it. And I think it can be completed with the help of
>> 
>> your
>> 
>> reviews this week.
>> 
>> Cheers,
>> Zhijiang
>> --
>> From: Ufuk Celebi
>> Sent: Thursday, April 27, 2017, 22:25
>> To: dev <dev@flink.apache.org>
>> Cc: Robert Metzger
>> Subject: [DISCUSS] Feature Freeze
>> Hey devs! :-)
>> 
>> We decided to follow a time-based release model with the upcoming 1.3
>> release and the planned feature freeze is on Monday, May 1st.
>> 
>> I wanted to start a discussion to get a quick overview of the current
>> state of things.
>> 
>> - Is everyone on track and aware of the feature freeze? ;)
>> - Are there any major features we want in 1.3 that
>> have not been merged yet?
>> - Do we need to extend the feature freeze, because of an
>> important feature?
>> 
>> Would be great to gather a list of features/PRs that we want in the
>> 1.3 release. This could be a good starting point for the release
>> manager (@Robert?).
>> 
>> Best,
>> 
>> Ufuk
>> 
>> 
>> 
>> 
>> 



[jira] [Created] (FLINK-6410) build fails after changing Scala to 2.11

2017-04-28 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-6410:
---

 Summary: build fails after changing Scala to 2.11
 Key: FLINK-6410
 URL: https://issues.apache.org/jira/browse/FLINK-6410
 Project: Flink
  Issue Type: Bug
  Components: Build System
Affects Versions: 1.3.0
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Blocker
 Fix For: 1.3.0


The {{change-scala-version.sh}} script does not correctly adjust the 
{{bin.xml}} assembly descriptor.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


RE: question about rowtime processfunction - are watermarks needed?

2017-04-28 Thread Radu Tudoran
Hi,
Thanks again Fabian for the explanation. 
Considering what you said, is there still a duality with the batch case? Since
the streaming case is non-deterministic, I would say the duality (in the sense
that a query on the stream should return the same result as the same query on
the batched data) no longer holds?
I am just trying to get a deeper understanding of this, which I think will 
apply also to the other functions and SQL operators...sorry for bothering you 
with this.

-Original Message-
From: Fabian Hueske [mailto:fhue...@gmail.com] 
Sent: Friday, April 28, 2017 9:56 AM
To: dev@flink.apache.org
Subject: Re: question about rowtime processfunction - are watermarks needed?

Hi Radu,

yes that might happen in a parallel setup and depends on the "speed" of the 
parallel threads.
An operator only increments its own event-time clock to the minimum of the
last watermark received from each input channel.
If one input channel is "slow", the event-time of an operator lags behind and
"late" events of the other threads are correctly processed because the
operator's event-time was not incremented yet.

So, event-time is not deterministic when it comes to which records are dropped.
The watermark documentation might be helpful as well [1].

Cheers,
Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/event_time.html#watermarks-in-parallel-streams
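The minimum-over-input-channels rule Fabian describes can be sketched without any Flink dependency (class and method names below are illustrative, not Flink's API): one slow channel holds the operator's event-time clock back, so "late" records from faster channels are still processed.

```java
import java.util.Arrays;

// Minimal model of an operator's event-time clock in a parallel setup:
// the operator's event time is the minimum of the last watermark seen
// on each input channel.
public class OperatorClock {
    private final long[] channelWatermarks;

    public OperatorClock(int numChannels) {
        channelWatermarks = new long[numChannels];
        Arrays.fill(channelWatermarks, Long.MIN_VALUE);
    }

    /** Watermarks per channel only ever move forward. */
    public void onWatermark(int channel, long watermark) {
        channelWatermarks[channel] =
            Math.max(channelWatermarks[channel], watermark);
    }

    /** Operator event time = minimum across all input channels. */
    public long eventTime() {
        return Arrays.stream(channelWatermarks).min().getAsLong();
    }

    /** Records at or below the operator's event time would be dropped. */
    public boolean wouldDrop(long timestamp) {
        return timestamp <= eventTime();
    }

    public static void main(String[] args) {
        OperatorClock clock = new OperatorClock(2);
        clock.onWatermark(0, 14000); // fast channel is far ahead
        clock.onWatermark(1, 1000);  // slow channel lags behind
        System.out.println(clock.eventTime());     // 1000: min over channels
        System.out.println(clock.wouldDrop(1500)); // false: still processed
    }
}
```

This is why which records get dropped is not deterministic: it depends on how far each parallel channel happens to have advanced.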

2017-04-27 22:09 GMT+02:00 Radu Tudoran :

> Re-hi,
>
> I debugged the test for the event rowtime a bit
>
> I tested the testBoundNonPartitionedEventTimeWindowWithRange from 
> SQLITCase class
>
> I would expect that once a watermark is triggered: 1) onTimer will be
> called to process the events that arrived so far, and 2) later events
> with timestamps behind the watermark will be dropped. However, it seems
> that almost the entire input can arrive in the processElement function
> before onTimer is triggered.
>
> Moreover, if you modify the input to add an out-of-order event (see the
> dataset below, where I added an event with timestamp 1000 after watermark
> 14000). As far as I understand, this event should be dropped. However, in
> different runs it can happen that it is not dropped: the onTimer may never
> have been triggered before this event arrives, so it gets registered. Is
> this correct? Am I missing something?
>
>
>@Test
>   def testBoundNonPartitionedEventTimeWindowWithRangeUnOrder(): Unit = {
> val data = Seq(
>   Left((1500L, (1L, 15, "Hello"))),
>   Left((1600L, (1L, 16, "Hello"))),
>   Left((1000L, (1L, 1, "Hello"))),
>   Left((2000L, (2L, 2, "Hello"))),
>   Right(1000L),
>   Left((2000L, (2L, 2, "Hello"))),
>   Left((2000L, (2L, 3, "Hello"))),
>   Left((3000L, (3L, 3, "Hello"))),
>   Right(2000L),
>   Left((4000L, (4L, 4, "Hello"))),
>   Right(3000L),
>   Left((5000L, (5L, 5, "Hello"))),
>   Right(5000L),
>   Left((6000L, (6L, 6, "Hello"))),
>   Left((6500L, (6L, 65, "Hello"))),
>   Right(7000L),
>   Left((9000L, (6L, 9, "Hello"))),
>   Left((9500L, (6L, 18, "Hello"))),
>   Left((9000L, (6L, 9, "Hello"))),
>   Right(10000L),
>   Left((10000L, (7L, 7, "Hello World"))),
>   Left((11000L, (7L, 17, "Hello World"))),
>   Left((11000L, (7L, 77, "Hello World"))),
>   Right(12000L),
>   Left((14000L, (7L, 18, "Hello World"))),
>   Right(14000L),
>   Left((15000L, (8L, 8, "Hello World"))),
>   Left((1000L, (8L, 8, "Too late - Hello World"))),   // event is out
> of order and should be dropped
>   Right(17000L),
>   Left((20000L, (20L, 20, "Hello World"))),
>   Right(19000L))
>
>
>
>
> -Original Message-
> From: Fabian Hueske [mailto:fhue...@gmail.com]
> Sent: Thursday, April 27, 2017 3:17 PM
> To: dev@flink.apache.org
> Subject: Re: question about rowtime processfunction - are watermarks 
> needed?
>
> Hi Radu,
>
> event-time processing requires watermarks. Operators use watermarks to 
> compute the current event-time.
> The ProcessFunctions for over range windows use the TimerServices to 
> group elements by time.
> In case of event-time, the timers are triggered by the event-time of 
> the operator which is derived from the received watermarks.
> In case of processing-time, the timers are triggered based on the 
> wallclock time of the operator.
>
> So by using event-time timers, we implicitly rely on the watermarks
> because the timers are triggered based on the received watermarks.
>
> Best, Fabian
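The timer mechanics Fabian describes can be simulated in a few lines (illustrative only, no Flink dependency): a timer registered for a timestamp fires only once the watermark reaches it, which is why event-time operators implicitly depend on watermarks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of an event-time TimerService: pending timers are kept
// sorted, and advancing the watermark fires all timers at or below it.
public class TimerSimulation {
    private final TreeSet<Long> pending = new TreeSet<>();
    final List<Long> fired = new ArrayList<>();

    void registerEventTimeTimer(long ts) {
        pending.add(ts);
    }

    /** Advancing the watermark fires every timer with ts <= watermark. */
    void advanceWatermark(long watermark) {
        while (!pending.isEmpty() && pending.first() <= watermark) {
            fired.add(pending.pollFirst());
        }
    }

    public static void main(String[] args) {
        TimerSimulation t = new TimerSimulation();
        for (long ts : new long[] {1500, 3000, 9000}) {
            t.registerEventTimeTimer(ts);
        }
        t.advanceWatermark(2000);
        System.out.println(t.fired); // [1500]
        t.advanceWatermark(9000);
        System.out.println(t.fired); // [1500, 3000, 9000]
    }
}
```

If no watermark arrives, no timer ever fires, matching the observation in this thread that most of the input can pass through processElement before onTimer runs.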
>
>
> 2017-04-27 10:51 GMT+02:00 Radu Tudoran :
>
> > Hi,
> >
> > I am looking at the implementation of  RowTimeBoundedRangeOver (in 
> > the context of Stream SQL). I see that the logic is that the 
> > progress happens based on the timestamps of the row event, i.e.,
> > when an event arrives we register it to be processed based on its
> > timestamp
> (ctx.timerService.
> > registerEventTimeTimer(triggeringTs))
> >
> > In 

[jira] [Created] (FLINK-6409) TUMBLE/HOP/SESSION_START/END do not resolve time field correctly

2017-04-28 Thread Timo Walther (JIRA)
Timo Walther created FLINK-6409:
---

 Summary: TUMBLE/HOP/SESSION_START/END do not resolve time field 
correctly
 Key: FLINK-6409
 URL: https://issues.apache.org/jira/browse/FLINK-6409
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Reporter: Timo Walther
Assignee: Timo Walther


Calcite has a bug and cannot resolve the time fields of auxiliary group 
functions correctly. A discussion can be found in CALCITE-1761.

Right now this issue only affects our batch SQL API, but it is a blocker for 
FLINK-5884.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Official Flink Docker images

2017-04-28 Thread Ismaël Mejía
Hello,

I am absolutely happy to see that this is finally happening!
We have a really neat image right now and it is great that it will be
soon so easy to use.

One extra thing to mention is that Flink will now have two Docker
images, one based on Debian and the other based on Alpine, as most
official Java-based projects do.

In the future we expect to improve the documentation on how to use the
image with kubernetes and continue improving the actual documentation
with docker. If anyone wants to join to also document something or add
any improvement/feature you need, you are all welcome.

Finally, I would also like to thank Maximilian Michels, who
contributed to and reviewed some of my early changes to the image.

Regards,
Ismaël

ps. We will 'announce' when the official images are available on
docker hub, so everyone can start to use them.



On Thu, Apr 27, 2017 at 1:38 PM, Patrick Lucas
 wrote:
> I've been informed that images don't make it through the list!
>
> You can see the aforementioned squirrel here.
>
> --
> Patrick Lucas
>
> On Thu, Apr 27, 2017 at 12:21 PM, Patrick Lucas 
> wrote:
>
>> As part of an ongoing effort to improve the experience of using Flink on
>> Docker, some work has been done over the last two months to publish
>> official Flink Docker images to Docker Hub. The goal in the short term is
>> to make running a simple Flink cluster (almost) as easy as running docker
>> run flink. In the long term, we would like these images to be good enough
>> to use in production directly, or as base images for use in an existing
>> Docker workflow.
>>
>> Flink 1.2.1 has some fixes over the last few releases that make running it
>> on Docker nicer—and in some cases, possible. Notably, FLINK-2821
>>  allowed Dockerized
>> Flink to run across multiple hosts, and FLINK-4326
>>  added an option to run
>> Flink in the foreground, which is greatly preferred when running in Docker.
>>
>> We (Ismaël Mejía and myself, with some discussion with Stephan Ewen)
>> decided it made sense to bring the actual Dockerfiles outside of the Apache
>> Flink git repo, primarily to conform with every other Apache project that
>> has official images, but also because these scripts logically exist
>> decoupled from any particular Flink version. They are still Apache-licensed
>> and maintained by the community.
>>
>> Please reply here or on the relevant JIRA/GitHub issue if you have
>> questions or feedback.
>>
>> Here's a squirrel in a container:
>>
>>
>>
>> References:
>>
>>- FLINK-3026: Publish the flink docker container to the docker registry
>>
>>- Repo for the Dockerfiles and the scripts that generate them
>>
>>- GitHub PR to add the official images to Docker Hub
>>
>>- Improvements to the quality of running Flink on Docker to be made in
>>future Flink releases:
>>   - FLINK-6300: PID1 of docker images does not behave correctly
>>   
>>   - FLINK-6369: Better support for overlay networks
>>   
>>
>> Thanks to Ismaël Mejía, Jamie Grier, and Stephan Ewen for their
>> contributions.
>>
>> --
>> Patrick Lucas
>>


[jira] [Created] (FLINK-6408) Repeated loading of configuration files in hadoop filesystem code paths

2017-04-28 Thread Stephen Gran (JIRA)
Stephen Gran created FLINK-6408:
---

 Summary: Repeated loading of configuration files in hadoop 
filesystem code paths
 Key: FLINK-6408
 URL: https://issues.apache.org/jira/browse/FLINK-6408
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Stephen Gran
Priority: Minor


We are running flink on mesos in AWS.  Checkpointing is enabled with an s3 
backend, configured via the hadoop s3a filesystem implementation and done every 
second.

We are seeing roughly 3 million log events per hour from a relatively small 
job, and it appears that this is because every s3 copy event reloads the hadoop 
configuration, which in turn reloads the flink configuration.  The flink 
configuration loader is outputting each key/value pair every time it is 
invoked, leading to this volume of logs.

While the logging is relatively easy to deal with - just a log4j setting - the 
behaviour is probably suboptimal.  It seems that the configuration loader could 
easily be changed over to a singleton pattern to prevent the constant rereading 
of files.
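The singleton suggestion can be sketched as follows. Class and key names are illustrative, not Flink's actual configuration classes; the point is the initialization-on-demand holder idiom, which parses the files exactly once and serves the cached result to every later caller.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Load-once configuration cache: the Holder class is initialized lazily
// and at most once by the JVM, so load() runs a single time no matter
// how many filesystem operations ask for the configuration.
public class CachedConfiguration {
    static final AtomicInteger LOAD_COUNT = new AtomicInteger();

    /** Initialization-on-demand holder: thread-safe without locking. */
    private static class Holder {
        static final Map<String, String> SETTINGS = load();
    }

    private static Map<String, String> load() {
        LOAD_COUNT.incrementAndGet();
        // Stand-in for parsing flink-conf.yaml / Hadoop XML from disk.
        Map<String, String> conf = new HashMap<>();
        conf.put("state.backend", "rocksdb");
        conf.put("fs.default-scheme", "s3a");
        return conf;
    }

    public static Map<String, String> get() {
        return Holder.SETTINGS;
    }

    public static void main(String[] args) {
        // Many "copy events" now share one parse instead of one each.
        for (int i = 0; i < 1000; i++) {
            CachedConfiguration.get();
        }
        System.out.println(LOAD_COUNT.get()); // 1
    }
}
```

A cache like this would also silence the repeated key/value logging, since the configuration loader's log lines would be emitted only on the first load.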

If you're interested, we can probably knock up a patch for this in a relatively 
short time.

Cheers,



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] Feature Freeze

2017-04-28 Thread Flavio Pompermaier
If it's not a problem, it would be great for us to also include FLINK-6398, if it's not a big deal

Best,
Flavio

On Fri, Apr 28, 2017 at 3:32 AM, Zhuoluo Yang 
wrote:

> Hi Devs,
>
> Thanks for the release plan.
>
> Could you also please add the feature FLINK-6196
>  Support dynamic schema
> in Table Function?
> I’d like to update the code per the comments left on the PR today.
> I will try to make sure the code is updated before Apr 30th.
>
>
> Thanks,
>
> Zhuoluo 
>
>
>
>
>
> On Apr 28, 2017, at 8:48 AM, Haohui Mai wrote:
>
> Hello,
>
> Thanks for starting this thread. It would be great to see the following
> features available in Flink 1.3:
>
> * Support for complex schema: FLINK-6033, FLINK-6377
> * Various improvements on SQL over group windows: FLINK-6335, FLINK-6373
> * StreamTableSink for JDBC and Cassandra: FLINK-6281, FLINK-6225
> * Decoupling Flink and Hadoop: FLINK-5998
>
> All of them have gone through at least one round of review so I'm
> optimistic that they can make it to 1.3 in a day or two.
>
> Additionally it would be great to see FLINK-6232 go in, but it depends on
> FLINK-5884 so it might be a little bit tough.
>
> Regards,
> Haohui
>
> On Thu, Apr 27, 2017 at 12:22 PM Chesnay Schepler 
> wrote:
>
> Hello,
>
> FLINK-5892 (Restoring state by operator) is also nearing completion, but
> with only 1 day left before the weekend we're cutting it really short.
>
> Since this eliminates a major pain point when updating jobs, as it
> allows modifying chains, another day or two would be good, I think.
>
> Regards,
> Chesnay
>
> On 27.04.2017 18:55, Bowen Li wrote:
>
> Hi Ufuk,
>  I'd like to get FLINK-6013 (adding a Datadog HTTP metrics reporter) into
> release 1.3. It's in the final stage of code review in
> https://github.com/apache/flink/pull/3736
>
> Thanks,
> Bowen
>
> On Thu, Apr 27, 2017 at 8:38 AM, Zhijiang(wangzhijiang999) <
> wangzhijiang...@aliyun.com> wrote:
>
> Hi Ufuk,
> Thank you for launching this topic!
> I hope my latest refinement of the buffer provider
> (https://issues.apache.org/jira/browse/FLINK-6337) will be included in 1.3;
> most jobs can benefit from it. I think it can be completed with the help of
> your reviews this week.
>
> Cheers,
> Zhijiang
>
> ------------------------------------------------------------------
> From: Ufuk Celebi
> Sent: Thursday, April 27, 2017 22:25
> To: dev <dev@flink.apache.org>
> Cc: Robert Metzger
> Subject: [DISCUSS] Feature Freeze
> Hey devs! :-)
>
> We decided to follow a time-based release model with the upcoming 1.3
> release and the planned feature freeze is on Monday, May 1st.
>
> I wanted to start a discussion to get a quick overview of the current
> state of things.
>
> - Is everyone on track and aware of the feature freeze? ;)
> - Are there any major features we want in 1.3 that
> have not been merged yet?
> - Do we need to extend the feature freeze, because of an
> important feature?
>
> Would be great to gather a list of features/PRs that we want in the
> 1.3 release. This could be a good starting point for the release
> manager (@Robert?).
>
> Best,
>
> Ufuk
>
>
>
>
>


Re: question about rowtime processfunction - are watermarks needed?

2017-04-28 Thread Fabian Hueske
Hi Radu,

yes, that might happen in a parallel setup; it depends on the "speed" of the
parallel threads.
An operator only advances its own event-time clock to the minimum of the
last watermarks received from its input channels.
If one input channel is "slow", the event-time of the operator lags behind,
and "late" events from the other threads are still processed correctly
because the operator's event-time has not advanced yet.

So, event-time is not deterministic when it comes to which records are
dropped.
The watermark documentation might be helpful as well [1].

Cheers,
Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/event_time.html#watermarks-in-parallel-streams
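As an illustration of that rule, here is a simplified sketch (not Flink's actual classes, just the idea) of an operator's event-time clock derived from per-channel watermarks:

```java
// Simplified model of an operator's event-time clock: the clock is the
// minimum of the last watermark seen on each input channel, so a single
// slow channel holds the whole operator's event-time back.
public class EventTimeClock {

    private final long[] lastWatermarkPerChannel;

    public EventTimeClock(int numChannels) {
        lastWatermarkPerChannel = new long[numChannels];
        java.util.Arrays.fill(lastWatermarkPerChannel, Long.MIN_VALUE);
    }

    // Watermarks per channel are monotonically increasing.
    public void onWatermark(int channel, long watermark) {
        lastWatermarkPerChannel[channel] =
            Math.max(lastWatermarkPerChannel[channel], watermark);
    }

    // The operator's current event-time: min over all channels.
    public long currentEventTime() {
        long min = Long.MAX_VALUE;
        for (long wm : lastWatermarkPerChannel) {
            min = Math.min(min, wm);
        }
        return min;
    }
}
```

With two input channels and watermarks of 5000 and 1000, the event-time clock reads 1000: the slow channel holds the clock back, which is why records from the fast channel that would otherwise look "late" are still accepted.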

2017-04-27 22:09 GMT+02:00 Radu Tudoran :

> Re-hi,
>
> I debuged a bit the test for the Event rowtime
>
> I tested the testBoundNonPartitionedEventTimeWindowWithRange from
> SQLITCase class
>
> I would expect that once a watermark arrives: 1) the onTimer callback is
> called to process the events that arrived so far, and 2) future events with
> earlier timestamps are dropped. However, it seems that almost the entire
> input can arrive in the processElement function before onTimer is triggered.
>
> Moreover, if you modify the input to add an out-of-order event (see the
> dataset below, where I added, after watermark 14000, an event with
> timestamp 1000), I would expect this event to be dropped. However, in
> different runs it can happen that it is not dropped: it can happen that
> onTimer was never triggered before this event arrives, so the event is
> registered. Is this correct? Am I missing something?
>
>
>@Test
>   def testBoundNonPartitionedEventTimeWindowWithRangeUnOrder(): Unit = {
> val data = Seq(
>   Left((1500L, (1L, 15, "Hello"))),
>   Left((1600L, (1L, 16, "Hello"))),
>   Left((1000L, (1L, 1, "Hello"))),
>   Left((2000L, (2L, 2, "Hello"))),
>   Right(1000L),
>   Left((2000L, (2L, 2, "Hello"))),
>   Left((2000L, (2L, 3, "Hello"))),
>   Left((3000L, (3L, 3, "Hello"))),
>   Right(2000L),
>   Left((4000L, (4L, 4, "Hello"))),
>   Right(3000L),
>   Left((5000L, (5L, 5, "Hello"))),
>   Right(5000L),
>   Left((6000L, (6L, 6, "Hello"))),
>   Left((6500L, (6L, 65, "Hello"))),
>   Right(7000L),
>   Left((9000L, (6L, 9, "Hello"))),
>   Left((9500L, (6L, 18, "Hello"))),
>   Left((9000L, (6L, 9, "Hello"))),
>   Right(1L),
>   Left((1L, (7L, 7, "Hello World"))),
>   Left((11000L, (7L, 17, "Hello World"))),
>   Left((11000L, (7L, 77, "Hello World"))),
>   Right(12000L),
>   Left((14000L, (7L, 18, "Hello World"))),
>   Right(14000L),
>   Left((15000L, (8L, 8, "Hello World"))),
>   Left((1000L, (8L, 8, "Too late - Hello World"))),   // event is out
> of order and should be dropped
>   Right(17000L),
>   Left((2L, (20L, 20, "Hello World"))),
>   Right(19000L))
>
>
>
>
> -Original Message-
> From: Fabian Hueske [mailto:fhue...@gmail.com]
> Sent: Thursday, April 27, 2017 3:17 PM
> To: dev@flink.apache.org
> Subject: Re: question about rowtime processfunction - are watermarks
> needed?
>
> Hi Radu,
>
> event-time processing requires watermarks. Operators use watermarks to
> compute the current event-time.
> The ProcessFunctions for over range windows use the TimerService to group
> elements by time.
> In case of event-time, the timers are triggered by the event-time of the
> operator which is derived from the received watermarks.
> In case of processing-time, the timers are triggered based on the
> wallclock time of the operator.
>
> So by using event-time timers, we implicitly rely on the watermarks because
> the timers are triggered based on the received watermarks.
>
> Best, Fabian
>
>
> 2017-04-27 10:51 GMT+02:00 Radu Tudoran :
>
> > Hi,
> >
> > I am looking at the implementation of RowTimeBoundedRangeOver (in the
> > context of Stream SQL). I see that the logic is that progress happens
> > based on the timestamps of the row events - i.e., when an event arrives,
> > we register it to be processed based on its timestamp
> > (ctx.timerService.registerEventTimeTimer(triggeringTs)).
> >
> > In the onTimer we remove (retract) data that has expired. However, we
> > do not consider watermarks nor some allowed latency for the events or
> > anything like this, which makes me ask:
> > Don't we need to work with watermarks when we deal with event time? And
> > keep the events within the allowed delay/the next watermark? Am I
> > missing something? Or maybe we do not consider allowedLateness for this
> > version at this point?
> >
> > Thanks
> >
> > Best regards,
> >
> >
>


[jira] [Created] (FLINK-6407) Upgrade AVRO dependency version to 1.8.x

2017-04-28 Thread Miguel (JIRA)
Miguel created FLINK-6407:
-

 Summary: Upgrade AVRO dependency version to 1.8.x
 Key: FLINK-6407
 URL: https://issues.apache.org/jira/browse/FLINK-6407
 Project: Flink
  Issue Type: Wish
  Components: Batch Connectors and Input/Output Formats
Affects Versions: 1.2.1
Reporter: Miguel
Priority: Minor


Avro 1.7.6 and 1.7.7 do not support Maps that use Java enum keys (they are
limited to String-typed keys). This was fixed in Avro 1.8.0.
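For context, a small illustration (plain Java, hypothetical types) of the kind of structure affected: a map keyed by an enum rather than a String, which Avro's map schema could not represent before 1.8.0.

```java
import java.util.EnumMap;
import java.util.Map;

public class EnumKeyedMapExample {

    // Hypothetical enum; before Avro 1.8.0, a field of type
    // Map<Color, Integer> could not be mapped to an Avro map schema,
    // because Avro map keys were restricted to Strings.
    public enum Color { RED, GREEN, BLUE }

    public static Map<Color, Integer> counts() {
        Map<Color, Integer> m = new EnumMap<>(Color.class);
        m.put(Color.RED, 3);
        m.put(Color.BLUE, 1);
        return m;
    }
}
```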





[jira] [Created] (FLINK-6406) Cleanup useless import

2017-04-28 Thread sunjincheng (JIRA)
sunjincheng created FLINK-6406:
--

 Summary: Cleanup useless import 
 Key: FLINK-6406
 URL: https://issues.apache.org/jira/browse/FLINK-6406
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Affects Versions: 1.3.0
Reporter: sunjincheng
Assignee: sunjincheng


When browsing the code, it was found that there are some unused imports in
the following files, which need cleanup:

* packages.scala
* ExternalCatalogTable
* arithmetic.scala
* array.scala
* ColumnStats


