Re: Apache Flink Introduction Guide

2016-11-11 Thread Anchit Jatana
Hi Manish,

I appreciate the way you presented Apache Flink. While it works as an intro
for beginners, I would really encourage you to highlight some of the
groundbreaking features that Flink offers for stream processing, such as:

-> Explicit handling of time through its notion of event time, plus
expressive and flexible windowing capabilities

-> State management and fault tolerance through its lightweight
checkpointing/snapshotting mechanism

-> Performance, in terms of low latency, throughput and back-pressure
handling, plus comparative benchmarks against other engines on the market
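
(For readers new to these concepts, here is a tiny illustrative sketch of
event-time windowing. It assumes a DataStream<Event> named "stream" that
already has an event-time timestamp/watermark assigner attached; the Event
type and field names are made up for the example.)

    // Count events per key in 1-minute event-time tumbling windows.
    stream
        .keyBy("key")
        .window(TumblingEventTimeWindows.of(Time.minutes(1)))
        .apply(new WindowFunction<Event, Tuple2<String, Long>, Tuple, TimeWindow>() {
            @Override
            public void apply(Tuple key, TimeWindow window,
                              Iterable<Event> events, Collector<Tuple2<String, Long>> out) {
                long count = 0;
                for (Event ignored : events) {
                    count++;
                }
                // Emitted once the watermark passes the end of the window.
                out.collect(new Tuple2<>(key.toString(), count));
            }
        });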

Anyone who reads the blog should not just take the text as a 'Hello World'
for yet another new technology, but should come away understanding how Flink
is better than, and beats, contemporary processing engines. That way the
reader sees why it is worth delving deeper into the topic and expanding their
knowledge of Flink, which is set to become one of the most widely adopted
engines in the industry.

Regards,
Anchit




--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Apache-Flink-Introduction-Guide-tp10041p10050.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Re: Listening to timed-out patterns in Flink CEP

2016-11-11 Thread David Koch
Hi Till,

Excellent - I'll check out the current snapshot version! Thank you for
taking the time to look into this.

Regards,

David

On Tue, Nov 8, 2016 at 3:25 PM, Till Rohrmann  wrote:

> Hi David,
>
> sorry for my late reply. I just found time to look into the problem. You
> were right with your observation that the CEP operator did not behave as
> I've described it. The problem was that the time of the underlying NFA was
> not advanced if there were no events buffered in the CEP operator when a
> new watermark arrived. This was not intended and I opened a PR [1] to fix
> this problem. I've tested the fix with your example program and it seems to
> solve the problem that you don't see timeouts after the timeout interval
> has passed. Thanks for reporting this problem and please excuse my long
> response time.
>
> Btw, I'll merge the PR this evening. So it should be included in the
> current snapshot version by the end of tomorrow.
>
> [1] https://github.com/apache/flink/pull/2771
>
> Cheers,
> Till
>
> On Fri, Oct 14, 2016 at 11:40 AM, Till Rohrmann 
> wrote:
>
>> Hi guys,
>>
>> I'll try to come up with an example illustrating the behaviour over the
>> weekend.
>>
>> Cheers,
>> Till
>>
>> On Fri, Oct 14, 2016 at 11:16 AM, David Koch 
>> wrote:
>>
>>> Hello,
>>>
>>> Thanks for the code Sameer. Unfortunately, it didn't solve the issue.
>>> Compared to what I did the principle is the same - make sure that the
>>> watermark advances even without events present to trigger timeouts in CEP
>>> patterns.
>>>
>>> If Till or anyone else could provide a minimal example illustrating the
>>> supposed behaviour of:
>>>
>>> [CEP] timeout will be detected when the first watermark exceeding the
>>> timeout value is received
>>>
>>>
>>> I'd very much appreciate it.
>>>
>>> Regards,
>>>
>>> David
>>>
>>>
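(For orientation, a minimal sketch of the kind of example being asked for,
i.e. a pattern with a within() timeout plus a PatternTimeoutFunction, might
look roughly like this. It assumes the flink-cep dependency, the 1.1-style
API and the usual imports; the Event type is illustrative.)

    Pattern<Event, ?> pattern = Pattern.<Event>begin("start")
            .followedBy("end")
            .within(Time.seconds(5));

    PatternStream<Event> patternStream =
            CEP.pattern(withTimestampsAndWatermarks, pattern);

    patternStream.select(
            // Called for partial matches whose 5s window has expired, i.e. once a
            // watermark larger than the timeout timestamp has been received.
            new PatternTimeoutFunction<Event, String>() {
                @Override
                public String timeout(Map<String, Event> pattern, long timeoutTimestamp) {
                    return "TIMEOUT at " + timeoutTimestamp;
                }
            },
            new PatternSelectFunction<Event, String>() {
                @Override
                public String select(Map<String, Event> pattern) {
                    return "MATCH";
                }
            }).print();
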
>>> On Wed, Oct 12, 2016 at 1:54 AM, Sameer W  wrote:
>>>
 Try this. Your watermarks need to move forward. Also, don't use the system
 timestamp; use the timestamp of the last element seen as the reference,
 since the elements are most likely lagging behind the system timestamp.

 DataStream<Event> withTimestampsAndWatermarks = tuples
     .assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<Event>() {

         long waterMarkTmst;
         long lastEmittedWM = 0;

         @Override
         public long extractTimestamp(Event element, long previousElementTimestamp) {
             if (element.tmst > lastEmittedWM) {
                 // Assumes increasing timestamps. Subtract 1 because more
                 // elements with the same timestamp might still arrive.
                 waterMarkTmst = element.tmst - 1;
             }
             return element.tmst;
         }

         @Override
         public Watermark getCurrentWatermark() {
             if (lastEmittedWM == waterMarkTmst) {
                 // No new event seen: move the watermark forward by the auto
                 // watermark interval (watermarks only move forward in time).
                 waterMarkTmst = waterMarkTmst + 1000L;
             }
             lastEmittedWM = waterMarkTmst;

             System.out.println(String.format("Watermark at %s", new Date(waterMarkTmst)));
             // Until an event is seen, the watermark starts at 0 and advances
             // by 1000 ms per interval.
             return new Watermark(waterMarkTmst);
         }
     }).keyBy("key");

 On Tue, Oct 11, 2016 at 7:29 PM, David Koch 
 wrote:

> Hello,
>
> I tried setting the watermark to System.currentTimeMillis() - 5000L,
> event timestamps are System.currentTimeMillis(). I do not observe the
> expected behaviour of the PatternTimeoutFunction firing once the watermark
> moves past the timeout "anchored" by a pattern match.
>
> Here is the complete test class source ,
> in case someone is interested. The timestamp/watermark assigner looks like
> this:
>
> DataStream<Event> withTimestampsAndWatermarks = tuples
>     .assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<Event>() {
>
>         long waterMarkTmst;
>
>         @Override
>         public long extractTimestamp(Event element, long previousElementTimestamp) {
>             return element.tmst;
>         }
>
>         @Override
>         public Watermark getCurrentWatermark() {
>             waterMarkTmst = System.currentTimeMillis() - 5000L;
>             System.out.println(String.format("Watermark at %s", new Date(waterMarkTmst)));
>             return new Watermark(waterMarkTmst);
>         }
>     }).keyBy("key");
>
> withTimestampsAndWatermarks.getExecutionConfig().setAutoWatermarkInterval(1000L);
>
> // Apply pattern filtering on stream.
> PatternStream patternStream = 

Re: TaskManager log thread

2016-11-11 Thread CPC
Hi Dominik,
It logs to the TaskManager log. But if you are using the local runtime via
the IDE or a local cluster, it does not log them. If you start the JobManager
and TaskManager separately, then you can see the logs.
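
(For reference, a sketch of the relevant flink-conf.yaml entries. The exact
name of the interval key is given from memory, so please double-check it
against the configuration documentation for your Flink version.)

    # Start the memory-usage logging thread in each TaskManager; its output goes
    # to the normal taskmanager-*.log files of a separately started TaskManager.
    taskmanager.debug.memory.startLogThread: true
    # How often to log, in milliseconds (default: 5000).
    taskmanager.debug.memory.logIntervalMs: 5000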

On Nov 11, 2016 23:02, "Dominik Safaric"  wrote:

> If taskmanager.debug.memory.startLogThread is set to true, where does the
> task manager output the logs to?
>
> Unfortunately I couldn’t find this information in the documentation, hence
> the question.
>
> Thanks in advance,
> Dominik


Flink - Nifi Connectors - Class not found

2016-11-11 Thread PACE, JAMES
I am running Apache Flink 1.1.3 (Hadoop version 1.2.1) with the NiFi connector.
When I run a program with a single NiFi source, I receive the following stack
trace in the logs:



2016-11-11 19:28:25,661 WARN  org.apache.flink.client.CliFrontend
    - Unable to locate custom CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli.
      Flink is not compiled with support for this class.
java.lang.ClassNotFoundException: org.apache.flink.yarn.cli.FlinkYarnSessionCli
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:195)
    at org.apache.flink.client.CliFrontend.loadCustomCommandLine(CliFrontend.java:1136)
    at org.apache.flink.client.CliFrontend.<init>(CliFrontend.java:128)
2016-11-11 19:28:25,855 INFO  org.apache.flink.client.CliFrontend
2016-11-11 19:28:25,855 INFO  org.apache.flink.client.CliFrontend
    - Starting Command Line Client (Version: 1.1.3, Rev:8e8d454, Date:10.10.2016 @ 13:26:32 UTC)
2016-11-11 19:28:25,855 INFO  org.apache.flink.client.CliFrontend
    - Current user: x
2016-11-11 19:28:25,856 INFO  org.apache.flink.client.CliFrontend
    - JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.7/24.80-b11
2016-11-11 19:28:25,856 INFO  org.apache.flink.client.CliFrontend
    - Maximum heap size: 3545 MiBytes
2016-11-11 19:28:25,866 INFO  org.apache.flink.client.CliFrontend
    - Hadoop version: 1.2.1



It seems like the NiFi connector requires the YARN-enabled version of Flink?
Is there a dependency I can add to get over this hurdle?



Thanks



Jim



Re: Programmatically abort checkpoint

2016-11-11 Thread Lorenzo Affetti
Yes, I mean aborting the checkpoint alignment directly from an operator.
This is because I am introducing an operator that performs some asynchronous 
stuff that also involves side effects on its internal state.
I wanted to abort a checkpoint directly from that operator if a barrier comes 
in while I’m performing that asynchronous job.

Is the only way to do this to use your code in master and send a DeclineCheckpoint message?

Thank you,

Lorenzo Affetti


On 11 Nov 2016, at 19:33, Stephan Ewen <se...@apache.org> wrote:

What do you mean exactly with aborting a checkpoint? Continuing the processing 
despite failed checkpoints?

You can have a look at these recent changes, they cleanly abort checkpoint 
alignment in certain conditions:

https://issues.apache.org/jira/browse/FLINK-4976
https://github.com/apache/flink/pull/2754

Best,
Stephan


On Fri, Nov 11, 2016 at 5:15 PM, Lorenzo Affetti <lorenzo.affe...@polimi.it> wrote:
Hi everybody, I am using Flink v1.1.2

is it possible to programmatically abort a snapshot from the method

 public StreamTaskState snapshotOperatorState(long checkpointId, long 
timestamp)

In an operator?

Thank you!

Lorenzo






TaskManager log thread

2016-11-11 Thread Dominik Safaric
If taskmanager.debug.memory.startLogThread is set to true, where does the task 
manager output the logs to? 

Unfortunately I couldn’t find this information in the documentation, hence the 
question.

Thanks in advance,
Dominik

Order by which windows are processed on event time

2016-11-11 Thread Saiph Kappa
Hi,

I have a streaming application based on event time. When I issue a
watermark that will close more than one window (and trigger their
processing), I can see that the windows are computed sequentially (at least
on a local machine) and that the computation order is not defined. Can I
change this behavior to make it compute from the oldest to the newest
window? Where in the Flink code can I find the part that handles the
scheduling of window processing?

Thank you.


Re: Programmatically abort checkpoint

2016-11-11 Thread Stephan Ewen
What do you mean exactly with aborting a checkpoint? Continuing the
processing despite failed checkpoints?

You can have a look at these recent changes, they cleanly abort checkpoint
alignment in certain conditions:

https://issues.apache.org/jira/browse/FLINK-4976
https://github.com/apache/flink/pull/2754

Best,
Stephan


On Fri, Nov 11, 2016 at 5:15 PM, Lorenzo Affetti 
wrote:

> Hi everybody, I am using Flink v1.1.2
>
> is it possible to programmatically abort a snapshot from the method
>
>  public StreamTaskState snapshotOperatorState(long checkpointId,
> long timestamp)
>
> In an operator?
>
> Thank you!
>
> Lorenzo
>
>
>


Re: Window PURGE Behaviour 1.0.2 vs 1.1.3

2016-11-11 Thread Aljoscha Krettek
Hi,
I think the jar files in the lib folder are checked first so shipping the
WindowOperator with the job should not work.

Cheers,
Aljoscha

On Thu, 10 Nov 2016 at 17:48 Konstantin Knauf 
wrote:

> Hi Aljoscha,
>
> alright, for the time being I have modified the WindowOperator and built
> flink-streaming-java for our team. When you only change the
> WindowOperator class, is it safe to just bundle it with the job? I.e.
> does this class have precedence over the class in the binary bundle of
> flink?
>
> Cheers,
>
> Konstantin
>
> On 10.11.2016 14:48, Aljoscha Krettek wrote:
> > Hi,
> > there were some discussions on the ML and it seems that the consensus is
> > to aim for a release this year.
> >
> > Let me think a bit more and get back to you on the other issues.
> >
> > Cheers,
> > Aljoscha
> >
> > On Thu, 10 Nov 2016 at 11:47 Konstantin Knauf
> > <konstantin.kn...@tngtech.com> wrote:
> >
> > Hi Aljoscha,
> >
> > unfortunately, I think, FLINK-4994 would not solve our issue. What does
> > "on the very end" mean in case of a GlobalWindow?
> >
> > FLINK-4369 would fix my workaround though. Is there already a timeline
> > for Flink 1.2?
> >
> > Cheers,
> >
> >
> > Konst
> >
> > On 10.11.2016 10:19, Aljoscha Krettek wrote:
> > > Hi Konstantin,
> > > evicted elements not actually being removed is a bug that should be
> > > fixed for Flink 1.2: https://issues.apache.org/jira/browse/FLINK-4369.
> > >
> > > The check about non-existing window state when triggering was introduced
> > > because otherwise a Trigger could return FIRE and then there would be
> > > nothing to fire. I guess if we did indeed fire the trigger even with
> > > non-existing state then some people might wonder why no emission is
> > > triggered when their trigger returns FIRE. I see your point though, that
> > > the omitted firing is problematic for some cases.
> > >
> > > I think having clear() as proposed
> > > in https://issues.apache.org/jira/browse/FLINK-4994 would solve your
> > > case. You were using your own cleanup timer as a workaround because
> > > clear() is currently also called on PURGE. With clear() only being
> > > called at the very end this should work, correct?
> > >
> > > Cheers,
> > > Aljoscha
> > >
> > > On Wed, 9 Nov 2016 at 19:53 Konstantin Knauf
> > > <konstantin.kn...@tngtech.com> wrote:
> > >
> > > Hi Aljoscha,
> > >
> > > as it turns out, the "workaround" I was thinking of was functionally
> > > working, but had a memory leak, so to say. I was under the impression
> > > that evicted elements would be removed from the window state...
> > >
> > > Anyway, I think that this (triggers not being evaluated when the window
> > > state is null) turns out to be a blocker for us.
> > >
> > > Why is this check done? Since a user can do basically whatever she likes
> > > in onProcessingTimeTimer(), the comment
> > >
> > > // if we have no state, there is nothing to do
> > >
> > > is, well, just not true in some cases (e.g. state updates in our case).
> > >
> > > Cheers,
> > >
> > > Konstantin
> > >
> > > On 09.11.2016 14:17, Aljoscha Krettek wrote:
> > > > Could you go into some detail of why you need to keep the trigger
> > > > state?
> > > >
> > > > Just the basics because you probably cannot (should not) talk about
> > > > your internal stuff.
> > > >
> > > > On Wed, 9 Nov 2016 at 13:16 Konstantin Knauf
> > > > <konstantin.kn...@tngtech.com> wrote:
> > > > Sounds good Aljoscha.
> > > >
> > > > sent from my phone. Plz excuse brevity and tpyos.
> > > > ---
> > > > Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
> > > >
> > > > TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> > > > G

Apache Flink Introduction Guide

2016-11-11 Thread Manish Shukla
Hi Guys,

I have contributed a blog post introducing Apache Flink, its architecture,
and its execution model. Please provide feedback:
http://data-flair.training/blogs/apache-flink-comprehensive-guide-tutorial-for-beginners/

I hope I have covered all the concepts correctly.


-Malini


Flink work with raw S3 (S3FileSystem or other), not a HDFS backed by S3 (S3AFileSystem, NativeS3FileSystem)?

2016-11-11 Thread Steve Morin
Use case: I am trying to see how to use Flink with S3, where we use our own
client libraries or things like AWS Firehose to put data into S3, then
process it in batch using Flink. These clients are putting data into S3
without HDFS, i.e. we aren't using HDFS on top of S3.

Most of what I can find referenced [1] uses HDFS backed by S3
(S3AFileSystem, NativeS3FileSystem).

I found one reference [2] saying that using the S3 filesystem (S3FileSystem)
doesn't work.

Can anyone with Flink experience help give any insight on this?

References:

   - [1] -
   https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/aws.html
   - [2] -
   http://stackoverflow.com/questions/32959790/run-apache-flink-with-amazon-s3
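
(For what it's worth, a minimal sketch of the kind of batch read being
described, i.e. reading objects directly from a bucket. It assumes the S3
filesystem has been wired up through the Hadoop configuration as in [1];
the scheme and the bucket/path below are illustrative.)

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // Reads all objects under the prefix as lines of text. Whether to use
    // s3://, s3n:// or s3a:// depends on which Hadoop FileSystem is configured.
    DataSet<String> lines = env.readTextFile("s3a://my-bucket/incoming/");
    lines.first(10).print();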


-- 
*Steve Morin | Managing Partner - CTO*

*Nvent*

O 800-407-1156 ext 803 | M 347-453-5579

smo...@nventdata.com  

*Enabling the Data Driven Enterprise*
*(Ask us how we can setup scalable open source realtime billion+ event/data
collection/analytics infrastructure in weeks)*

Service Areas: Management & Strategy Consulting | Data Engineering | Data
Science & Visualization


Programmatically abort checkpoint

2016-11-11 Thread Lorenzo Affetti
Hi everybody, I am using Flink v1.1.2

is it possible to programmatically abort a snapshot from the method

 public StreamTaskState snapshotOperatorState(long checkpointId, long 
timestamp)

In an operator?

Thank you!

Lorenzo




WindowOperator - element's timestamp

2016-11-11 Thread Petr Novotnik

Hello,

I'm struggling to understand the following behaviour of the 
`WindowOperator` and would appreciate some insight from experts:


In particular I'm thinking about the following hypothetical data flow:

input.keyBy(..)
 .window(TumblingEventTimeWindows.of(..))
 .apply(..)
 ...
 .keyBy(..)
 .window(CustomWindowAssignerUsingATriggerBasedOnTheElementsStamp)
 .apply(..)

When the first window operator fires a window based on the timer, the 
emitted elements are assigned a timestamp which equals 
`window.maxTimestamp()`. This stamp is then available in the second 
window operator's trigger through the `onElement` method. So far so good.


However, when using `ContinuousEventTimeTrigger` (simply put when firing 
the window multiple times at different times in its lifecycle) in the 
first window operator, _all_ of the elements of this window - no matter 
whether fired as a partial or the final window result - will arrive with 
the same stamp in the (downstream) operators.


This makes it practically impossible to use `ContinuousEventTimeTrigger`
(or similar) in the second window operator to achieve "early firing" again.


This is surprising. I would expect the elements to be assigned the stamp 
of the timer which fired them (which will be window#maxTimestamp() for 
`TumblingEventTimeWindows`). Is there any particular reason for the 
unconditional assignment to `window.maxTimestamp()`?


Many thanks in advance,
P.


Flink Avro Kafka Reading/Writing

2016-11-11 Thread daviD
Hi All,
Does anyone know if Flink can read and write Avro schema to Kafka?
Thanks
daviD
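
(For what it's worth, one common approach is to plug a custom
DeserializationSchema into the Kafka consumer. The sketch below is only
illustrative: it assumes the flink-connector-kafka-0.9 and Avro dependencies,
an Avro-generated SpecificRecord class called MyRecord, and an existing
StreamExecutionEnvironment "env" and Properties "kafkaProps".)

    import java.io.IOException;
    import org.apache.avro.io.BinaryDecoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.specific.SpecificDatumReader;
    import org.apache.flink.streaming.util.serialization.AbstractDeserializationSchema;

    // Turns Avro-encoded Kafka messages into MyRecord instances.
    public class MyRecordDeserializationSchema extends AbstractDeserializationSchema<MyRecord> {

        private transient SpecificDatumReader<MyRecord> reader;

        @Override
        public MyRecord deserialize(byte[] message) throws IOException {
            if (reader == null) {
                reader = new SpecificDatumReader<>(MyRecord.class);
            }
            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(message, null);
            return reader.read(null, decoder);
        }
    }

    // Usage: a DataStream<MyRecord> read from a Kafka topic.
    DataStream<MyRecord> records = env.addSource(
            new FlinkKafkaConsumer09<>("my-topic", new MyRecordDeserializationSchema(), kafkaProps));

Writing works the same way in reverse, with a SerializationSchema passed to a
FlinkKafkaProducer09.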

Kafka Stream to Database batch inserts

2016-11-11 Thread criss
Hello,

I'm new to Flink and I need some advice regarding the best approach to do
the following:
- read some items from a Kafka topic
- on the Flink stream side, after some simple filtering steps, group these
items into batches by Flink processing time.
- insert the items in a PostgreSql database using a batch insert.

I did this by using a time window of 1 second and adding a custom sink which
collects items in a blocking queue. Additionally, I need to have a separate
thread which triggers the commit to the database after some time, smaller
than window's time.

The solution works, but I am not very pleased with it because it looks very
complicated for a simple item-batching task.

Is there any way to trigger the commit directly when the window is closed? I
didn't find any solution to get notified when the window is completed. I
would like to get rid of this separate thread whose only purpose is to
trigger the batch insert.

Any other possible solution would be highly appreciated. :)
Thanks
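
(As a point of comparison, and only as a rough sketch with illustrative
class/table/field names, no connection pooling and no error handling: doing
the JDBC batch insert inside the window function itself removes the need for
the separate commit thread, because apply() is invoked exactly once per
closed window.)

    // Assumes a DataStream<Item> "filtered" after the filtering steps and a
    // JDBC driver on the classpath; JDBC_URL, USER and PASS are placeholders.
    filtered
        .timeWindowAll(Time.seconds(1))
        .apply(new AllWindowFunction<Item, Integer, TimeWindow>() {
            @Override
            public void apply(TimeWindow window, Iterable<Item> items,
                              Collector<Integer> out) throws Exception {
                int count = 0;
                try (Connection conn = DriverManager.getConnection(JDBC_URL, USER, PASS);
                     PreparedStatement stmt =
                             conn.prepareStatement("INSERT INTO items (id, value) VALUES (?, ?)")) {
                    for (Item item : items) {
                        stmt.setLong(1, item.getId());
                        stmt.setString(2, item.getValue());
                        stmt.addBatch();
                        count++;
                    }
                    stmt.executeBatch();
                }
                // Emit how many rows were written for this window.
                out.collect(count);
            }
        });

A rich variant of the window function, or a custom sink fed by the window
output, could keep the connection open across windows instead of reconnecting
every second.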



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-Stream-to-Database-batch-inserts-tp10036.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Re: Flink - Exception Handling best practices

2016-11-11 Thread Fabian Hueske
Hi Mich,

at the moment there is not much support for handling such data-driven
exceptions (badly formatted data, late data, ...).
However, there is a proposal to improve this: FLIP-13 [1]. So it is work in
progress.

It would be very helpful if you could check if the proposal would address
your use cases.

Until then, I guess that either the split operator or directly writing to
an external storage system would be the way to go.
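
(For illustration, a rough sketch of that split-based approach: wrap each
parse attempt in a result object and then diverge the stream. The Wrapper and
MyEvent types and the tag names are purely illustrative.)

    // Parse, catching per-record exceptions instead of failing the stream.
    DataStream<Wrapper> parsed = raw.map(new MapFunction<String, Wrapper>() {
        @Override
        public Wrapper map(String value) {
            try {
                return Wrapper.ok(MyEvent.fromString(value));
            } catch (Exception e) {
                return Wrapper.error(value, e.getMessage());
            }
        }
    });

    // Diverge good and bad records into separately consumable branches.
    SplitStream<Wrapper> split = parsed.split(new OutputSelector<Wrapper>() {
        @Override
        public Iterable<String> select(Wrapper value) {
            return Collections.singletonList(value.isOk() ? "ok" : "error");
        }
    });

    DataStream<Wrapper> good = split.select("ok");
    DataStream<Wrapper> bad = split.select("error"); // e.g. write these out for visual inspection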

Best,
Fabian

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-13+Side+Outputs+in+Flink

2016-11-11 2:33 GMT+01:00 Michel Betancourt :

>
> Hi, new to Apache Flink.  Trying to find some solid input on how best to
> handle exceptions in streams -- specifically those that should not
> interrupt the stream.
>
> For example, if an error occurs during deserialization from bytes/Strings
> to your data-type, in my use-case I would rather queue the data for visual
> inspection than discard it and filter it out.
>
> One way of doing this is to diverge the stream so that good items take one
> path, while bad items take another.
>
> The closest thing I can find in Flink that can achieve this effect is the
> split operator. The caveat is that split does not also allow for inlined
> transformations.  In other words, the best use of split appears to be to first
> perform your logic that catches the exception.  Then pass the exception
> into the next stage, which uses split to check for an exception and
> provides names for each branch of the decision, for example "OK" vs "error".
>
> Frameworks like RX (Reactive Extensions, eg RxJava) have built in
> functionality that allows the user to decide if they want to handle
> exceptions globally or specifically and resume if needed.  I was hoping to
> find similar operations in Flink but so far no luck.
>
> At any rate, it would be great to get some feedback to see if I am heading
> down the good path here, and whether there are any caveats / gotchas to be
> aware of?
>
> Thanks!
> Mich
>
>