Re: Flink CEP Pattern Matching

2016-03-02 Thread Vitor Vieira
Hi Jerry,

I'm currently evaluating the CEP library too, probably doing something
similar.

Something like comparing the 'offset' of the last event across different
time windows: each window, keyed by event type, compares what is happening
in real time with the same day/hour/minute a week, 15 days, or a month ago, etc.

I plan to share some CEP examples once I finish this engine.

-@notvitor


2016-03-02 19:28 GMT-03:00 Fabian Hueske:

> Hi Jerry,
>
> I haven't used the CEP features yet, so I cannot comment on your
> requirements.
> In case you are looking for the CEP documentation, here it is:
>
> -->
> https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/libs/cep.html
>
> The CEP features will be included in the upcoming 1.0.0 release (which we
> are currently voting on).
> I think you would be one of the first people to use it. Please let us
> know if you find any problems.
>
> Thanks, Fabian
>
>
> 2016-03-02 23:12 GMT+01:00 Jerry Lam :
>
>> Hi Flink users and developers,
>>
>> I'm trying to learn the CEP library. How can I express
>> A -followedBy-> B -next-> C, where B occurs 5 days after A? What I'm trying
>> to get hold of is the event that matches A when I'm processing B.
>>
>> Is this supported?
>>
>> Best Regards,
>>
>> Jerry
>>
>
>


Re: Flink CEP Pattern Matching

2016-03-02 Thread Fabian Hueske
Hi Jerry,

I haven't used the CEP features yet, so I cannot comment on your
requirements.
In case you are looking for the CEP documentation, here it is:

-->
https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/libs/cep.html

The CEP features will be included in the upcoming 1.0.0 release (which we
are currently voting on).
I think you would be one of the first people to use it. Please let us
know if you find any problems.

Thanks, Fabian


2016-03-02 23:12 GMT+01:00 Jerry Lam:

> Hi Flink users and developers,
>
> I'm trying to learn the CEP library. How can I express
> A -followedBy-> B -next-> C, where B occurs 5 days after A? What I'm trying
> to get hold of is the event that matches A when I'm processing B.
>
> Is this supported?
>
> Best Regards,
>
> Jerry
>


Flink CEP Pattern Matching

2016-03-02 Thread Jerry Lam
Hi Flink users and developers,

I'm trying to learn the CEP library. How can I express
A -followedBy-> B -next-> C, where B occurs 5 days after A? What I'm trying
to get hold of is the event that matches A when I'm processing B.

Is this supported?

Best Regards,

Jerry
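
With the Pattern API as documented for the upcoming 1.0 release, the pattern
itself could be sketched roughly as below. Two caveats apply: within() bounds
the whole A..C sequence, not the A-to-B gap alone, and the matched A event only
becomes accessible in the select function once the full pattern fires. The
Event type, its contents, and the input stream are placeholder assumptions
here, not Flink API:

import java.util.Map;

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;

// DataStream<Event> input = ...;  // an assumed stream of user-defined events

Pattern<Event, ?> pattern = Pattern.<Event>begin("A")
        .followedBy("B")
        .next("C")
        .within(Time.days(5));  // upper bound on the whole match, first event to last

PatternStream<Event> patternStream = CEP.pattern(input, pattern);

DataStream<String> matches = patternStream.select(
        new PatternSelectFunction<Event, String>() {
            @Override
            public String select(Map<String, Event> match) {
                // All matched events are available by state name, so the event
                // that matched A can be inspected alongside B's event here.
                Event a = match.get("A");
                Event b = match.get("B");
                return "A=" + a + ", B=" + b;
            }
        });

A stricter "B occurs 5 days after A" condition does not seem expressible as a
pattern constraint in this API version; one would likely have to carry
timestamps in the events and check the A-to-B distance inside select() (or a
later filter), discarding matches that do not qualify.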


Re: Flink packaging makes life hard for SBT fat jar's

2016-03-02 Thread Till Rohrmann
Hi Shikhar,

that is a problem we found out about just today: scala.binary.version is
not properly replaced in the parent POM, so it resolves to 2.10 [1]. Max
has already opened a PR with a fix, so this should be resolved with the
next release candidate.

[1] https://issues.apache.org/jira/browse/FLINK-3565
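
Until that release candidate is out, a possible stopgap on the sbt side (a
sketch only; it assumes the leaked transitive artifact is the Scala 2.10 Kafka
jar, and the Kafka version below is a guess that should be matched to what the
connector actually depends on) would be to exclude the 2.10 jar and re-add the
2.11 build explicitly:

```
  "org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion exclude("org.apache.kafka", "kafka_2.10"),
  "org.apache.kafka" %% "kafka" % "0.8.2.2",
```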

Cheers,
Till

On Wed, Mar 2, 2016 at 6:24 PM, shikhar wrote:

> Hi Till,
>
> I just tried creating an assembly with RC4:
>
> ```
>   "org.apache.flink" %% "flink-clients" % flinkVersion % "provided",
>   "org.apache.flink" %% "flink-streaming-scala" % flinkVersion %
> "provided",
>   "org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion,
> ```
>
> It actually succeeds in creating the assembly now, which is great. However,
> I see that it is pulling in the Scala 2.10 versions of the Kafka JARs.
> Perhaps the correct Scala version is not specified in the published POMs
> for transitive dependencies?
>
>
> https://repository.apache.org/content/repositories/orgapacheflink-1066/org/apache/flink/flink-connector-kafka-0.8_2.11/1.0.0/flink-connector-kafka-0.8_2.11-1.0.0.pom
> refers to ${scala.binary.version} -- not sure how that is resolved


Re: Flink packaging makes life hard for SBT fat jar's

2016-03-02 Thread shikhar
Hi Till,

I just tried creating an assembly with RC4:

```
  "org.apache.flink" %% "flink-clients" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided",
  "org.apache.flink" %% "flink-connector-kafka-0.8" % flinkVersion,
```

It actually succeeds in creating the assembly now, which is great. However,
I see that it is pulling in the Scala 2.10 versions of the Kafka JARs.
Perhaps the correct Scala version is not specified in the published POMs
for transitive dependencies?

https://repository.apache.org/content/repositories/orgapacheflink-1066/org/apache/flink/flink-connector-kafka-0.8_2.11/1.0.0/flink-connector-kafka-0.8_2.11-1.0.0.pom
refers to ${scala.binary.version} -- not sure how that is resolved





Re: Job Manager HA manual setup

2016-03-02 Thread Welly Tambunan
Hi Ufuk,

Thanks, it's working fine now with your suggestion.

Cheers

On Sun, Feb 28, 2016 at 10:10 PM, Ufuk Celebi wrote:

> Hey Welly!
>
> Yes, it is possible to do this manually via the jobmanager.sh and
> taskmanager.sh scripts, like this:
>
> jobmanager.sh start cluster $HOST $WEB-UI-PORT
> taskmanager.sh start
>
> The start-cluster.sh script is just a wrapper around these scripts.
>
> From experience, a common mistake is forgetting to sync the configuration
> files. Make sure to have the same configuration file on each host (both
> job and task managers), because that is where the ZooKeeper quorum etc.
> are read from.
>
> The task managers retrieve the currently leading job manager via
> ZooKeeper. If job manager failover works as expected, but the task
> managers don't connect to the new job manager, I would suspect that
> the task manager configuration is out of sync. Could you check this
> please?
>
> Moreover, it will be helpful to have a look at the job manager and task
> manager logs to further investigate this. Can you share these?
> (Privately works as well of course.)
>
> – Ufuk
>
>
>
>
> On Sat, Feb 27, 2016 at 1:28 AM, Welly Tambunan wrote:
> > typos
> >
> >
> > We have tried this: the job manager can fail over, but the task managers
> > CAN'T be relocated to the new job manager. Are there some settings for
> > this? Or can the task managers also relocate to the new job manager?
> >
> > Cheers
> >
> > On Sat, Feb 27, 2016 at 7:27 AM, Welly Tambunan wrote:
> >>
> >> Hi All,
> >>
> >> We have already tried to set up Job Manager HA based on the
> >> documentation, using the scripts and the provided ZooKeeper. It works.
> >>
> >> However, currently everything is done using the start-cluster script,
> >> which I believe requires passwordless SSH between nodes. We are
> >> restricted in our environment, so this is not possible.
> >>
> >> Is it possible to set up Job Manager HA manually, by starting the job
> >> manager and task manager on each node ourselves? We already have our
> >> ZooKeeper and HDFS clusters.
> >>
> >> We have tried this: the job manager can fail over, but the task manager
> >> can be relocated to the new task manager. Are there some settings for
> >> this? Or can the task manager also relocate to the new job manager?
> >>
> >> Could you share any more details on the mechanism used for Job Manager
> >> HA and its interaction with ZooKeeper?
> >>
> >> Are the task managers also registered in ZooKeeper? How do they find the
> >> right job manager master?
> >>
> >>
> >> Thanks for your help.
> >>
> >> Cheers
> >> --
> >> Welly Tambunan
> >> Triplelands
> >>
> >> http://weltam.wordpress.com
> >> http://www.triplelands.com
> >
> >
> >
> >
> > --
> > Welly Tambunan
> > Triplelands
> >
> > http://weltam.wordpress.com
> > http://www.triplelands.com
>



-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com 
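
For reference, the configuration Ufuk refers to has to be identical on every
job manager and task manager host. A minimal conf/flink-conf.yaml sketch for a
ZooKeeper-backed setup looks roughly like this (key names as used around the
1.0 release; the quorum hosts and HDFS path are placeholders, and the exact
keys should be double-checked against the HA documentation for the release in
use):

recovery.mode: zookeeper
recovery.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
recovery.zookeeper.path.root: /flink
recovery.zookeeper.storageDir: hdfs:///flink/recovery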


Re: readTextFile is not working for StreamExecutionEnvironment

2016-03-02 Thread Maximilian Michels
Hi Balaji,

You forgot to execute your Flink program using

env.execute();
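
(as the last statement of main(), after data1.print(); print() only declares a
sink, and nothing runs until execute() is called)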

Cheers,
Max

On Wed, Mar 2, 2016 at 1:36 PM, Balaji Rajagopalan wrote:
> def main(args: Array[String]): Unit = {
>
>
>
>   val env: StreamExecutionEnvironment =
> StreamExecutionEnvironment.getExecutionEnvironment();
>   env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>   try {
>
> val data1: DataStream[String] = env.readTextFile("somefile.txt");
>
> data1.print()
>
>   }
>
>   catch {
>
> case e: Exception => println(e)
>   }
>
> }
>
> I have a non-empty file, but this does not print anything.
>
>
>


Re: readTextFile is not working for StreamExecutionEnvironment

2016-03-02 Thread Robert Metzger
Hi,
you need to explicitly trigger the execution by calling "env.execute()"

On Wed, Mar 2, 2016 at 1:36 PM, Balaji Rajagopalan <
balaji.rajagopa...@olacabs.com> wrote:

> def main(args: Array[String]): Unit = {
>
>
>
>   val env: StreamExecutionEnvironment = 
> StreamExecutionEnvironment.getExecutionEnvironment();
>   env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
>   try {
>
> val data1: DataStream[String] = env.readTextFile("somefile.txt");
>
> data1.print()
>
>   }
>
>   catch {
>
> case e: Exception => println(e)
>   }
>
> }
>
> I have a non-empty file, but this does not print anything.
>
>
>
>


readTextFile is not working for StreamExecutionEnvironment

2016-03-02 Thread Balaji Rajagopalan
def main(args: Array[String]): Unit = {



  val env: StreamExecutionEnvironment = 
StreamExecutionEnvironment.getExecutionEnvironment();
  env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
  try {

val data1: DataStream[String] = env.readTextFile("somefile.txt");
data1.print()
  }
  catch {
case e: Exception => println(e)
  }
}
I have a non-empty file, but this does not print anything.

 

Re: Java Maps and Type Information

2016-03-02 Thread Aljoscha Krettek
I’m afraid so, yes. I also tried out your example and it works if I add a 
.returns() after the .map() (as you did). Somehow the TypeExtractor seems to be 
acting up.
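
A way to sidestep the call-site returns() entirely, which the exception text
itself suggests, is to have the custom function implement ResultTypeQueryable.
A sketch (the double cast is an assumed workaround for Map.class being a raw
Class; GenericTypeInfo falls back to generic serialization):

import java.util.Map;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.GenericTypeInfo;
import org.apache.flink.api.java.typeutils.ResultTypeQueryable;

public class CustomMapFunction
        implements MapFunction<Map<String, Object>, Map<String, Object>>,
                   ResultTypeQueryable<Map<String, Object>> {

    @Override
    public Map<String, Object> map(Map<String, Object> value) throws Exception {
        return value;  // identity, as in the example further down the thread
    }

    @Override
    @SuppressWarnings("unchecked")
    public TypeInformation<Map<String, Object>> getProducedType() {
        // Map.class is raw, hence the double cast to Class<Map<String, Object>>.
        return new GenericTypeInfo<Map<String, Object>>(
                (Class<Map<String, Object>>) (Class<?>) Map.class);
    }
}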

> On 02 Mar 2016, at 11:12, Simone Robutti wrote:
> 
> >Error:(43, 87) java: no suitable method found for
> >returns(java.lang.Class<java.util.Map>)
> >method



Re: Java Maps and Type Information

2016-03-02 Thread Simone Robutti
Ok, I made it work but there's still an issue. I used
.returns(java.util.Map.class) after the "map" call and it works with this
simple function, but it doesn't compile with my CustomMapFunction that
extends MapFunction<Map<String, Object>, Map<String, Object>>. It gives a
compilation error on the .returns() call.

This is the case only if the variable operator is of type CustomMapFunction
but if I do

> MapFunction<Map<String, Object>, Map<String, Object>> operator = new CustomMapFunction();

it works again.

If I go back to

> CustomMapFunction operator = new CustomMapFunction();

it gives this error:

>Error:(43, 87) java: no suitable method found for
>returns(java.lang.Class<java.util.Map>)
>method

Should I open an issue?

2016-03-01 21:45 GMT+01:00 Simone Robutti:

> I tried to simplify it down to the bones, but I'm actually defining a custom
> MapFunction<java.util.Map<String, Object>, java.util.Map<String, Object>> that
> even with a simple identity function fails at runtime, giving me the
> following error:
>
> >Exception in thread "main"
> org.apache.flink.api.common.functions.InvalidTypesException: The return
> type of function 'main(Main.java:45)' could not be determined
> automatically, due to type erasure. You can give type information hints by
> using the returns(...) method on the result of the transformation call, or
> by letting your function implement the 'ResultTypeQueryable' interface.
>
> where line 45 is the line where I invoke the map function.
>
> Here the piece of code:
>
> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(2);
> Map<String, Object> inputMap = new HashMap<String, Object>();
> inputMap.put("sepal_width",2.0);
> inputMap.put("sepal_length",2.0);
> inputMap.put("petal_width",2.0);
> inputMap.put("petal_length",2.0);
>
> MapFunction<Map<String, Object>, Map<String, Object>> operator = new
> MapFunction<Map<String, Object>, Map<String, Object>>() {
>
> @Override
> public Map<String, Object> map(Map<String, Object>
> stringObjectMap) throws Exception {
> return stringObjectMap;
> }
> };
>
> List<Map<String, Object>> input = new LinkedList<>();
> input.add(inputMap);
> DataSource<Map<String, Object>> dataset =
> env.fromCollection(input);
> List<Map<String, Object>> collectedResult =
> dataset.map(operator).collect();
>
>
>
>
> 2016-03-01 16:42 GMT+01:00 Aljoscha Krettek :
>
>> Hi,
>> what kind of program are you writing? I just wrote a quick example using
>> the DataStream API where I’m using Map<String, Object> as
>> the output type of one of my MapFunctions.
>>
>> Cheers,
>> Aljoscha
>> > On 01 Mar 2016, at 16:33, Simone Robutti wrote:
>> >
>> > Hello,
>> >
>> > to my knowledge it is not possible to use a java.util.Map<String, Object>,
>> > for example in a FlatMapFunction. Is that correct? It gives a
>> > type error at runtime and it doesn't work even with explicit
>> > TypeInformation hints.
>> >
>> > Is there any way to make it work?
>> >
>> > Thanks,
>> >
>> > Simone
>>
>>
>


Re: Windows, watermarks, and late data

2016-03-02 Thread Kostas Kloudas
Hello Mike,

The code that Aljoscha mentioned is here:

https://github.com/kl0u/flink-examples/blob/master/src/main/java/com/dataartisans/flinksolo/customTriggers/EventTimeTriggerWithEarlyAndLateFiring.java


This allows you to specify a trigger like:

EventTimeTriggerWithEarlyAndLateFiring trigger = EventTimeTriggerWithEarlyAndLateFiring.create()
        .withEarlyFiringEvery(Time.minutes(10))
        .withLateFiringEvery(Time.minutes(5))
        .withAllowedLateness(Time.minutes(20))
        .accumulating();

This means that it will fire every 10 minutes (in processing time) until the
end of the window (event time), and then every 5 minutes (processing time) for
late elements up to 20 minutes late. In addition, previous elements are not
discarded.
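
Wired into a pipeline it could look roughly like this (a sketch; the input
stream, key, window size, and window function are placeholder assumptions):

input
    .keyBy("userId")
    .timeWindow(Time.hours(1))
    .trigger(trigger)
    .apply(new MyWindowFunction());  // an assumed user-defined WindowFunction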

Hope this helps,
Kostas

> On Mar 2, 2016, at 11:02 AM, Aljoscha Krettek wrote:
> 
> Hi,
> I did some initial work on extending the EventTimeTrigger a bit to allow more
> complex behavior. Specifically, it allows setting an “allowed lateness”
> after which elements should no longer lead to windows being emitted. Also, it
> allows specifying that an emitted window be kept in memory and, when a late
> element arrives, the whole window be emitted again.
> 
> The code I have is here: 
> https://github.com/aljoscha/flink/blob/window-late/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/windowing/triggers/EventTimeTrigger.java
> 
> Kostas Kloudas worked on extending it, so maybe he could share his version of 
> the trigger as well.
> 
> Cheers,
> Aljoscha
>> On 01 Mar 2016, at 18:35, Michael Radford wrote:
>> 
>> I'm evaluating Flink for a reporting application that will keep
>> various aggregates updated in a database. It will be consuming from
>> Kafka queues that are replicated from remote data centers, so in case
>> there is a long outage in replication, I need to decide what to do
>> about windowing and late data.
>> 
>> If I use Flink's built-in windows and watermarks, any late data will
>> come in 1-element windows, which could overwhelm the database if a
>> large batch of late data comes in and they are each mapped to
>> individual database updates.
>> 
>> As far as I can tell, I have two options:
>> 
>> 1. Ignore late data, by marking it as late in an
>> AssignerWithPunctuatedWatermarks function, and then discarding it in a
>> flatMap operator. In this scenario, I would rely on a batch process to
>> fill in the missing data later, in the lambda architecture style.
>> 
>> 2. Implement my own watermark logic to allow full windows of late
>> data. It seems like I could, for example, emit a "tick" message that
>> is replicated to all partitions every n messages, and then a custom
>> Trigger could decide when to purge each window based on the ticks and
>> a timeout duration. The system would never emit a real Watermark.
>> 
>> My questions are:
>> - Am I mistaken about either of these, or are there any other options
>> I'm not seeing for avoiding 1-element windows?
>> - For option 2, are there any problems with not emitting actual
>> watermarks, as long as the windows are eventually purged by a trigger?
>> 
>> Thanks,
>> Mike
> 



Re: Windows, watermarks, and late data

2016-03-02 Thread Aljoscha Krettek
Hi,
I did some initial work on extending the EventTimeTrigger a bit to allow more
complex behavior. Specifically, it allows setting an “allowed lateness” after
which elements should no longer lead to windows being emitted. Also, it allows
specifying that an emitted window be kept in memory and, when a late element
arrives, the whole window be emitted again.

The code I have is here: 
https://github.com/aljoscha/flink/blob/window-late/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/windowing/triggers/EventTimeTrigger.java

Kostas Kloudas worked on extending it, so maybe he could share his version of 
the trigger as well.

Cheers,
Aljoscha
> On 01 Mar 2016, at 18:35, Michael Radford wrote:
> 
> I'm evaluating Flink for a reporting application that will keep
> various aggregates updated in a database. It will be consuming from
> Kafka queues that are replicated from remote data centers, so in case
> there is a long outage in replication, I need to decide what to do
> about windowing and late data.
> 
> If I use Flink's built-in windows and watermarks, any late data will
> come in 1-element windows, which could overwhelm the database if a
> large batch of late data comes in and they are each mapped to
> individual database updates.
> 
> As far as I can tell, I have two options:
> 
> 1. Ignore late data, by marking it as late in an
> AssignerWithPunctuatedWatermarks function, and then discarding it in a
> flatMap operator. In this scenario, I would rely on a batch process to
> fill in the missing data later, in the lambda architecture style.
> 
> 2. Implement my own watermark logic to allow full windows of late
> data. It seems like I could, for example, emit a "tick" message that
> is replicated to all partitions every n messages, and then a custom
> Trigger could decide when to purge each window based on the ticks and
> a timeout duration. The system would never emit a real Watermark.
> 
> My questions are:
> - Am I mistaken about either of these, or are there any other options
> I'm not seeing for avoiding 1-element windows?
> - For option 2, are there any problems with not emitting actual
> watermarks, as long as the windows are eventually purged by a trigger?
> 
> Thanks,
> Mike