Cannot use managed keyed state in sink

2018-02-23 Thread Kien Truong
Hi,

It seems that I can't use managed keyed state inside sink functions. Is this
unsupported in Flink 1.4, or am I doing something wrong?
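
For reference, a minimal sketch of the pattern being asked about, assuming
Flink 1.4 and a sink consuming a keyed stream (names here are illustrative,
not a confirmed answer):

// Sketch only: managed keyed state needs a key context, so the sink is
// assumed to be applied directly on a keyed stream (see usage line below).
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction

class LastValueSink extends RichSinkFunction[(String, Long)] {
  @transient private var lastSeen: ValueState[java.lang.Long] = _

  override def open(parameters: Configuration): Unit = {
    lastSeen = getRuntimeContext.getState(
      new ValueStateDescriptor("lastSeen", classOf[java.lang.Long]))
  }

  override def invoke(value: (String, Long)): Unit = {
    // drop out-of-date records per key; write the rest to the external system
    if (lastSeen.value() == null || lastSeen.value() < value._2) {
      lastSeen.update(value._2)
      // external write would go here
    }
  }
}

// usage: stream.keyBy(_._1).addSink(new LastValueSink)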

Regards,
Kien

Sent from TypeApp

Implementing CountWindow in Window Join and continuous joining for 2 datastreams

2018-02-23 Thread Tay Zhen Shen
Hi, I'm currently working on Flink for a simple stock market
analysis. Basically, I need to have the sum of 100 elements (count window,
sliding size: 10) and also the sum of 20 elements (count window, sliding
size: 10). I realised that I have to calculate the two sums on two different
streams, and so I did. Now I have 2 streams, one containing the sum of
100 elements and the other containing the sum of 20 elements, and I want to
join both datastreams into one. I'm using join, and I realised that it seems
to support only time windows and tumbling windows. Is there any other
function that I can use to solve my problem?


Example:

records:

date,price1,price2

date,price1,price2


sum to become:

date,sum_of_price1_for_100_elements,sum_of_price2_for_20_elements
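
One possible direction, sketched under assumptions (record type and field
names are invented): compute the two count-window sums separately, then
combine them with connect and a keyed CoFlatMapFunction rather than a window
join, since the two count windows share no common time window:

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

object CountWindowCombine {
  case class Tick(symbol: String, price1: Double, price2: Double)

  // ticks stands in for the real market-data stream
  def combine(ticks: DataStream[Tick]): DataStream[(String, Double, Double)] = {
    // sum of price1 over the last 100 elements per symbol, sliding by 10
    val sum100 = ticks.map(t => (t.symbol, t.price1)).keyBy(_._1)
      .countWindow(100, 10).reduce((a, b) => (a._1, a._2 + b._2))
    // sum of price2 over the last 20 elements per symbol, sliding by 10
    val sum20 = ticks.map(t => (t.symbol, t.price2)).keyBy(_._1)
      .countWindow(20, 10).reduce((a, b) => (a._1, a._2 + b._2))

    // keep the latest value from each side per key and emit combined tuples
    sum100.keyBy(_._1).connect(sum20.keyBy(_._1)).flatMap(
      new RichCoFlatMapFunction[(String, Double), (String, Double), (String, Double, Double)] {
        @transient private var last100: ValueState[java.lang.Double] = _
        @transient private var last20: ValueState[java.lang.Double] = _

        override def open(parameters: Configuration): Unit = {
          last100 = getRuntimeContext.getState(
            new ValueStateDescriptor("last100", classOf[java.lang.Double]))
          last20 = getRuntimeContext.getState(
            new ValueStateDescriptor("last20", classOf[java.lang.Double]))
        }

        override def flatMap1(in: (String, Double), out: Collector[(String, Double, Double)]): Unit = {
          last100.update(in._2)
          if (last20.value() != null) out.collect((in._1, in._2, last20.value()))
        }

        override def flatMap2(in: (String, Double), out: Collector[(String, Double, Double)]): Unit = {
          last20.update(in._2)
          if (last100.value() != null) out.collect((in._1, last100.value(), in._2))
        }
      })
  }
}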


CEP: watermark not generated when using watermark

2018-02-23 Thread hanhonggen
Hi,
I have a CEP use case on Flink 1.3.2, as follows:
1 source from Kafka, configured with 128 partitions
2 data schema: logTime long, impressionId string, orderPlanId long, type
int. If two types (click and impression) with the same impressionId and
orderPlanId are matched within 30 seconds, output the result to Kafka.
3 input data is out of order
4 using event time
5 using BoundedOutOfOrdernessTimestampExtractor to generate watermarks.

When I deployed the app with -ys 2 -yn 4, it worked well. But when -yn was
larger than 4, the watermark was not generated, and eventually the buffer was
destroyed.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
logger.error("default watermark interval {}", env.getConfig().getAutoWatermarkInterval());

final Properties consumerProperties = new Properties();
consumerProperties.setProperty("bootstrap.servers", srcBroker);
consumerProperties.setProperty("zookeeper.connect", srcZk);
consumerProperties.setProperty("group.id", srcGroup);
FlinkKafkaConsumer08<String> kafkaSource = new FlinkKafkaConsumer08<>(
        srcTopic, new SimpleStringSchema(), consumerProperties);
kafkaSource.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(60)) {
    @Override
    public long extractTimestamp(String s) {
        JSONObject json = JSONObject.parseObject(s);
        long logTime = json.getLongValue("logTime") * 1000;
        return logTime;
    }
});
DataStreamSource<String> inputStream = env.addSource(kafkaSource);

DataStream<String> originStream = inputStream.keyBy(
        new KeySelector<String, Tuple2<String, Long>>() {
    @Override
    public Tuple2<String, Long> getKey(String s) throws Exception {
        JSONObject json = JSONObject.parseObject(s);
        String impressionId = json.getString("impressionId");
        Long orderPlanId = json.getLongValue("orderPlanId");
        return new Tuple2<>(impressionId, orderPlanId);
    }
});

Pattern<String, String> pattern = Pattern.<String>begin("start").where(new SimpleCondition<String>() {
    @Override
    public boolean filter(String s) throws Exception {
        JSONObject json = JSONObject.parseObject(s);
        int type = json.getInteger("type");
        return type == 0;
    }
}).followedByAny("end").where(new SimpleCondition<String>() {
    @Override
    public boolean filter(String s) throws Exception {
        JSONObject json = JSONObject.parseObject(s);
        int type = json.getInteger("type");
        return type == 1;
    }
}).within(Time.seconds(30));

PatternStream<String> cepStream = CEP.pattern(originStream, pattern);
DataStream<String> outputStream = cepStream.flatSelect(new PatternFlatSelectFunction<String, String>() {
    @Override
    public void flatSelect(Map<String, List<String>> map, Collector<String> collector) throws Exception {
        List<String> impInfos = map.get("start");
        List<String> clkInfos = map.get("end");
        logger.error("flatSelect start size {}, end size {}", map.size(), clkInfos.size());

        for (String clkInfo : clkInfos) {
            for (String impInfo : impInfos) {
                JSONObject clkJson = JSONObject.parseObject(clkInfo);
                JSONObject impJson = JSONObject.parseObject(impInfo);
                String clkImpressionId = clkJson.getString("impressionId");
                Long clkOrderPlanId = clkJson.getLong("orderPlanId");
                String impImpressionId = impJson.getString("impressionId");
                Long impOrderPlanId = impJson.getLong("orderPlanId");
                logger.error("start size {}, end size {}, impression {}, orderPlan {}, impTime {}, clkTime {}",
                        impInfos.size(), clkInfos.size(), clkImpressionId, clkOrderPlanId,
                        impJson.getLong("logTime"), clkJson.getLong("logTime"));
                if (StringUtils.equals(clkImpressionId, impImpressionId)
                        && clkOrderPlanId.equals(impOrderPlanId)) {
                    StringBuilder builder = new StringBuilder();
                    builder.append("impressionId:").append(clkImpressionId).append(",")
                            .append("orderPlanId:").append(clkOrderPlanId).append(",")
                            .append("clkTime:").append(clkJson.getLong("logTime")).append(",")
                            .append("impTime:").append(impJson.getLong("logTime"));
                    collector.collect(builder.toString());
                }
            }
        }
    }
});

Re: Is Flink easy to deploy?

2018-02-23 Thread Niclas Hedhman
I think you're simply missing a bunch of the Flink artifacts.

Flink is broken into dozens of pieces, and you need to select from a large
set of artifacts what you need to depend on.

Typically, there is one Flink artifact per "extension".
I ended up with
   flink-core
   flink-connector-cassandra_2.11
   flink-connector-kafka_2.11
   flink-queryable-state-runtime_2.11
   flink-streaming-java_2.11
   flink-streaming-scala_2.11

With transitive dependencies enabled, meaning whatever Flink depends on is
also included.
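
For illustration, two of those artifacts as pom.xml entries (the 1.4.0
version is an assumption; pick whatever matches your cluster):

<!-- sketch only: versions and the Scala 2.11 suffix are assumptions -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-scala_2.11</artifactId>
  <version>1.4.0</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-cassandra_2.11</artifactId>
  <version>1.4.0</version>
</dependency>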

Cheers

On Sat, Feb 24, 2018 at 3:22 AM, Esa Heikkinen wrote:

>
> Yes i have looked. For example, if i want to compile and run
> StreamTableExample.scala from:
>
> https://github.com/apache/flink/blob/master/flink-examples/flink-examples-table/src/main/scala/org/apache/flink/table/examples/scala/StreamTableExample.scala
>
> I have taken all the examples (and also the latest Flink at the same time)
> to my Linux machine from git.
>
> Which directory should I be in to compile and run
> StreamTableExample on the command line? flink-examples-table?
>
> What is the command for compiling? mvn clean install -Pbuild-jar?
>
> What is the command for running StreamTableExample? <flink dir>/bin/flink
> run -c org.apache.flink.table.examples.scala.StreamTableExample
> target/flink-examples-table_2.1-1.5-SNAPSHOT.jar ?
>
> This does not work because of the error: java.lang.NoClassDefFoundError:...
>
> BR Esa
>
> Fabian Hueske wrote on 23.2.2018 at 15:07:
>
> Have you had a look at the examples? [1]
> They can be run out of the IDE.
>
> Fabian
>
> [1] https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples
>
> 2018-02-23 13:30 GMT+01:00 Esa Heikkinen:
>
>> I have had a lot of difficulties deploying Flink. That is maybe because I am
>> new to Flink and its (Java and Maven) development environment, but I would
>> like to hear the opinions of others. I would like to use Scala.
>>
>>
>>
>> There are many examples, but often they are missing “imports” and
>> settings in pom.xml. It seems to be a very hard job to find the correct ones.
>> Maybe use of an IDE (IntelliJ IDEA) is almost mandatory, and it helps to find
>> “imports”, but it does not find all of them.
>>
>>
>>
>> Generally, do you have to do and study a lot of basic work before you get
>> to the actual thing?
>>
>>
>>
>> If there is a ready example (with source code) that is close enough to
>> what you want, it is much easier to deploy. But if not, it can be a
>> surprisingly difficult and time-consuming task. Because the
>> documentation seems to be partially incomplete, it is often necessary to
>> “google” and query the mailing list.
>>
>>
>>
>> Or have I misunderstood something, or can I not use Flink correctly (yet)?
>>
>>
>>
>> The features of Flink are so good that I want to learn to use it.
>>
>>
>>
>> Best Regards
>>
>> Esa
>>
>
>
>


-- 
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java


Re: Machine Learning: Flink and MOA

2018-02-23 Thread Theodore Vasiloudis
Hello Christophe,

That's very interesting. I've been working with MOA/SAMOA recently and was
considering whether we could create some easy integration with Flink.

I have a Master's student this year who could do some work on this;
hopefully we can create something interesting there.

Regards,
Theodore

On Wed, Feb 21, 2018 at 7:38 PM, Christophe Salperwyck <christophe.salperw...@gmail.com> wrote:

> Hi guys,
>
> I know there is FlinkML for doing some machine learning with Flink, but it
> works on DataSet and not on DataStream. There is also SAMOA, which can run
> on Flink, but I find it a bit too complicated.
>
> I wanted to see if it would be easy to plug MOA directly into Flink, and I
> tried to present it at the DataKRK meetup, but I didn't have time at the
> end of the presentation... Nevertheless, I spent a bit of time plugging
> Flink and MOA together, and I thought it might be worth sharing in case it
> is interesting for someone. I also take this opportunity to get some
> feedback on it from people in the Flink community, if they have a bit of
> time to review it.
>
> Here is the code:
> https://github.com/csalperwyck/moa-flink-ozabag-example
> https://github.com/csalperwyck/moa-flink-traintest-example
>
> Many Flink methods were very convenient for plugging these two tools together :-)
>
> Keep up the good work!
>
> Cheers,
> Christophe
> PS: if some people are in bigdatatechwarsaw and interested, we can discuss
> tomorrow :-)
>


Which test cluster to use for checkpointing tests?

2018-02-23 Thread Ken Krugler
Hi all,

For testing checkpointing, is it possible to use LocalFlinkMiniCluster?

Asking because I’m not seeing checkpoint calls being made to my custom function 
(implements ListCheckpointed) when I’m running with LocalFlinkMiniCluster.
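
For reference, a minimal sketch of a function implementing that interface,
assuming Flink 1.4's ListCheckpointed (checkpointing must also be enabled on
the environment, e.g. env.enableCheckpointing(100L), before snapshotState is
ever called):

import java.util.Collections
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.streaming.api.checkpoint.ListCheckpointed
import scala.collection.JavaConverters._

// Sketch: counts records and snapshots/restores the count on checkpoints.
class CountingMapper extends RichMapFunction[String, String]
    with ListCheckpointed[java.lang.Long] {

  private var count: Long = 0L

  override def map(value: String): String = { count += 1; value }

  override def snapshotState(checkpointId: Long, timestamp: Long): java.util.List[java.lang.Long] =
    Collections.singletonList(java.lang.Long.valueOf(count))

  override def restoreState(state: java.util.List[java.lang.Long]): Unit =
    state.asScala.foreach(c => count += c)
}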

Though I do see entries like this logged:

18/02/23 12:40:50 INFO jobmanager.JobManager:246 - Using application-defined 
state backend for checkpoint/savepoint metadata: MemoryStateBackend (data in 
heap memory / checkpoints to JobManager).
18/02/23 12:40:50 INFO checkpoint.CheckpointCoordinator:525 - Checkpoint 
triggering task Source: Seed urls source (1/2) is not being executed at the 
moment. Aborting checkpoint.

But when I browse the Flink source, tests for checkpointing seem to use
TestCluster, e.g. in ResumeCheckpointManuallyITCase.

Thanks,

— Ken


http://about.me/kkrugler
+1 530-210-6378



Re: Is Flink easy to deploy?

2018-02-23 Thread Esa Heikkinen


Yes, I have looked. For example, if I want to compile and run
StreamTableExample.scala from:


https://github.com/apache/flink/blob/master/flink-examples/flink-examples-table/src/main/scala/org/apache/flink/table/examples/scala/StreamTableExample.scala

I have taken all the examples (and also the latest Flink at the same time)
to my Linux machine from git.


Which directory should I be in to compile and run
StreamTableExample on the command line? flink-examples-table?


What is the command for compiling? mvn clean install -Pbuild-jar?

What is the command for running StreamTableExample? <flink dir>/bin/flink run -c
org.apache.flink.table.examples.scala.StreamTableExample
target/flink-examples-table_2.1-1.5-SNAPSHOT.jar ?


This does not work because of the error: java.lang.NoClassDefFoundError:...

BR Esa


Fabian Hueske wrote on 23.2.2018 at 15:07:

Have you had a look at the examples? [1]
They can be run out of the IDE.

Fabian

[1] 
https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples


2018-02-23 13:30 GMT+01:00 Esa Heikkinen:


I have had a lot of difficulties deploying Flink. That is maybe because
I am new to Flink and its (Java and Maven) development
environment, but I would like to hear the opinions of others. I would
like to use Scala.

There are many examples, but often they are missing “imports” and
settings in pom.xml. It seems to be a very hard job to find the
correct ones. Maybe use of an IDE (IntelliJ IDEA) is almost mandatory,
and it helps to find “imports”, but it does not find all of them.

Generally, do you have to do and study a lot of basic work before you
get to the actual thing?

If there is a ready example (with source code) that is close enough
to what you want, it is much easier to deploy. But if not, it can be
a surprisingly difficult and time-consuming task. Because the
documentation seems to be partially incomplete, it is often necessary
to “google” and query the mailing list.

Or have I misunderstood something, or can I not use Flink correctly
(yet)?

The features of Flink are so good that I want to learn to use it.

Best Regards

Esa






Re: Timestamp from Kafka record and watermark generation

2018-02-23 Thread Federico D'Ambrosio
Thank you very much Aljoscha!

2018-02-23 14:45 GMT+01:00 Aljoscha Krettek:

> Hi,
>
> This is a known issue: https://issues.apache.org/jira/browse/FLINK-8500.
> And yes, the workaround is to write an assigner from scratch, but you can
> start by copying the code of AscendingTimestampExtractor.
>
> Sorry for the inconvenience.
>
> --
> Aljoscha
>
> On 22. Feb 2018, at 12:05, Federico D'Ambrosio wrote:
>
> Hello everyone,
>
> I'm consuming from a Kafka topic, to which I'm writing with a
> FlinkKafkaProducer with the relevant timestamp flag set to true.
>
> From what I gather from the documentation [1], Flink is aware of the Kafka
> record's timestamp and only the watermark should be set with an appropriate
> TimestampExtractor; still, I'm failing to understand how to implement it in
> the right way.
>
> I thought that it would be possible to use the already existing
> AscendingTimestampExtractor, overriding the extractTimestamp method, but
> it's marked final.
>
> new FlinkKafkaConsumer010[Event](ingestion_topic, new
> JSONDeserializationSchema(), consumerConfig)
>   .setStartFromLatest()
>   .assignTimestampsAndWatermarks(new AscendingTimestampExtractor[Event]() {
>     def extractAscendingTimestamp(element: Event): Long = ???
>   })
>
> Do I need to implement my own TimestampExtractor (with the appropriate
> getCurrentWatermark and extractTimestamp methods)?
>
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#using-kafka-timestamps-and-flink-event-time-in-kafka-010
>
> Thank you,
> Federico
>
>
>


-- 
Federico D'Ambrosio


Re: Task manager not able to rejoin job manager after network hiccup

2018-02-23 Thread jelmer
We found out there's a taskmanager.exit-on-fatal-akka-error property that
will restart Flink in this situation, but it is not enabled by default, and
it feels like a rather blunt tool. I would expect systems like this to be
more resilient.
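
For later readers, the property goes into flink-conf.yaml; a sketch (name as
observed on 1.4, verify against your version):

# Let the TaskManager process terminate on a fatal Akka error (quarantine),
# so an external supervisor can restart it. Disabled by default.
taskmanager.exit-on-fatal-akka-error: true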

On 23 February 2018 at 14:42, Aljoscha Krettek wrote:

> @Till Is this the expected behaviour or do you suspect something could be
> going wrong?
>
>
> On 23. Feb 2018, at 08:59, jelmer wrote:
>
> We've observed on our flink 1.4.0 setup that if for some reason the
> networking between the task manager and the job manager gets disrupted then
> the task manager is never able to reconnect.
>
> You'll end up with messages like this getting printed to the log repeatedly
>
> Trying to register at JobManager 
> akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 17, timeout: 3 
> milliseconds)
> Quarantined address [akka.tcp://flink@jobmanager:6123] is still unreachable 
> or has not been restarted. Keeping it quarantined.
>
>
> Or alternatively
>
>
> Tried to associate with unreachable remote address 
> [akka.tcp://flink@jobmanager:6123]. Address is now gated for 5000 ms, all 
> messages to this address will be delivered to dead letters. Reason: [The 
> remote system has quarantined this system. No further associations to the 
> remote system are possible until this system is restarted.
>
>
> But it never recovers until you either restart the job manager or the task
> manager
>
> I was able to successfully reproduce this behaviour in two docker
> containers here :
>
> https://github.com/jelmerk/flink-worker-not-rejoining
>
> Has anyone else seen this problem ?
>
>
>
>
>
>
>
>
>


Re: Problem when uploading a java flink program to aws lambda

2018-02-23 Thread Aljoscha Krettek
Could you please post your build setup (pom file)? And maybe also the
contents of your jar file.

Best,
Aljoscha

> On 22. Feb 2018, at 10:39, Kulasangar wrote:
> 
> I have created a java application using flink api and table api. I can
> provide the source code if needed.
> 
> The application works perfectly locally. But when I tried to upload the
> created jar to AWS Lambda and execute it, the following error was
> thrown:
> 
> *reference.conf: 804: Could not resolve substitution to a value:
> ${akka.stream.materializer}:
> com.typesafe.config.ConfigException$UnresolvedSubstitution
> com.typesafe.config.ConfigException$UnresolvedSubstitution: reference.conf:
> 804: Could not resolve substitution to a value: ${akka.stream.materializer}
> at
> com.typesafe.config.impl.ConfigReference.resolveSubstitutions(ConfigReference.java:87)
> at
> com.typesafe.config.impl.ResolveSource.resolveCheckingReplacement(ResolveSource.java:110)*
> 
> I have posted only the first few lines of the exception. Has anyone come
> across this type of error?
> 
> Any help would be appreciated.
> 
> Regards.
> 
> 
> 
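For context: this particular ${akka.stream.materializer} failure is commonly
associated with fat jars in which several reference.conf files overwrite each
other. Assuming a Maven build, a maven-shade-plugin transformer that appends
them instead looks roughly like this (a sketch, pending the actual pom):

<!-- sketch: merge all reference.conf files instead of keeping only one -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>reference.conf</resource>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>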



Re: Timestamp from Kafka record and watermark generation

2018-02-23 Thread Aljoscha Krettek
Hi,

This is a known issue: https://issues.apache.org/jira/browse/FLINK-8500.
And yes, the workaround is to write an assigner from scratch, but you can
start by copying the code of AscendingTimestampExtractor.

Sorry for the inconvenience.

--
Aljoscha
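
Along the lines of that workaround, a from-scratch periodic assigner might
look roughly like this (a sketch: the Event stand-in and the watermark lag
are assumptions; for the 0.10 consumer, previousElementTimestamp carries the
Kafka record timestamp):

import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks
import org.apache.flink.streaming.api.watermark.Watermark

case class Event(id: String) // stand-in for the poster's Event type

// Ascending-timestamp assigner rebuilt from scratch, reading the timestamp
// attached to each record by the Kafka 0.10 consumer.
class KafkaAscendingAssigner extends AssignerWithPeriodicWatermarks[Event] {
  private var currentMax = Long.MinValue

  override def extractTimestamp(element: Event, previousElementTimestamp: Long): Long = {
    currentMax = math.max(currentMax, previousElementTimestamp)
    previousElementTimestamp
  }

  override def getCurrentWatermark(): Watermark =
    new Watermark(if (currentMax == Long.MinValue) Long.MinValue else currentMax - 1)
}

// usage: kafkaConsumer.assignTimestampsAndWatermarks(new KafkaAscendingAssigner)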

> On 22. Feb 2018, at 12:05, Federico D'Ambrosio wrote:
> 
> Hello everyone,
> 
> I'm consuming from a Kafka topic, to which I'm writing with a
> FlinkKafkaProducer with the relevant timestamp flag set to true.
>
> From what I gather from the documentation [1], Flink is aware of the Kafka
> record's timestamp and only the watermark should be set with an appropriate
> TimestampExtractor; still, I'm failing to understand how to implement it in
> the right way.
> 
> I thought that it would be possible to use the already existing
> AscendingTimestampExtractor, overriding the extractTimestamp method, but it's
> marked final.
>
> new FlinkKafkaConsumer010[Event](ingestion_topic, new
> JSONDeserializationSchema(), consumerConfig)
>   .setStartFromLatest()
>   .assignTimestampsAndWatermarks(new AscendingTimestampExtractor[Event]() {
>     def extractAscendingTimestamp(element: Event): Long = ???
>   })
> Do I need to implement my own TimestampExtractor (with the appropriate
> getCurrentWatermark and extractTimestamp methods)?
> 
> [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#using-kafka-timestamps-and-flink-event-time-in-kafka-010
> 
> 
> Thank you,
> Federico
> 



Re: NoClassDefFoundError of an Avro class after cancel then resubmit the same job

2018-02-23 Thread Aljoscha Krettek
@Elias This is a known issue that will be fixed in 1.4.2, which we will release
very quickly just because of this bug:
https://issues.apache.org/jira/browse/FLINK-8741.

> On 23. Feb 2018, at 05:53, Elias Levy wrote:
> 
> Something seems to be off with the user code class loader. The only way I
> can get my job to start is if I drop the job jar into the lib folder on the JM
> and configure the JM's classloader.resolve-order to parent-first.
> 
> Suggestions?
> 
> On Thu, Feb 22, 2018 at 12:52 PM, Elias Levy wrote:
> I am currently suffering through similar issues.  
> 
> Had a job running happily, but when the cluster tried to restart it, it
> would not find the JSON serializer. The job kept trying to restart in a
> loop.
> 
> Just today I was running a job I built locally. The job ran fine. I added
> two commits and rebuilt the jar. Now the job dies when it tries to start,
> claiming it can't find the time assigner class. I've unzipped the job jar,
> both locally and in the TM blob directory, and have confirmed the class is
> in it.
> 
> This is the backtrace:
> 
> java.lang.ClassNotFoundException: com.foo.flink.common.util.TimeAssigner
>   at java.net.URLClassLoader.findClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
>   at java.lang.ClassLoader.loadClass(Unknown Source)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Unknown Source)
>   at 
> org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:73)
>   at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
>   at java.io.ObjectInputStream.readClassDesc(Unknown Source)
>   at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
>   at java.io.ObjectInputStream.readObject0(Unknown Source)
>   at java.io.ObjectInputStream.readObject(Unknown Source)
>   at 
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:393)
>   at 
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:380)
>   at 
> org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:368)
>   at 
> org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58)
>   at 
> org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.createPartitionStateHolders(AbstractFetcher.java:542)
>   at 
> org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.<init>(AbstractFetcher.java:167)
>   at 
> org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.<init>(Kafka09Fetcher.java:89)
>   at 
> org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.<init>(Kafka010Fetcher.java:62)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.createFetcher(FlinkKafkaConsumer010.java:203)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:564)
>   at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:86)
>   at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
>   at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:94)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
>   at java.lang.Thread.run(Unknown Source)
> 
> On Tue, Jan 23, 2018 at 7:51 AM, Stephan Ewen wrote:
> Hi!
> 
> We changed a few things between 1.3 and 1.4 concerning Avro. One of the main 
> things is that Avro is no longer part of the core Flink class library, but 
> needs to be packaged into your application jar file.
> 
> The class loading / caching issues of 1.3 with respect to Avro should 
> disappear in Flink 1.4, because Avro classes and caches are scoped to the job 
> classloaders, so the caches do not go across different jobs, or even 
> different operators.
> 
> 
> Please check: Make sure you have Avro as a dependency in your jar file (in 
> scope "compile").
> 
> Hope that solves the issue.
> 
> Stephan
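
A pom.xml sketch of that suggestion (artifact and version are assumptions
for Flink 1.4):

<!-- sketch: bundle Flink's Avro integration with the job jar -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-avro</artifactId>
  <version>1.4.0</version>
  <scope>compile</scope>
</dependency>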
> 
> 
> On Mon, Jan 22, 2018 at 2:31 PM, Edward wrote:
> Yes, we've seen this issue as well, though it usually takes many more
> resubmits before the error pops up. Interestingly, of the 7 jobs we run (all
> of which use different Avro schemas), we only see this issue on 1 of them.
> Once the NoClassDefFoundError crops up though, it is necessary to recreate
> the task managers.
> 
> There's a whole page on the Flink documentation on debugging classloading,
> and Avro is 

Re: Task manager not able to rejoin job manager after network hiccup

2018-02-23 Thread Aljoscha Krettek
@Till Is this the expected behaviour or do you suspect something could be going 
wrong?

> On 23. Feb 2018, at 08:59, jelmer wrote:
> 
> We've observed on our flink 1.4.0 setup that if for some reason the 
> networking between the task manager and the job manager gets disrupted then 
> the task manager is never able to reconnect.
> 
> You'll end up with messages like this getting printed to the log repeatedly
> 
> Trying to register at JobManager 
> akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 17, timeout: 3 
> milliseconds)
> Quarantined address [akka.tcp://flink@jobmanager:6123] is still unreachable 
> or has not been restarted. Keeping it quarantined.
> 
> Or alternatively
> 
> 
> Tried to associate with unreachable remote address 
> [akka.tcp://flink@jobmanager:6123]. Address is now gated for 5000 ms, all 
> messages to this address will be delivered to dead letters. Reason: [The 
> remote system has quarantined this system. No further associations to the 
> remote system are possible until this system is restarted.
> 
> But it never recovers until you either restart the job manager or the task 
> manager
> 
> I was able to successfully reproduce this behaviour in two docker containers 
> here :
> 
> https://github.com/jelmerk/flink-worker-not-rejoining
> 
> Has anyone else seen this problem ?
> 
> 
> 
> 
> 
> 
> 



Re: Is Flink easy to deploy?

2018-02-23 Thread Fabian Hueske
Have you had a look at the examples? [1]
They can be run out of the IDE.

Fabian

[1]
https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples

2018-02-23 13:30 GMT+01:00 Esa Heikkinen:

> I have had a lot of difficulties deploying Flink. That is maybe because I am
> new to Flink and its (Java and Maven) development environment, but I would
> like to hear the opinions of others. I would like to use Scala.
>
>
>
> There are many examples, but often they are missing “imports” and
> settings in pom.xml. It seems to be a very hard job to find the correct ones.
> Maybe use of an IDE (IntelliJ IDEA) is almost mandatory, and it helps to find
> “imports”, but it does not find all of them.
>
>
>
> Generally, do you have to do and study a lot of basic work before you get
> to the actual thing?
>
>
>
> If there is a ready example (with source code) that is close enough to
> what you want, it is much easier to deploy. But if not, it can be a
> surprisingly difficult and time-consuming task. Because the documentation
> seems to be partially incomplete, it is often necessary to “google” and
> query the mailing list.
>
>
>
> Or have I misunderstood something, or can I not use Flink correctly (yet)?
>
>
>
> The features of Flink are so good that I want to learn to use it.
>
>
>
> Best Regards
>
> Esa
>


Is Flink easy to deploy?

2018-02-23 Thread Esa Heikkinen
I have had a lot of difficulties deploying Flink. That is maybe because I am new
to Flink and its (Java and Maven) development environment, but I would like to
hear the opinions of others. I would like to use Scala.

There are many examples, but often they are missing "imports" and settings in
pom.xml. It seems to be a very hard job to find the correct ones. Maybe use of an
IDE (IntelliJ IDEA) is almost mandatory, and it helps to find "imports", but it
does not find all of them.

Generally, do you have to do and study a lot of basic work before you get to the
actual thing?

If there is a ready example (with source code) that is close enough to what you
want, it is much easier to deploy. But if not, it can be a surprisingly
difficult and time-consuming task. Because the documentation seems to be
partially incomplete, it is often necessary to "google" and query the mailing
list.

Or have I misunderstood something, or can I not use Flink correctly (yet)?

The features of Flink are so good that I want to learn to use it.

Best Regards
Esa


Imports for example?

2018-02-23 Thread Esa Heikkinen

I found an interesting Scala example at:

https://flink.apache.org/news/2017/03/29/table-sql-api-update.html

But what imports should I use?

And what goes in pom.xml, and which versions?

BR Esa
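
A hedged guess at the imports that blog example needs, assuming Flink 1.3+
with Scala 2.11 (with flink-scala_2.11, flink-streaming-scala_2.11 and
flink-table_2.11 in pom.xml at the matching version):

// assumed imports for the Table API + DataStream Scala example
import org.apache.flink.api.scala._
import org.apache.flink.streaming.api.scala.{DataStream, StreamExecutionEnvironment}
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._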


Re: Window with recent messages

2018-02-23 Thread Fabian Hueske
Hi Krzysztof,

Thanks for sharing your solution!
ProcessFunctions are the Swiss army knife of Flink :-)

Cheers, Fabian

2018-02-22 19:55 GMT+01:00 Krzysztof Białek:

> Hi Fabian,
>
> Thank you for your suggestion. In the meantime I rethought this problem
> and implemented an alternative solution without using windows at all.
> I used a plain ProcessFunction with
> 1. Keyed state (by companyId) - to keep ratings per key
> 2. Event-time timers - to remove outdated ratings from state and emit
> the recalculated score immediately
>
> This solution gives results in real time; windows would delay the results
> by 1 day, I think.
>
> Regards,
> Krzysztof
>
>
> On Thu, Feb 22, 2018 at 9:44 AM, Fabian Hueske wrote:
>
>> Hi Krzysztof,
>>
>> you could compute the stats in two stages:
>>
>> 1) compute a daily window. You should use a ReduceFunction or
>> AggregateFunction here if possible to perform the computation eagerly.
>> 2) compute a sliding window of 90 days with a 1-day hop (or 90 rows with
>> a 1-row hop).
>>
>> That will crunch down the data in the first window and the second window
>> will combine the pre-aggregated results.
>>
>> Hope this helps,
>> Fabian
>>
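
A rough sketch of the two-stage windowing Fabian outlines above (Rating,
the average as the "score", and event-time setup are assumptions):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.{SlidingEventTimeWindows, TumblingEventTimeWindows}
import org.apache.flink.streaming.api.windowing.time.Time

object TwoStageScore {
  case class Rating(companyId: String, stars: Double)
  case class Partial(companyId: String, sum: Double, count: Long)

  def score90d(ratings: DataStream[Rating]): DataStream[(String, Double)] = {
    // stage 1: eager daily pre-aggregation
    val daily = ratings
      .map(r => Partial(r.companyId, r.stars, 1L))
      .keyBy(_.companyId)
      .window(TumblingEventTimeWindows.of(Time.days(1)))
      .reduce((a, b) => Partial(a.companyId, a.sum + b.sum, a.count + b.count))

    // stage 2: 90-day window sliding by one day over the daily partials
    daily
      .keyBy(_.companyId)
      .window(SlidingEventTimeWindows.of(Time.days(90), Time.days(1)))
      .reduce((a, b) => Partial(a.companyId, a.sum + b.sum, a.count + b.count))
      .map(p => (p.companyId, p.sum / p.count))
  }
}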
>> 2018-02-19 16:36 GMT+01:00 Krzysztof Białek:
>>
>>> Hi,
>>>
>>> My app is calculating Companies scores from Ratings given by users.
>>> Only ratings from last 90 days should be considered.
>>>
>>> 1. Is it possible to construct a window processing ratings from the last 90
>>> days?
>>> I've started with *misusing* countWindow, but this solution looks ugly
>>> to me.
>>>
>>> ratingStream
>>>   .filter(new OutdatedRatingsFilter(maxRatingAge))
>>>   .keyBy(_.companyId)
>>>   .countWindow(0L).trigger(new OnEventTrigger).evictor(new 
>>> OutdatedRatingsEvictor(maxRatingAge))
>>>   .process(ratingFunction)
>>>
>>>
>>> 2. How do I recalculate the score once a rating expires (after 90 days)?
>>> I don't want to put artificial ratings into the stream to trigger the
>>> recalculation.
>>>
>>> Any idea how can I do it better?
>>>
>>> Regards,
>>> Krzysztof
>>>
>>>
>>>
>>
>
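
And a rough sketch of the timer-based ProcessFunction approach Krzysztof
describes (Rating, Score, the 90-day constant and the average as the score
are assumptions for illustration):

import org.apache.flink.api.common.state.{ListState, ListStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector
import scala.collection.JavaConverters._

case class Rating(companyId: String, stars: Double, timestamp: Long)
case class Score(companyId: String, value: Double)

// Keyed state keeps the live ratings; one event-time timer per rating
// evicts it when it turns 90 days old and re-emits the score.
class RecentScoreFunction(maxAgeMs: Long = 90L * 24 * 60 * 60 * 1000)
    extends ProcessFunction[Rating, Score] {

  @transient private var ratings: ListState[Rating] = _

  override def open(parameters: Configuration): Unit =
    ratings = getRuntimeContext.getListState(
      new ListStateDescriptor("ratings", classOf[Rating]))

  override def processElement(r: Rating, ctx: ProcessFunction[Rating, Score]#Context,
                              out: Collector[Score]): Unit = {
    ratings.add(r)
    ctx.timerService().registerEventTimeTimer(r.timestamp + maxAgeMs)
    emitScore(out)
  }

  override def onTimer(timestamp: Long, ctx: ProcessFunction[Rating, Score]#OnTimerContext,
                       out: Collector[Score]): Unit = {
    // drop every rating that has now expired, then re-emit the score
    val keep = ratings.get().asScala.filter(_.timestamp + maxAgeMs > timestamp).toList
    ratings.clear()
    keep.foreach(ratings.add)
    emitScore(out)
  }

  private def emitScore(out: Collector[Score]): Unit = {
    val rs = ratings.get().asScala.toList
    if (rs.nonEmpty) out.collect(Score(rs.head.companyId, rs.map(_.stars).sum / rs.size))
  }
}

// usage: ratingStream.keyBy(_.companyId).process(new RecentScoreFunction)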


Task manager not able to rejoin job manager after network hiccup

2018-02-23 Thread jelmer
We've observed on our flink 1.4.0 setup that if for some reason the
networking between the task manager and the job manager gets disrupted then
the task manager is never able to reconnect.

You'll end up with messages like this getting printed to the log repeatedly

Trying to register at JobManager
akka.tcp://flink@jobmanager:6123/user/jobmanager (attempt 17, timeout:
3 milliseconds)
Quarantined address [akka.tcp://flink@jobmanager:6123] is still
unreachable or has not been restarted. Keeping it quarantined.


Or alternatively


Tried to associate with unreachable remote address
[akka.tcp://flink@jobmanager:6123]. Address is now gated for 5000 ms,
all messages to this address will be delivered to dead letters.
Reason: [The remote system has quarantined this system. No further
associations to the remote system are possible until this system is
restarted.


But it never recovers until you either restart the job manager or the task
manager

I was able to successfully reproduce this behaviour in two docker
containers here :

https://github.com/jelmerk/flink-worker-not-rejoining

Has anyone else seen this problem ?