Re: flink sql timed-window join throw "mismatched type" AssertionError on rowtime column

2018-03-09 Thread Timo Walther
ere any concern from performance or stability perspective? Best Yan *From:*Xingcan Cui mailto:xingc...@gmail.com>> *Sent:*Thursday, March 8, 2018 8:21:42 AM *To:*Timo Walther *Cc:*user; Yan Zhou [FDS Science] *Subject

Re: POJO default constructor - how is it used by Flink?

2018-03-12 Thread Timo Walther
Hi Alex, I guess your "real" constuctor is invoked by your code within your Flink program. The default constructor is used during serialization between operators. If you are interested in the internals, you can have a look at the PojoSerializer [1]. The POJO is created with the default constr

Re: Share state across operators

2018-03-12 Thread Timo Walther
Hi Max, I would go with the Either approach if you want to ensure that the initital state and the first element arrive in the right order. Performance-wise there should not be a big different between both approaches. The side outputs are more meant for have a side channel beside the main stre

Re: Production-readyness of Flink SQL

2018-03-12 Thread Timo Walther
Hi Philip, you are absolutely right that Flink SQL is definitely production ready. It has been developed for 2 years now and is used at Uber, Alibaba, Huawei and many other companies. We usually only merge production ready code or add an explicit warning about it. I will finally remove this

Re: activemq connector not working..

2018-03-14 Thread Timo Walther
Hi Puneet, are you running this job on the cluster or locally in your IDE? Regards, Timo Am 14.03.18 um 13:49 schrieb Puneet Kinra: Hi I used apache bahir connector  below is the code.the job is getting finished and not generated the output as well ,ideal it should keep on running below th

Re: Flink SSL Setup on a standalone cluster

2018-03-14 Thread Timo Walther
Hi Vinay, do you have any exception or log entry that describes the failure? Regards, Timo Am 14.03.18 um 15:51 schrieb Vinay Patil: Hi, I have keystore for each of the 4 nodes in cluster and respective trustore. The cluster is configured correctly with SSL , verified this by accessing job

Re: Question regarding effect of 'restart-strategy: none' on Flink (1.4.1) JobManager HA

2018-03-14 Thread Timo Walther
Hi Shaswata, are you using a standalone Flink cluster or how does your deployement look like? E.g. YARN has its own restart attempts [1]. Regards, Timo [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/jobmanager_high_availability.html#yarn-cluster-high-availability Am 1

Re: How to handle large lookup tables that update rarely in Apache Flink - Stack Overflow

2018-03-26 Thread Timo Walther
Hi Pete, you can find some basic examples about stream enrichment here [1]. I hope this helps a bit. Regards, Timo [1] http://training.data-artisans.com/exercises/rideEnrichment-flatmap.html [2] http://training.data-artisans.com/exercises/rideEnrichment-processfunction.html Am 25.03.18 um

Re: "dynamic" bucketing sink

2018-03-26 Thread Timo Walther
Hi Christophe, I think this will require more effort. As far as I know there is no such "dynamic" feature. Have you looked in to the bucketing sink code? Maybe you can adapt it to your needs? Otherwise it might also make sense to open an issue for it to discuss a design for it. Maybe other c

Re: Issue in Flink/Zookeeper authentication via Kerberos

2018-03-26 Thread Timo Walther
Hi Sarthak, I'm not a Kerberos expert but maybe Eron or Shuyi are more familiar with the details? Would be great if somebody could help. Thanks, Timo Am 22.03.18 um 10:16 schrieb Sahu, Sarthak 1. (Nokia - IN/Bangalore): Hi Folks, *_Environment Setup:_* 1. I have configured KDC 5 server.

Re: Query regarding to CountinousFileMonitoring operator

2018-03-26 Thread Timo Walther
Hi Puneet, can you share a little code example with us? I could not reproduce your problem. You have to keep in mind that a setParallelism() only affects the last operation. If you want to change the default parallelism of the entire pipeline, you have to change it in StreamExecutionEnvironm

Re: How can I confirm a savepoint is used for a new job?

2018-03-26 Thread Timo Walther
Hi Hao, I quickly checked that manually. There should be a message similar to the one below in the JobManager log: INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator  - Starting job from savepoint ... Regards, Timo Am 22.03.18 um 06:45 schrieb Hao Sun: Do we have any logs in J

Re: InterruptedException when async function is cancelled

2018-03-26 Thread Timo Walther
Hi Ken, as you can see here [1], Flink interrupts the timer service after a certain timeout. If you want to get rid of the exception, you should increase "task.cancellation.timers.timeout" in the configuration. Actually, the default is already set to 7 seconds. So your exception should not b

Re: Out off memory when catching up

2018-03-26 Thread Timo Walther
Hi Lasse, in order to avoid OOM exception you should analyze your Flink job implementation. Are you creating a lot of objects within your Flink functions? Which state backend are you using? Maybe you can tell us a little bit more about your pipeline? Usually, there should be enough memory fo

Re: Table/SQL Kafka Sink Question

2018-03-27 Thread Timo Walther
Hi Alexandru, the KafkaTableSink does not expose all features of the underlying DataStream API. Either you convert your table program to the DataStream API for the sink operation or you just extend a class like Kafka010JsonTableSink and customize it. Regards, Timo Am 27.03.18 um 11:59 schr

Re: How can I set configuration of process function from job's main?

2018-03-29 Thread Timo Walther
Hi, the configuration parameter is just legacy API. You can simply pass any serializable object to the constructor of your process function. Regards, Timo Am 29.03.18 um 20:38 schrieb Main Frame: Hi guys! Iam newbie in flink and I have probably silly question about streaming api. So for t

Re: Task Manager fault tolerance does not work

2018-04-03 Thread Timo Walther
Hi, does your job code declare a higher parallelism than 2? Or is submitted with a higher parallelism? What is the Web UI displaying? Regards, Timo Am 03.04.18 um 10:48 schrieb dhirajpraj: Hi, I have done that env.enableCheckpointing(5000L); env.setRestartStrategy(RestartStrategies.fixedDela

Re: subuquery about flink sql

2018-04-03 Thread Timo Walther
Hi, there are multiple issues in your query. First of all, "SELECT DISTINCT(user), product" is MySQL specific syntax and is interpreted as "SELECT DISTINCT user, product" which is not what you want I guess. Secondly, SQL windows can only be applied on time attributes. Meaning: "As long as a

Re: Task Manager fault tolerance does not work

2018-04-03 Thread Timo Walther
Could you provide a little reproducible example? Which file system are you using? This sounds like a bug to me that should be fixed if valid. Am 03.04.18 um 11:28 schrieb dhirajpraj: I have not specified any parallelism in the job code. So I guess, the parallelism should be set to parallelism.d

Re: Temporary failure in name resolution

2018-04-03 Thread Timo Walther
Hi Miki, for me this sounds like your job has a resource leak such that your memory fills up and the JVM of the TaskManager is killed at some point. How does your job look like? I see a WindowedStream.apply which might not be appropriate if you have big/frequent windows where the evaluation h

Re: Multiple (non-consecutive) keyBy operators in a dataflow

2018-04-03 Thread Timo Walther
Hi Andre, every keyBy is a shuffle over the network and thus introduces some overhead. Esp. serialization of records between operators if object reuse is disabled by default. If you think that not all slots (and thus all nodes) are not fully occupied evenly in the first keyBy operation (e.g.

Re: Watermark Question on Failed Process

2018-04-03 Thread Timo Walther
Hi Chengzhi, if you emit a watermark even though there is still data with a lower timestamp, you generate "late data" that either needs to be processed in a separate branch of your pipeline (see sideOutputLateData() [1]) or should force your existing operators to update their previously emitte

Re: Savepointing with Avro Schema change

2018-04-03 Thread Timo Walther
Hi Aneesha, as far as I know Avro objects were serialized with Flink's POJO serializer in the past. This behavior changed in 1.4. @Gordon: do you have more information how good we support Avro schema evolution today? Regards, Timo Am 03.04.18 um 12:11 schrieb Aneesha Kaushal: Hello, I hav

Re: Side outputs never getting consumed

2018-04-03 Thread Timo Walther
Hi Julio, I tried to reproduce your problem locally but everything run correctly. Could you share a little example job with us? This worked for me: class TestingClass { var hello:Int =0 } class TestAextends TestingClass { var test:String = _ } def main(args: Array[String]) { // set u

Re: Kafka exceptions in Flink log file

2018-04-03 Thread Timo Walther
Hi Alex, which version of Flink are you running? There were some class loading issues with Kafka recently. I would try it with the newest Flink version. Otherwise ClassNotFoundException usually indicates that something is wrong with your dependencies. Maybe you can share your pom.xml with us.

Re: Task Manager fault tolerance does not work

2018-04-03 Thread Timo Walther
@Till: Do you have any advice for this issue? Am 03.04.18 um 11:54 schrieb dhirajpraj: What I have found is that the TM fault tolerance behaviour is not consistent. Sometimes it works and sometimes it doesnt. I am attaching my java code file (which is the main class). What I did was: 1) Run cl

Re: Reading data from Cassandra

2018-04-03 Thread Timo Walther
Hi Soheil, yes Flink supports reading from Cassandra. You can find some examples here: https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-cassandra/src/test/java/org/apache/flink/streaming/connectors/cassandra/example Regards, Timo Am 31.03.18 um 20:22 schrieb Soheil

Re: Reading data from Cassandra

2018-04-03 Thread Timo Walther
Client. Regards, Timo [1] https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/asyncio.html Am 03.04.18 um 16:37 schrieb Timo Walther: Hi Soheil, yes Flink supports reading from Cassandra. You can find some examples here: https://github.com/apache/flink/tree/master

Re: Multiple (non-consecutive) keyBy operators in a dataflow

2018-04-03 Thread Timo Walther
d the second keyBy and thus prevent the network shuffle. Thanks, Arun Timo Walther wrote Hi Andre, every keyBy is a shuffle over the network and thus introduces some overhead. Esp. serialization of records between operators if object reuse is disabled by default. If you think that not all slots (and

Re: Side outputs never getting consumed

2018-04-06 Thread Timo Walther
Apr 3, 2018 at 10:17 AM, Timo Walther <mailto:twal...@apache.org>> wrote: Hi Julio, I tried to reproduce your problem locally but everything run correctly. Could you share a little example job with us? This worked for me: class TestingClass { var hello:Int

Re: Managing state migrations with Flink and Avro

2018-04-18 Thread Timo Walther
Hi Petter, could you share the source code of the class that Avro generates out of this schema? Thank you. Regards, Timo Am 18.04.18 um 11:00 schrieb Petter Arvidsson: Hello everyone, I am trying to figure out how to set up Flink with Avro for state management (especially the content of s

Re: Managing state migrations with Flink and Avro

2018-04-18 Thread Timo Walther
, Petter On Wed, Apr 18, 2018 at 11:32 AM, Timo Walther <mailto:twal...@apache.org>> wrote: Hi Petter, could you share the source code of the class that Avro generates out of this schema? Thank you. Regards, Timo Am 18.04.18 um 11:00 schrieb Petter Arvidsson:

Re: Applying an void function to DataStream

2018-04-19 Thread Timo Walther
Hi Soheil, Flink supports the type "java.lang.Void" which you can use in this case. Regards, Timo Am 19.04.18 um 15:16 schrieb Soheil Pourbafrani: Hi, I have a void function that takes a String, parse it and write it into Cassandra (Using pure java, not Flink Cassandra connector). Using Apac

Re: Applying an void function to DataStream

2018-04-19 Thread Timo Walther
>> wrote: Thanks, my map code is like this: stream.map(x -> parse(x)); I can't get what you mean! Something like the line below? DataStream t = stream.map(x -> parse(x)); ? On Thu, Apr 19, 2018 at 5:49 PM, Timo Walther mailto:twal...@apache

Re: Managing state migrations with Flink and Avro

2018-04-20 Thread Timo Walther
? Regards, Timo Am 18.04.18 um 14:21 schrieb Timo Walther: Thank you. Maybe we already identified the issue (see https://issues.apache.org/jira/browse/FLINK-9202). I will use your code to verify it. Regards, Timo Am 18.04.18 um 14:07 schrieb Petter Arvidsson: Hi Timo, Please find the generated

Re: Why assignTimestampsAndWatermarks parallelism same as map,it will not fired?

2018-04-25 Thread Timo Walther
Hi, did you set your time characteristics to even-time? env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); Regards, Timo Am 25.04.18 um 05:15 schrieb 潘 功森: Hi all, I use the same parallelism between map and assignTimestampsAndWatermarks , and it not fired, I saw the extractTime

Re: Kafka to Flink Avro Deserializer

2018-04-25 Thread Timo Walther
Hi Sebastien, for me this seems more an Avro issue than a Flink issue. You can ignore the shaded exception, we shade Google utilities for avoiding depencency conflicts. The root cause is this: java.lang.NullPointerException     at org.apache.avro.specific.SpecificData.getSchema (SpecificDat

Re: Wrong endpoints to cancel a job

2018-04-25 Thread Timo Walther
Hi Dongwon, please send such mails to the dev@ instead of the user@ as Flink 1.5.0 is not released yet. As far as I know the documentation around deployment and FLIP-6 has not been updated yet. But thank you for letting us know! Regards, Timo Am 25.04.18 um 11:03 schrieb Dongwon Kim: Hi,

Re: Leader Retrieval Timeout with HA Job Manager

2018-05-15 Thread Timo Walther
Hi Jason, this sounds more like a network connection/firewall issue to me. Can you tell us a bit more about your environment? Are you running your Flink cluster on a cloud provider? Regards, Timo Am 15.05.18 um 05:15 schrieb Jason Kania: Hi, I am using the 1.4.2 release on ubuntu and atte

Re: minPauseBetweenCheckpoints for failed checkpoints

2018-05-15 Thread Timo Walther
Hi Dmitry, I think the minPauseBetweenCheckpoints is intended for pausing between successful checkpoints. Usually a user wants to get a successful checkpoint as quickly as possible again. Stefan (in CC) might know more about. Regards, Timo Am 15.05.18 um 03:28 schrieb Dmitry Minaev: Hello!

Re: Flink does not read from some Kafka Partitions

2018-05-15 Thread Timo Walther
Hi Ruby, which Flink version are you using? When looking into the code of the org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase you can see that the behavior for using partition discovery or not depends on the Flink version. Regards, Timo Am 15.05.18 um 02:01 schrieb Ruby

Re: AvroInputFormat Serialisation Issue

2018-05-15 Thread Timo Walther
Hi Padarn, usually people are using the AvroInputFormat with the Avro class generated by an Avro schema. But after looking into the implementation, one should also be able to use the GenericRecord class as a parameter. So your exception seems to be a bug if it works locally but not distribute

Re: Akka heartbeat configurations

2018-05-15 Thread Timo Walther
Hi, increasing the time to detect a dead task manager usually increases the amount of elements that need to be reprocessed in case of a failure. Once a dead task manager is identified, the entire application is rolled back to the latest successful checkpointed/consistent state of the applicat

Re: Leader Retrieval Timeout with HA Job Manager

2018-05-15 Thread Timo Walther
The details in the current logs are insufficient to know what is happening. Thanks, Jason On Tuesday, May 15, 2018, 7:51:40 a.m. EDT, Timo Walther wrote: Hi Jason, this sounds more like a network connection/firewall issue to me. Can you tell us a bit more about your environment? Are you run

Re: Async Source Function in Flink

2018-05-15 Thread Timo Walther
Hi Frederico, Flink's AsyncFunction is meant for enriching a record with information that needs to be queried externally. So I guess you can't use it for your use case because an async call is initiated by the input. However, your custom SourceFunction could implement a similar asynchronous lo

Re: AvroInputFormat Serialisation Issue

2018-05-15 Thread Timo Walther
tion > So your exception seems to be a bug if it works locally but not distributed. Hmm, well its nice to know I'm not just doing something stupid :-) Perhaps I'll try compile flink myself so I can try and debug this. On Tue, May 15, 2018 at 8:54 PM Timo Walther <mailto:twal

Re: increasing parallelism increases the end2end latency in flink sql

2018-05-24 Thread Timo Walther
Hi Yan, SQL should not be the cause here. It is true that Flink removes the timestamp from a record when entering the SQL API but this timestamp is set again before time-based operations such as OVER windows. Watermarks are not touched. I think your issue is related to [2]. One explanation th

Re: Timers and Checkpoints

2018-05-25 Thread Timo Walther
Hi Alberto, do you get exactly the same exception? Maybe you can share some logs with us? Regards, Timo Am 25.05.18 um 13:41 schrieb Alberto Mancini: Hello, I think we are experiencing this issue: https://issues.apache.org/jira/browse/FLINK-6291 In fact we have a long running job that is un

Re: latency critical job

2018-05-25 Thread Timo Walther
Hi, usually Flink should have constant latency if the job is implemented correctly. But if you want to implement something like an external monitoring process., you can use the REST API [1] and metrics [2] to model such an behavior by restarting your application. In theory, you could also imp

Re: Java Code for Kafka Flink SQL

2018-06-04 Thread Timo Walther
Hi Rad, at a first glance your example does not look too bad. Which exceptions do you get? Did you create your pom.xml with the provided template [1] and then added flink-table, flink-connector-kafkaXXX, flink-streaming-scala? Regards, Timo [1] https://ci.apache.org/projects/flink/flink-doc

Re: Ask for SQL using kafka in Flink

2018-06-04 Thread Timo Walther
Hi, as you can see in code [1] Kafka09JsonTableSource takes a TableSchema. You can create table schema from type information see [2]. Regards, Timo [1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka-0.9/src/main/java/org/apache/flink/streaming/connectors/k

Re: Ask for SQL using kafka in Flink

2018-06-05 Thread Timo Walther
on, Jun 4, 2018 at 12:57 AM, Timo Walther <mailto:twal...@apache.org>> wrote: Hi, as you can see in code [1] Kafka09JsonTableSource takes a TableSchema. You can create table schema from type information see [2]. Regards, Timo [1] https://github.com

Re: Cannot determine simple type name - [FLINK-7490]

2018-06-05 Thread Timo Walther
This sounds similar to https://issues.apache.org/jira/browse/FLINK-9220. Can you explain the steps that I have to do to reproduce the error? Regards, Timo Am 05.06.18 um 08:06 schrieb Chesnay Schepler: Please re-open the issue. It would be great if you could also provide us with a reproducing

Re: Question about Memory Manage in the Streaming mode

2016-12-15 Thread Timo Walther
Hi Tao, no, streaming jobs do not use managed memory yet. Managed memory is useful for sorting, joining and grouping bounded data. Unbounded stream do not need that. It could be used in the future e.g. to store state or for new operators, but is this is not on the roadmap so far. Regards,

Re: Kafka 09 consumer does not commit offsets

2017-01-09 Thread Timo Walther
I'm not a Kafka expert but maybe Gordon (in CC) knows more. Timo Am 09/01/17 um 11:51 schrieb Renjie Liu: Hi, all: I'm using flink 1.1.3 and kafka consumer 09. I read its code and it says that the kafka consumer will turn on auto offset commit if checkpoint is not enabled. I've turned off ch

Re: Fwd: Continuous File monitoring not reading nested files

2017-01-09 Thread Timo Walther
Hi Lukas, have you tried to set the parameter " recursive.file.enumeration" to true? |// create a configuration object Configuration parameters = new Configuration(); // set the recursive enumeration parameter parameters.setBoolean("recursive.file.enumeration", true); | If this also does not

Re: the attribute order in sql 'select * from...'

2017-01-13 Thread Timo Walther
Hi Yuhong, as a solution you can specify the order of your Pojo fields when converting from DataStream to Table. Table table = tableEnv .fromDataSet(env.fromCollection(data), "department AS a, " + "age AS b, " + "salary AS c, " + "name AS d") .select("a, b, c, d"); T

Re: the attribute order in sql 'select * from...'

2017-01-13 Thread Timo Walther
Especially, as it might also change the serialized binary format. Am 13/01/17 um 11:24 schrieb Fabian Hueske: I think the sorting is done for consistency reasons, i.e., that all PojoTypeInfos for the same class behave the same. Since this code is used in many parts of Flink and many jobs (DataS

Re: Three input stream operator and back pressure

2017-01-17 Thread Timo Walther
Hi Dmitry, the runtime supports an arbitrary number of inputs, however, the API does currently not provide a convenient way. You could use the "union" operator to reduce the number of inputs. Otherwise I think you have to implement your own operator. That depends on your use case though. You

Re: Release 1.2?

2017-01-17 Thread Timo Walther
Hi Denis, the first 1.2 RC0 has already been released and the RC1 is on the way (maybe already this week). I think that we can expect a 1.2 release in 3-4 weeks. Regards, Timo Am 17/01/17 um 10:04 schrieb denis.doll...@thomsonreuters.com: Hi all, Do you have some ballpark estimate for a

Re: Possible JVM native memory leak

2017-01-17 Thread Timo Walther
This sounds like a RocksDB issue. Maybe Stefan (in CC) has an idea? Timo Am 17/01/17 um 14:52 schrieb Avihai Berkovitz: Hello, I am running a streaming job on a small cluster, and after a few hours I noticed that my TaskManager processes are being killed by the OOM killer. The processes we

Re: Flink Run command :Replace placeholder value in spring xml

2017-01-17 Thread Timo Walther
Hi Sunil, what is the content of args[0] when you execute public static void main(String[] args) { System.out.println(args[0]); } Am 17/01/17 um 14:55 schrieb raikarsunil: Hi, I am not able to replace value into spring place holder .Below is the xml code snippet . file:#{systemProperties['con

Re: Flink Run command :Replace placeholder value in spring xml

2017-01-17 Thread Timo Walther
I'm not sure what you want to do with this configuration. But you should keep in mind that all properties you set are only valid in the Flink Client that submits the job to the JobManager, you cannot access this property within a Flink Function such as MapFunction. Maybe you could show us a bit

Re: Zeppelin: Flink Kafka Connector

2017-01-17 Thread Timo Walther
You are using an old version of Flink (0.10.2). A FlinkKafkaConsumer010 was not present at that time. You need to upgrade to Flink 1.2. Timo Am 17/01/17 um 15:58 schrieb Neil Derraugh: This is really a Zeppelin question, and I’ve already posted to the user list there. I’m just trying to dra

Re: .keyBy() on ConnectedStream

2017-01-27 Thread Timo Walther
Hi Matt, the keyBy() on ConnectedStream has two parameters to specify the key of the left and of the right stream. Same keys end up in the same CoMapFunction/CoFlatMapFunction. If you want to group both streams on a common key, then you can use .union() instead of .connect(). I hope that hel

Re: Datastream - writeAsCsv creates empty File

2017-01-27 Thread Timo Walther
Hi Nico, writeAsCsv has limited functionality in this case. I recommend to use the Bucketing File Sink[1] where you can specify a interval and batch size when to flush. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/filesystem_sink.html#bucketing-file-sink T

Re: Compiler error while using 'CsvTableSource'

2017-02-06 Thread Timo Walther
I created an issue to make this a bit more user-friendly in the future. https://issues.apache.org/jira/browse/FLINK-5714 Timo Am 05/02/17 um 06:08 schrieb nsengupta: Thanks, Till, for taking time to share your understanding. -- N On Sun, Feb 5, 2017 at 12:49 AM, Till Rohrmann [via Apache Fl

Re: Table API: java.sql.DateTime is not supported;

2017-02-06 Thread Timo Walther
Hi, java.sql.Timestamps have to have a format like " -mm-dd hh:mm:ss.[fff...]". In your case you need to parse this as a String and write your own scalar function for parsing. Regards, Timo Am 04/02/17 um 17:46 schrieb nsengupta: "4/1/2014 0:11:00",40.769,-73.9549,"B02512"

Re: Unable to use Scala's BeanProperty with classes

2017-02-15 Thread Timo Walther
Hi Adarsh, I looked into your issue. The problem is that `var` generates Scala-style getters/setters and the annotation generates Java-style getters/setters. Right now Flink only supports one style in a POJO, I don't know why we have this restriction. I will work on a fix for that. Is it poss

Re: Unable to use Scala's BeanProperty with classes

2017-02-15 Thread Timo Walther
Forget what I said about omitting `var`, this would remove the field from the POJO. I opened a PR for fixing the issue: https://github.com/apache/flink/pull/3318 As a workaround: If you just want to have a POJO for the Cassandra Sink you don't need to add the `@BeanProperty` annotation. Flink

Re: Is it OK to have very many session windows?

2017-02-20 Thread Timo Walther
Hi Vadim, this of course depends on your use case. The question is how large is your state per pane and how much memory is available for Flink? Are you using incremental aggregates such that only the aggregated value per pane has to be kept in memory? Regards, Timo Am 20/02/17 um 16:34 schr

Re: flink/cancel & shutdown hooks

2017-03-08 Thread Timo Walther
Hi Dominik, did you take a look into the logs? Maybe the exception is not shown in the CLI but in the logs. Timo Am 07/03/17 um 23:58 schrieb Dominik Safaric: Hi all, I would appreciate for any help or advice in regard to default Java runtime shutdown hooks and canceling Flink jobs. Namel

Re: Appropriate State to use to buffer events in ProcessFunction

2017-03-08 Thread Timo Walther
Hi Yassine, have you thought about using a ListState? As far as I know, it keeps at least the insertion order. You could sort it once your trigger event has arrived. If you use a RocksDB as state backend, 100+ GB of state should not be a problem. Have you thought about using Flink's CEP librar

Re: window function not working when control stream broadcast

2017-03-08 Thread Timo Walther
Hi Sam, could you explain the behavior a bit more? How does the window function behave? Is it not triggered or what is the content? What is the result if you don't use a window function? Timo Am 08/03/17 um 02:59 schrieb Sam Huang: btw, the reduce function works well, I've printed out the

Re: Flink 1.2 Proper Packaging of flink-table with SBT

2017-03-08 Thread Timo Walther
Hi Justin, thank you for reporting your issues. I never tried the Table API with SBT but `flink-table` should not declare dependencies to core modules, this is only done in `test` scope, maybe you have to specify the right scope manually? You are right, the mentioned Jira should be fixed asap,

Re: Job fails to start with S3 savepoint

2017-03-20 Thread Timo Walther
Hi Abhinav, can you check if you have configured your AWS setup correctly? The S3 configuration might be missing. https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/aws.html#missing-s3-filesystem-configuration Regards, Timo Am 18/03/17 um 01:33 schrieb Bajaj, Abhinav: Hi,

Re: load balancing of keys to operators

2017-03-20 Thread Timo Walther
Hi, using keyBy Flink ensures that every set of records with same key is send to the same operator, otherwise it would not be possible to process them as a whole. It depends on your use case if it is also ok that another operator processes parts of this set of records. You can implement you o

Re: shaded version of legacy kafka connectors

2017-03-21 Thread Timo Walther
Hi Gwenhael, I will loop in Gordon becaue he is more familar with the Kafka connectors. Have you experiences with two versions in the same project? Am 20/03/17 um 15:57 schrieb Gwenhael Pasquiers: Hi, Before doing it myself I thought it would be better to ask. We need to consume from kafk

Re: load balancing of keys to operators

2017-03-21 Thread Timo Walther
I think it very depends on your use case, maybe you can use combiner first to reduce the amount of records per key. Maybe you can explain your application a bit more (which window, type of aggregations). It often helps e.g. to introduce an artifical key und merge the result of multiple windows

Re: 20 times higher throughput with Window function vs fold function, intended?

2017-03-29 Thread Timo Walther
Hi Kamil, the performance implications might be the result of which state the underlying functions are using internally. WindowFunctions use ListState or ReducingState, fold() uses FoldingState. It also depends on the size of your state and the state backend you are using. I recommend the fol

Re: Gelly - which partitioning

2017-03-29 Thread Timo Walther
Hi Marc, maybe Greg (in CC) can help answering your question? Regards, Timo Am 29/03/17 um 11:50 schrieb Kaepke, Marc: Hi guys, I can’t found on web which graph partitioning are supported by Gelly. During my search I found this link. But the ticket is still open. https://cwiki.apache.org/co

Re: Why flink 1.2.0 delete flink-connector-redis?

2017-04-26 Thread Timo Walther
Hi, the Flink community decided to have the most important connectors (e.g. Kafka) in the core repository. All other connectors are in Apache Bahir (http://bahir.apache.org/). You can find the flink-connector-redis there. Timo Am 26/04/17 um 12:54 schrieb yunfan123: It exists in 1.1.5. But

Re: Flink docs in regards to State

2017-04-26 Thread Timo Walther
Hi, you are right. There are some limitation about RichReduceFunctions on windows. Maybe the new AggregateFunction `window.aggregate()` could solve your problem, you can provide an accumulator which is your custom state that you can update for each record. I couldn't find a documentation page

Re: Kafka 0.10 jaas multiple clients

2017-04-26 Thread Timo Walther
Hi Gwenhael, I'm not a Kafka expert but if something is hardcoded that should not, it might be worth opening an issue for it. I loop in somebody who might knows more your problem. Timo Am 26/04/17 um 14:47 schrieb Gwenhael Pasquiers: Hello, Up to now we’ve been using kafka with jaas (pla

Re: Deactive a job like storm

2017-05-10 Thread Timo Walther
This is called "stop" in Flink. You can find a short description here: https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/cli.html " The difference between cancelling and stopping a (streaming) job is the following: On a cancel call, the operators in a job immediately receive a

Re: State in Custom Tumble Window Class

2017-05-17 Thread Timo Walther
Hi, in general, a class level variable is not managed by Flink if it is not defined as state or the function does not implemented ListCheckpointed interface. Allowing infinite lateness also means that your window content has to be stored infinitely. I'm not sure if I understand your implement

Re: Question about jobmanager.web.upload.dir

2017-05-17 Thread Timo Walther
Hey Mauro, I'm not aware of any reason for that. I loop in Chesnay, maybe he knows why. @Chesnay wouldn't it be helpful to also archive the jars using the HistoryServer? Timo Am 17.05.17 um 12:31 schrieb Mauro Cortellazzi: Hi Flink comunity, is there a particular reason to delete the jobm

Re: Tumbling window expects a time attribute for grouping in a stream environment

2017-05-24 Thread Timo Walther
Hi Enrico, the docs of the 1.3-SNAPSHOT are a bit out of sync right now, but they will be updated in the next days/1-2 weeks. We recently introduced so-called "time indicators". These are attributes that correspond to Flink's time and watermarks. You declare a logical field that represents F

Re: How can I increase Flink managed memory?

2017-05-31 Thread Timo Walther
Hi Sathi, configuring managed memory and taskmanager heap memory also depends on your jobs. If your jobs involve much sorting, then a high managed memory is helpful. If you are creating a lot of objects in your functions, that need some time until they get garbage collected (ideally you should

Re: Does job restart resume from last known internal checkpoint?

2017-05-31 Thread Timo Walther
Hi Moiz, yes the job will be restartet in case of failure using the last successful checkpoint. If you cancel the job, the checkpoints will be discarded. That's why Flink has savepoints [1] in order to store checkpoints permantently (with additional meta-information). If there is no checkpoin

Re: Stream sql example

2017-06-09 Thread Timo Walther
Hi David, I think the problem is that the type of the DataStream produced by the TableSource, does not match the type that is declared in the ` getReturnType()`. A `MapFunction` is always a generic type (because Row cannot be analyzed). A solution would be that the mapper implements `ResultTy

Re: About nodes number on Flink

2017-06-23 Thread Timo Walther
Hi Andrea, the number of nodes usually depends on the work that you do within your Functions. E.g. if you have a computation intensive machine learning library in a MapFunction and takes 10 seconds per element, it might make sense to paralellize this in order to increase your throughput. Or

Re: About nodes number on Flink

2017-06-26 Thread Timo Walther
If you really what to run one operation per node. You start 1 TaskManager with 1 slot on every node. For each operation you set a new chain and a new slot sharing group. Timo Am 23.06.17 um 15:03 schrieb AndreaKinn: Hi Timo, thanks for your answer. I think my elaboration are not too much heav

Re: Flink Elasticsearch Connector: Lucene Error message

2017-07-13 Thread Timo Walther
Hi Fabian, I loop in Gordon. Maybe he knows whats happening here. Regards, Timo Am 13.07.17 um 13:26 schrieb Fabian Wollert: Hi everyone, I'm trying to make use of the new Elasticsearch Connector

Re: Read configuration instead of hard code

2017-07-13 Thread Timo Walther
Hi Desheng, Flink programs are defined in a regular Java main() method. They are executed on the Flink Client (usually the JobManeger) when submitted, you can add arbirary additional logic (like reading a file from an NFS) to the code. After retrieving the Kafka Info you can pass it to the Fl

Re: Reading static data

2017-07-13 Thread Timo Walther
Hi Mohit, do you plan to implement a batch or streaming job? If it is a streaming job: You can use a connected stream (see [1], Slide 34). The static data is one side of the stream that could be updated from time to time and will always propagated (using a broadcast()) to all workers that do

Re: AVRO Union type support in Flink

2017-07-19 Thread Timo Walther
Hi Vishnu, I took a look into the code. Actually, we should support it. However, those types might be mapped to Java Objects that will be serialized with our generic Kryo serializer. Have you tested it? Regards, Timo Am 19.07.17 um 06:30 schrieb Martin Eden: Hey Vishnu, For those of us on

Re: How can I set charset for flink sql?

2017-07-25 Thread Timo Walther
Hi, currently Flink does not support this charset in a LIKE expression. This is due to a limitation in the Apache Calcite library. Maybe you can open an issue there. The easiest solution for this is to implement your own scalar function, that does a `string.contains("")`. Here you can

Re: Storing POJO's to RocksDB state backend

2017-08-02 Thread Timo Walther
Hi Biplob, Flink is shipped with own serializers. POJOs and other datatypes are analyzed automatically. Kryo is only the fallback option, if your class does not meet the POJO criteria (see [1]). Usually, all serialization/deserialization to e.g. RocksDB happens internally and the user doesn't

Re: Eventime window

2017-08-02 Thread Timo Walther
Hi Govind, if the window is not triggered, this usually indicates that your timestamp and watermark assignment is not correct. According to your description, I don't think that you need a custom trigger/evictor. How often do events arrive from one device? There must be another event from the

<    1   2   3   4   5   6   7   >