Re: Stream in loop and not getting to sink (Parquet writer)

2018-11-28 Thread Avi Levi
Check out this little app: you can see that the file is created but no data is written, even for a single record.

import io.eels.component.parquet.ParquetWriterConfig
import org.apache.avro.Schema
import org.apache.avro.generic.{ GenericData, GenericRecord }
import

Re: Stream in loop and not getting to sink (Parquet writer)

2018-11-28 Thread vipul singh
Can you try closing the writer? AvroParquetWriter has an internal buffer. Try doing a .close() in snapshot() (since you are checkpointing, this method will be called). On Wed, Nov 28, 2018 at 7:33 PM Avi Levi wrote: > Thanks Rafi, > I am actually not using assignTimestampsAndWatermarks , I
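
A minimal sketch of that suggestion (class and field names are illustrative, not the actual code from this thread): a SinkFunction that also implements CheckpointedFunction, closes the AvroParquetWriter in snapshotState() and rolls to a new part file, since Parquet only guarantees a readable file once close() has written the footer.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.flink.runtime.state.{FunctionInitializationContext, FunctionSnapshotContext}
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction
import org.apache.flink.streaming.api.functions.sink.SinkFunction
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetWriter
import org.apache.parquet.hadoop.ParquetWriter

class ParquetSink(schemaJson: String, basePath: String)
    extends SinkFunction[GenericRecord] with CheckpointedFunction {

  @transient private var writer: ParquetWriter[GenericRecord] = _
  @transient private var part: Int = 0

  private def openWriter(): ParquetWriter[GenericRecord] =
    AvroParquetWriter
      .builder[GenericRecord](new Path(s"$basePath/part-$part"))
      .withSchema(new Schema.Parser().parse(schemaJson))
      .build()

  override def invoke(record: GenericRecord): Unit = {
    if (writer == null) writer = openWriter()
    writer.write(record)
  }

  // Parquet buffers rows in memory and only writes the footer on close(),
  // so close the current file on every checkpoint and roll to a new part file.
  override def snapshotState(ctx: FunctionSnapshotContext): Unit =
    if (writer != null) {
      writer.close()
      writer = null
      part += 1
    }

  override def initializeState(ctx: FunctionInitializationContext): Unit = ()
}
```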

Re: Stream in loop and not getting to sink (Parquet writer)

2018-11-28 Thread Avi Levi
Thanks Rafi, I am actually not using assignTimestampsAndWatermarks; I will try to add it as you suggested. However, it seems that the messages are repeating in the stream over and over: even if I push a single message manually to the queue, that message repeats infinitely. Cheers Avi On Wed,

SQL Query named operator exceeds 80 characters

2018-11-28 Thread shkob1
It seems like the operator name for a SQL group by is the query string itself. I get "The operator name groupBy: (myGroupField), select: (myGroupField, someOther... )... exceeded the 80 characters length limit and was truncated". Is there a way to name the SQL query operator? -- Sent from:
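
For what it's worth, the message is only a warning that the generated name was truncated. To my knowledge there is no direct knob in this version for renaming the planner-generated GROUP BY operator; what you can name are the operators you add around the query. A sketch, assuming a StreamTableEnvironment tEnv and illustrative table/field names:

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.scala._
import org.apache.flink.types.Row

val result = tEnv.sqlQuery(
  "SELECT myGroupField, COUNT(*) AS cnt FROM myTable GROUP BY myGroupField")

// names the conversion operator you create, not the internal aggregation
tEnv.toRetractStream[Row](result).name("my-grouped-select")
```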

Updating multiple database tables

2018-11-28 Thread Dylan Adams
Hello, I was hoping to get input from the Flink community about patterns for handling multiple dependent RDBMS updates. A textbook example would be the order & order_line_item tables. I've encountered a few approaches to this problem, and I'm curious to see if there are others, and the benefits &
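
For reference, a sketch of the textbook baseline such patterns are usually compared against, assuming plain JDBC; table and column names are illustrative and nothing here is endorsed by the thread. Both inserts share a single transaction, so the two tables stay consistent:

```scala
import java.sql.Connection

def writeOrder(conn: Connection, orderId: Long, items: Seq[(Long, Int)]): Unit = {
  conn.setAutoCommit(false)
  try {
    val order = conn.prepareStatement("INSERT INTO orders (id) VALUES (?)")
    order.setLong(1, orderId)
    order.executeUpdate()
    order.close()

    val line = conn.prepareStatement(
      "INSERT INTO order_line_item (order_id, item_id, qty) VALUES (?, ?, ?)")
    for ((itemId, qty) <- items) {
      line.setLong(1, orderId)
      line.setLong(2, itemId)
      line.setInt(3, qty)
      line.executeUpdate()
    }
    line.close()
    conn.commit() // order and its line items commit or roll back together
  } catch {
    case e: Exception =>
      conn.rollback()
      throw e
  }
}
```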

Re: Changes in Flink 1.6.2

2018-11-28 Thread Boris Lublinsky
Here is the code

def executeLocal() : Unit = {
  val env = StreamExecutionEnvironment.getExecutionEnvironment
  buildGraph(env)
  System.out.println("[info] Job ID: " + env.getStreamGraph.getJobGraph.getJobID)
  env.execute()
}

And the error: Error:(68, 63) ambiguous reference to overloaded

Re: Changes in Flink 1.6.2

2018-11-28 Thread Dominik Wosiński
Hey, could you show the message that you are getting? Best Regards, Dom. Wed, 28 Nov 2018 at 19:08 Boris Lublinsky wrote: > > > > Prior to Flink version 1.6.2 including 1.6.1 > env.getStreamGraph.getJobGraph was happily returning currently defined > Graph, but in 1.6.2 this fails to compile

Re: Stream in loop and not getting to sink (Parquet writer)

2018-11-28 Thread Rafi Aroch
Hi Avi, I can't see the part where you use assignTimestampsAndWatermarks. If this part is not set properly, it's possible that watermarks are not sent and nothing will be written to your Sink. See here for more details:
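
A minimal sketch of what Rafi suggests, assuming an illustrative event type with an embedded timestamp and up to 5 seconds of out-of-orderness:

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.windowing.time.Time

case class Event(id: String, timestampMillis: Long)

def withWatermarks(stream: DataStream[Event]): DataStream[Event] =
  stream.assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor[Event](Time.seconds(5)) {
      // event time is taken from the record itself
      override def extractTimestamp(e: Event): Long = e.timestampMillis
    })
```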

Stream in loop and not getting to sink (Parquet writer)

2018-11-28 Thread Avi Levi
Hi, I am trying to implement a Parquet writer as a SinkFunction. The pipeline consists of Kafka as source and a Parquet file as sink, however it seems like the stream is repeating itself in an endless loop and the Parquet file is not written. Can someone please help me with this? object
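
For context, a sketch of an alternative that avoids hand-rolling the writer lifecycle, assuming Flink 1.7's flink-parquet module is on the classpath (path and types are illustrative): the built-in StreamingFileSink closes and rolls Parquet part files on each checkpoint.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.flink.core.fs.Path
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
import org.apache.flink.streaming.api.scala._

def addParquetSink(stream: DataStream[GenericRecord], schema: Schema): Unit =
  stream.addSink(
    StreamingFileSink
      .forBulkFormat[GenericRecord](
        new Path("hdfs:///tmp/parquet-out"),
        ParquetAvroWriters.forGenericRecord(schema))
      .build())
```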

Changes in Flink 1.6.2

2018-11-28 Thread Boris Lublinsky
Prior to Flink version 1.6.2, including 1.6.1, env.getStreamGraph.getJobGraph was happily returning the currently defined graph, but in 1.6.2 this fails to compile with a pretty cryptic message. Am I missing something? Boris Lublinsky FDP Architect boris.lublin...@lightbend.com

Dataset column statistics

2018-11-28 Thread Flavio Pompermaier
Hi to all, I have a batch dataset and I want to get some standard info about its columns (like min, max, avg etc). In order to achieve this I wrote a simple program that uses SQL on the Table API, like the following: SELECT MAX(col1), MIN(col1), AVG(col1), MAX(col2), MIN(col2), AVG(col2), MAX(col3),
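
A minimal sketch of the described approach, assuming a 1.6-era batch Table environment and an illustrative three-column dataset:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.types.Row

val env = ExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)

val ds = env.fromElements((1, 2.0, 3L), (4, 5.0, 6L))
tEnv.registerDataSet("t", ds, 'col1, 'col2, 'col3)

// one pass computes the per-column statistics
val stats = tEnv.sqlQuery(
  "SELECT MAX(col1), MIN(col1), AVG(col1), MAX(col2), MIN(col2), AVG(col2) FROM t")
stats.toDataSet[Row].print()
```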

Re: Flink - Metric are not reported

2018-11-28 Thread Chesnay Schepler
Yes, you can override notifyOfAddedMetric/notifyOfRemovedMetric to ensure that a metric is logged at least once. On 28.11.2018 16:01, bastien dine wrote: Yeah, that's what I was thinking.. Batch can be quick. Can I report a metric on the "added" action? (I should override the notifyOfAddedMetric to
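
A sketch of that suggestion, assuming the reporter SPI from flink-metrics-core / flink-metrics-slf4j (the interface methods are notifyOfAddedMetric/notifyOfRemovedMetric); the subclass name is illustrative:

```scala
import org.apache.flink.metrics.{Metric, MetricGroup}
import org.apache.flink.metrics.slf4j.Slf4jReporter

class EagerSlf4jReporter extends Slf4jReporter {
  override def notifyOfAddedMetric(metric: Metric, name: String, group: MetricGroup): Unit = {
    super.notifyOfAddedMetric(metric, name, group)
    report() // log immediately, so even short-lived batch jobs surface the metric
  }
}
```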

Re: Flink - Metric are not reported

2018-11-28 Thread bastien dine
Yeah, that's what I was thinking.. Batch can be quick. Can I report a metric on the "added" action? (I should override the notifyOfAddedMetric to report?) -- Bastien DINE Data Architect / Software Engineer / Sysadmin bastiendine.io On Wed, 28 Nov 2018 at 15:54, Chesnay Schepler wrote

Re: Flink - Metric are not reported

2018-11-28 Thread Chesnay Schepler
How quickly does the batch job terminate? Metrics are unregistered once the job ends; if the job duration is shorter than the report interval, they may never be exposed. On 28.11.2018 15:18, bastien dine wrote: Hello Chesnay, Thanks for your response! I have logs enabled (info), slf4jReporter is

Re: Flink - Metric are not reported

2018-11-28 Thread bastien dine
Hello Chesnay, Thanks for your response! I have logs enabled (info), slf4jReporter is working, I can see: 15:16:00.112 [Flink-MetricRegistry-thread-1] INFO org.apache.flink.metrics.slf4j.Slf4jReporter - === Starting metrics report === --

Re: Custom scheduler in Flink

2018-11-28 Thread Felipe Gutierrez
Thanks, I'll check it out. -- Felipe Gutierrez -- skype: felipe.o.gutierrez -- https://felipeogutierrez.blogspot.com On Wed, Nov 28, 2018 at 2:44 PM Chesnay Schepler wrote: > There's no *reasonable* way to implement a custom Scheduler,

Re: flink hadoop 3 integration plan

2018-11-28 Thread Chesnay Schepler
We certainly want to look into Hadoop 3 support for 1.8, but we'll have to take a look at the changes since Hadoop 2 first before we can give any definitive answer. On 28.11.2018 07:41, Ming Zhang wrote: Hi All, we now plan to move to CDH6, which is based on Hadoop 3; does anyone know the plan of Flink

Re: Custom scheduler in Flink

2018-11-28 Thread Chesnay Schepler
There's no *reasonable* way to implement a custom Scheduler, i.e., something where you can just plug in your scheduler in a nice way. For this you'll have to directly modify the source of Flink. The work in https://issues.apache.org/jira/browse/FLINK-8886 may also be of interest, but is still in

Re: MapState - TypeSerializer

2018-11-28 Thread Andrey Zagrebin
Hi Alexey, it is written once per state name in its meta information, apart from user data entries. Best, Andrey > On 28 Nov 2018, at 04:56, Alexey Trenikhun wrote: > > Hello, > Flink documentation states that “TypeSerializers and > TypeSerializerConfigSnapshots are written as part of

Re: Store Predicate or any lambda in MapState

2018-11-28 Thread Andrey Zagrebin
Also be aware of updating the version of the library (its jar). If you rebuild code containing a lambda, its class name can change. Upon recovery, Kryo might then read state with the old class name and not be able to find it any more. I would rather save something which is easily (de)serialisable in state
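
A sketch of the "easily (de)serialisable" alternative: keep a stable rule id in state and resolve the actual lambda at runtime. Names are illustrative.

```scala
object Rules {
  // the lambdas live in code, not in state, so rebuilding the jar is safe
  val byId: Map[String, String => Boolean] = Map(
    "nonEmpty" -> (s => s.nonEmpty),
    "numeric"  -> (s => s.forall(_.isDigit))
  )
}

// state then holds plain strings (e.g. MapState[String, String] mapping a key
// to a rule id); after an upgrade the id still resolves, unlike a Kryo-serialized
// lambda whose synthetic class name may no longer exist
val predicate: String => Boolean = Rules.byId("nonEmpty")
```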

Re: Flink operator UUID and serialVersionUID

2018-11-28 Thread Jayant Ameta
Makes sense. Thanks Jiayi!! Regards, Jayant On Wed, Nov 28, 2018 at 3:32 PM bupt_ljy wrote: > Hi, > > It will only affect the states: because Flink serializes these states > when taking a snapshot, we need to make sure that the states can be > deserialized if the program is

Re: Flink operator UUID and serialVersionUID

2018-11-28 Thread bupt_ljy
Hi, it will only affect the states: because Flink serializes these states when taking a snapshot, we need to make sure that the states can be deserialized if the program is restarted. As for the stateless operators, they are not affected because Flink won't do a snapshot on them.

Re: Flink operator UUID and serialVersionUID

2018-11-28 Thread Jayant Ameta
Thanks, I'll look into the bravo project. Will it just impact the MapState or all the operators? If I have a map operator which converts one DataStream into a DataStream of MyObject, will this fail to recover as well if a field is added to MyObject? Jayant Ameta On Wed, Nov 28, 2018 at 3:08 PM bupt_ljy wrote: >

Re: Flink operator UUID and serialVersionUID

2018-11-28 Thread bupt_ljy
Hi, it'll fail because Flink can't successfully deserialize the data in the savepoint into the new "MyObject" class. There is no official way to fix this problem. However, you can take a look at the bravo project (https://github.com/king/bravo), which can help to reconstruct the savepoint, but only

Re: FlinkCEP and scientific papers ?

2018-11-28 Thread Dawid Wysakowicz
Hi Esa, Flink 1.7 will come with initial support for the MATCH_RECOGNIZE clause, which integrates the Flink CEP library with SQL. You can find the documentation for that feature in the SNAPSHOT docs: https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/match_recognize.html Best,
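
A minimal sketch of the clause's shape, adapted from the SNAPSHOT docs linked above; it assumes a StreamTableEnvironment tEnv and an already-registered Ticker table (symbol, price, rowtime), all illustrative:

```scala
val matches = tEnv.sqlQuery(
  """
    |SELECT T.startPrice, T.lastPrice
    |FROM Ticker
    |MATCH_RECOGNIZE (
    |  PARTITION BY symbol
    |  ORDER BY rowtime
    |  MEASURES
    |    A.price AS startPrice,
    |    LAST(B.price) AS lastPrice
    |  ONE ROW PER MATCH
    |  AFTER MATCH SKIP PAST LAST ROW
    |  PATTERN (A B+)
    |  DEFINE
    |    B AS B.price > A.price
    |) AS T
  """.stripMargin)
```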

Custom scheduler in Flink

2018-11-28 Thread Felipe Gutierrez
Hi, I want to develop a custom scheduler in Flink to be aware of which host Flink must process a given task on. This post shows (using Apache Storm) the kind of example I want to build (https://inside.edited.com/taking-control-of-your-apache-storm-cluster-with-tag-aware-scheduling-b605e37e). I

Re: Flink operator UUID and serialVersionUID

2018-11-28 Thread Jayant Ameta
If I upgrade my Flink job and add a field to the "MyObject" class, will the restore fail? If so, how to handle such scenarios? Should I convert the "MyObject" instance to JSON and store the string? Jayant Ameta On Wed, Nov 28, 2018 at 1:26 PM bupt_ljy wrote: > Hi Jayant, > > If you change the
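
A sketch of the json-in-state idea from the question, assuming Jackson with jackson-module-scala on the classpath; MyObject is illustrative. Because the state holds a plain String, adding a field to MyObject does not change the state's serializer; older JSON can still be read as long as new fields are optional or have defaults.

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

case class MyObject(id: String, count: Long)

val mapper = new ObjectMapper().registerModule(DefaultScalaModule)
val json: String = mapper.writeValueAsString(MyObject("a", 1L))    // store this in ValueState[String]
val restored: MyObject = mapper.readValue(json, classOf[MyObject]) // read back after the upgrade
```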

Questions about UDTF in flink SQL

2018-11-28 Thread wangsan
Hi all, when using a user-defined table function in Flink SQL, it seems that the result type of a table function must be deterministic. If I want a UDTF whose result type is determined by its input parameters, what should I do? What I want to do is like this: ``` SELECT input, f1, f2 length
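
A sketch of the usual workaround, assuming the output arity can be fixed per *instance* (a constructor argument) rather than per call, since the planner needs one result type for each registered function; names are illustrative:

```scala
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.table.api.Types
import org.apache.flink.table.functions.TableFunction
import org.apache.flink.types.Row

class SplitUdtf(fieldNames: Array[String]) extends TableFunction[Row] {
  def eval(input: String): Unit = {
    val parts = input.split(",", fieldNames.length)
    collect(Row.of(parts: _*))
  }

  // the row type is derived from constructor state; it cannot depend on eval() args
  override def getResultType(): TypeInformation[Row] =
    Types.ROW(fieldNames, Array.fill[TypeInformation[_]](fieldNames.length)(Types.STRING))
}

// register e.g. with: tEnv.registerFunction("split3", new SplitUdtf(Array("f1", "f2", "f3")))
```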

RE: FlinkCEP and scientific papers ?

2018-11-28 Thread Esa Heikkinen
Hi, What is the situation of SQL/CEP? Flink 1.7 is coming soon, but what about SQL/CEP? Does documentation of SQL/CEP already exist? Even better if there were scientific papers about it. BR Esa From: Timo Walther Sent: Tuesday, August 7, 2018 4:52 PM To: user@flink.apache.org

Re: OutOfMemoryError while doing join operation in flink

2018-11-28 Thread Fabian Hueske
Hi, Flink handles large data volumes quite well; large records are a bit more tricky to tune. You could try to reduce the number of parallel tasks per machine (#slots per TM, #TMs per machine) and/or increase the amount of available JVM memory (possibly in exchange for managed memory, as Zhijiang
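
In flink-conf.yaml terms, that advice looks roughly like the following; the values are illustrative, not recommendations:

```yaml
# fewer parallel tasks per TaskManager
taskmanager.numberOfTaskSlots: 2
# more JVM memory per TaskManager
taskmanager.heap.size: 8192m
# shift some managed memory back to the JVM heap (batch jobs)
taskmanager.memory.fraction: 0.5
```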

Duplicated messages in kafka sink topic using flink cancel-with-savepoint operation

2018-11-28 Thread Nastaran Motavali
Hi, I have a Flink streaming job implemented in Java which reads some messages from a Kafka topic, transforms them and finally sends them to another Kafka topic. The Flink version is 1.6.2 and the Kafka version is 0.11. I pass the Semantic.EXACTLY_ONCE parameter to the producer. The problem
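
A sketch of the producer setup described above, assuming the 0.11 connector and illustrative broker/topic names. Note that consumers of the output topic must set isolation.level=read_committed; otherwise they also see messages from uncommitted or aborted transactions, which can look like duplicates.

```scala
import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper

val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092")
// must not exceed the broker's transaction.max.timeout.ms
props.setProperty("transaction.timeout.ms", "900000")

val producer = new FlinkKafkaProducer011[String](
  "output-topic",
  new KeyedSerializationSchemaWrapper[String](new SimpleStringSchema()),
  props,
  FlinkKafkaProducer011.Semantic.EXACTLY_ONCE)
```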