Flink Savepoint error

2020-02-14 Thread Soheil Pourbafrani
Hi, I developed a Flink application that reads data from files and inserts it into a database. While the job was running, I attempted to take a savepoint and cancel the job, but I got the following error: Caused by: java.util.concurrent.CompletionException: >

TaskManager fails and crashes when I cancel the job

2020-02-14 Thread Soheil Pourbafrani
Hi, I developed a single Flink job that reads a huge number of files and, after some simple preprocessing, sinks them into a database. I use the built-in JDBCOutputFormat for inserting records into the database. The problem is that when I cancel the job using either the WebUI or the command line, the

Re: Flink solution for having shared variable between task managers

2020-02-03 Thread Soheil Pourbafrani
On Fri, Jan 17, 2020 at 14:50, Soheil Pourbafrani < > soheil.i...@gmail.com> wrote: > >> Hi, >> >> According to the processing logic, I need to have a HashMap variable that >> should be shared between the taskmanagers. The scenario is the HashMap data >

filter and flatMap operation VS only a flatMap operation

2020-01-29 Thread Soheil Pourbafrani
Hi, in case we need a filter operation followed by a transformation, which one is more efficient in Flink: applying the filter operation first and then a flatMap operation separately, OR using only a flatMap operation that internally includes the filter logic too? best Soheil
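For reference, a fused operator might look like the sketch below (made-up types; `stream` is assumed to be a DataStream<Integer>). Note that Flink chains a filter followed by a flatMap into the same task by default, so in practice the difference between the two approaches is usually small.

    DataStream<String> result = stream.flatMap(new FlatMapFunction<Integer, String>() {
        @Override
        public void flatMap(Integer value, Collector<String> out) {
            if (value > 0) {                // the filter logic, folded in
                out.collect("v-" + value);  // the transformation logic
            }
        }
    });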

PostgreSQL JDBC connection drops after inserting some records

2020-01-23 Thread Soheil Pourbafrani
Hi, I have a piece of Flink Streaming code that reads data from files and inserts it into a PostgreSQL table. After inserting 6 to 11 million records, I got the following errors: *Caused by: java.lang.RuntimeException: Execution of JDBC statement failed. at

Re: Flink configuration on Docker deployment

2020-01-22 Thread Soheil Pourbafrani
y, it is > located in the path "/opt/flink/conf". > Docker volume also could be used to override the flink configuration when > you start the jobmanager > and taskmanager containers[1]. > > Best, > Yang > > [1]. https://docs.docker.com/storage/volumes/ > > Sohe

Flink configuration on Docker deployment

2020-01-21 Thread Soheil Pourbafrani
Hi, I need to set up a Flink cluster using Docker (and not docker-compose). I could successfully start the jobmanager and taskmanager, but the problem is that I have no idea how to change their default configuration. For example, in the case of giving 8 slots to the taskmanager or
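Per the reply above (the config lives in /opt/flink/conf and can be overridden with a volume), a sketch of mounting an edited flink-conf.yaml; the image name and paths are illustrative:

    # host file: ./conf/flink-conf.yaml, containing e.g.
    #   taskmanager.numberOfTaskSlots: 8
    docker run -d --name taskmanager \
      -v $(pwd)/conf/flink-conf.yaml:/opt/flink/conf/flink-conf.yaml \
      flink:latest taskmanager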

Does Flink Metrics provide information about each record inserted into the database

2020-01-18 Thread Soheil Pourbafrani
Hi, I'm using Flink to insert some processed records into a database. I need some aggregated information about the records inserted into the database so far. For example, for a specific column value, I need to know how many records have been inserted. Can I use Flink Metrics to provide
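A user-defined Counter registered in a rich function is the usual fit here; a minimal sketch (the metric name is illustrative, and note metrics are reported per subtask, so cluster-wide totals need aggregation in the metrics backend):

    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;
    import org.apache.flink.types.Row;

    public class InsertCountingMap extends RichMapFunction<Row, Row> {
        private transient Counter insertedRecords;

        @Override
        public void open(Configuration parameters) {
            // registers the counter under this operator's metric group
            insertedRecords = getRuntimeContext().getMetricGroup().counter("insertedRecords");
        }

        @Override
        public Row map(Row value) {
            insertedRecords.inc(); // one increment per record headed to the sink
            return value;
        }
    }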

Flink solution for having shared variable between task managers

2020-01-17 Thread Soheil Pourbafrani
Hi, according to the processing logic, I need to have a HashMap variable that should be shared between the taskmanagers. The scenario is that the HashMap data will be continuously updated according to the incoming stream of data. What I observed is that declaring the HashMap variable as a class attribute,

How to declare the Row object schema

2020-01-16 Thread Soheil Pourbafrani
Hi, inserting a DataSet of type Row using the Flink *JDBCOutputFormat*, I continuously get the warning: [DataSink (org.apache.flink.api.java.io.jdbc.JDBCOutputFormat@18be83e4) (1/4)] WARN org.apache.flink.api.java.io.jdbc.JDBCOutputFormat - Unknown column type for column 8. Best effort approach

Re: Flink Batch mode checkpointing

2020-01-16 Thread Soheil Pourbafrani
we can > enable some checkpoint feature when dealing with batch cases. > > > [1] https://flink.apache.org/roadmap.html#batch-and-streaming-unification > > Best > Yun Tang > > ------ > *From:* Soheil Pourbafrani > *Sent:* Friday, January 17, 2020

Flink Batch mode checkpointing

2020-01-16 Thread Soheil Pourbafrani
Hi, while in Streaming mode I'm using Flink checkpointing and the restart strategy, I could not find any checkpointing or restart strategy for Batch mode! Does Flink have any support for that? Actually, I'm going to read some huge text files and I need the application to at least be restarted on any
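Restart strategies (unlike checkpointing) do work with the DataSet API; a failed batch job is then re-run from the beginning. A sketch, using RestartStrategies from org.apache.flink.api.common.restartstrategy:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // retry the job up to 3 times, waiting 10 seconds between attempts
    env.setRestartStrategy(
        RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));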

Read CSV file and create customized fields

2020-01-16 Thread Soheil Pourbafrani
Hi friends, I'm going to read a CSV file that has 3 columns. I want the final loaded datatype to have other columns inferred from those 3 columns. For example, I would split the first column of the CSV file and create 3 new columns. The problem is I did not find a straightforward approach for that.
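One workable pattern is to read the raw 3 columns with readCsvFile and derive the extra fields in a map; a sketch with made-up field types and split logic:

    DataSet<Tuple3<String, Integer, Double>> raw = env
        .readCsvFile("input.csv")
        .types(String.class, Integer.class, Double.class);

    // derive three new columns by splitting the first column
    DataSet<Tuple5<String, String, String, Integer, Double>> enriched = raw
        .map(t -> {
            String[] p = t.f0.split("-");
            return Tuple5.of(p[0], p[1], p[2], t.f1, t.f2);
        })
        .returns(Types.TUPLE(Types.STRING, Types.STRING, Types.STRING,
                             Types.INT, Types.DOUBLE));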

How Flink reads files from the local filesystem

2020-01-15 Thread Soheil Pourbafrani
Hi, suppose we have a single-node Flink cluster with multiple slots, and some input files exist in the local file system. In this case, where we have no distributed file system to dedicate each file's blocks to taskmanagers, how will Flink read the file? Will all the task managers open the file

Writing Flink logs into a specific file

2019-07-18 Thread Soheil Pourbafrani
Hi, when we run a Flink application, some logs are generated about the run, in both local and distributed environments. I was wondering whether it is possible to save the logs into a specified file? I put the following file in the resource directory of the project, but it has no effect:
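The elided file was presumably a log4j.properties. For runs launched from the IDE, something like the following on the classpath (src/main/resources) should work; on a cluster, the files under FLINK_HOME/conf control logging instead, which is likely why a bundled file has no effect. A minimal sketch (the output path is illustrative):

    log4j.rootLogger=INFO, file
    log4j.appender.file=org.apache.log4j.FileAppender
    log4j.appender.file.File=/tmp/flink-app.log
    log4j.appender.file.layout=org.apache.log4j.PatternLayout
    log4j.appender.file.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n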

Creating a Source function to read data continuously

2019-07-15 Thread Soheil Pourbafrani
Hi, extending the "RichInputFormat" class, I could create my own MySQL input. I want to use it for reading data continuously from a table, but I observed that the "RichInputFormat" class reads all the data and finishes the job. I guess for reading data continuously I need to extend the "SourceFunction"
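Right — an InputFormat models a finite input. A continuously polling source can extend RichSourceFunction; a rough sketch with the JDBC details left as a placeholder:

    public class ContinuousJdbcSource extends RichSourceFunction<Row> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<Row> ctx) throws Exception {
            while (running) {
                // placeholder: run a JDBC query for rows newer than the last seen id/timestamp
                for (Row newRow : fetchNewRows()) {
                    ctx.collect(newRow);
                }
                Thread.sleep(1000); // poll interval
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        private List<Row> fetchNewRows() {
            return Collections.emptyList(); // placeholder for the actual query
        }
    }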

How to create Row with RowTypeInfo

2019-07-13 Thread Soheil Pourbafrani
Hi, creating a new DataSet of type Row, how can I specify the RowTypeInfo of the row? For example, when I create a new dataset like the following: Row row = Row.of(1, new Timestamp(1), new Date(1)); System.out.println(env.fromElements(row).getType()); it results in: Row(f0: Integer, f1: Timestamp, f2:
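One way to pin the type (and field names) is to build a RowTypeInfo and pass it explicitly when creating the DataSet; a sketch:

    RowTypeInfo typeInfo = new RowTypeInfo(
        new TypeInformation<?>[]{Types.INT, Types.SQL_TIMESTAMP, Types.SQL_DATE},
        new String[]{"id", "ts", "dt"});

    Row row = Row.of(1, new Timestamp(1), new Date(1));
    DataSet<Row> ds = env.fromCollection(Collections.singletonList(row), typeInfo);
    System.out.println(ds.getType()); // prints the declared field names and types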

Cannot write DataSet as csv file

2019-07-06 Thread Soheil Pourbafrani
Hi, using the JDBCInputFormat I loaded a DataSet of type Row. When I tried to save it as a CSV file, it errored: java.lang.ClassCastException: org.apache.flink.types.Row cannot be cast to org.apache.flink.api.java.tuple.Tuple. Meanwhile, I can save it as a text file. Here is the code. DataSet dataset =

Load database table with many columns

2019-07-03 Thread Soheil Pourbafrani
Hi, I use the following sample code to load data from a database into a Flink DataSet: DataSet dbData = env.createInput( JDBCInputFormat.buildJDBCInputFormat() .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")

Can Flink infer the table column types

2019-07-02 Thread Soheil Pourbafrani
Hi, I want to load MySQL tables in Flink without needing to specify the column names and types (like we can do with Apache Spark DataFrames). Using the JDBCInputFormat, we should pass the table field types to the setRowTypeInfo method. I couldn't find any way to force Flink to infer the column types.
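As far as I can tell, that JDBCInputFormat has no inference: the RowTypeInfo must be spelled out to match the SELECT list (one could build it programmatically from JDBC DatabaseMetaData, but Flink itself won't). A sketch with made-up connection details:

    RowTypeInfo rowTypeInfo = new RowTypeInfo(Types.INT, Types.STRING, Types.DOUBLE);

    JDBCInputFormat input = JDBCInputFormat.buildJDBCInputFormat()
        .setDrivername("com.mysql.jdbc.Driver")
        .setDBUrl("jdbc:mysql://host:3306/db")
        .setUsername("user")
        .setPassword("pass")
        .setQuery("SELECT id, name, score FROM my_table") // must match the RowTypeInfo
        .setRowTypeInfo(rowTypeInfo)
        .finish();

    DataSet<Row> rows = env.createInput(input);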

Flink cluster log organization

2019-05-20 Thread Soheil Pourbafrani
Hi, I have a multi-node Flink cluster and I use the Flink standalone scheduler to deploy applications on the cluster. When I deploy applications on the cluster, I can see some log files created under the path FLINK_HOME/logs, but there is no separate log file for each application, and all

Applying multiple calculations on data aggregated in a window

2019-05-15 Thread Soheil Pourbafrani
Hi, in my environment I need to collect a stream of messages into windows based on some fields as the key, and then I need to do multiple calculations that will apply to specified messages. For example, if I had the following messages in the window: {ts: 1, key: a, value: 10} {ts: 1, key: b, value: 0}

Re: Read data from HDFS on Hadoop3

2019-05-08 Thread Soheil Pourbafrani
UPDATE: I noticed that it runs using IntelliJ IDEA, but packaging the fat jar and deploying it on the cluster causes the so-called hdfs scheme error! On Thu, May 9, 2019 at 2:43 AM Soheil Pourbafrani wrote: > Hi, > > I used to read data from HDFS on Hadoop2 by adding the

Read data from HDFS on Hadoop3

2019-05-08 Thread Soheil Pourbafrani
Hi, I used to read data from HDFS on Hadoop2 by adding the following dependencies:

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.4.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.11</artifactId>
        <version>1.4.0</version>
    </dependency>

Flink load multiple files

2019-04-29 Thread Soheil Pourbafrani
Hi, I want to load multiple files and apply the processing logic to them. After some searching, using the following code I can load all the files in the directory named "input" into Flink: TextInputFormat tif = new TextInputFormat(new Path("input")); DataSet raw = env.readFile(tif, "input//"); If

Data Locality in Flink

2019-04-28 Thread Soheil Pourbafrani
Hi, I want to know exactly how Flink reads data in both cases: a file in the local filesystem and a file on a distributed file system. When reading from the local file system, I guess every line of the file will be read by a slot (according to the job parallelism) to apply the map logic. When reading from

Sinking messages in RabbitMQ

2019-04-23 Thread Soheil Pourbafrani
I'm using the Flink RabbitMQ Connector for sinking data, but with the RMQConnectionConfig object I couldn't find any method to set the type of the exchange (Fanout, Topic, Direct). Also, the RMQSink just takes the name of the queue as a parameter. Is there any way to specify the exchange type?

Flink Customized read text file

2019-04-23 Thread Soheil Pourbafrani
Hi, I want to know whether it is possible to use PipedInputStream and PipedOutputStream in Flink for reading text data from a file? For example, extending a RichSourceFunction and reading data like this: DataStream raw = env.addSource(new PipedSource(file_path)); Actually I tried to implement a class

Create Custom Sink for DataSet

2019-04-21 Thread Soheil Pourbafrani
Hi, using the DataStream API I could create a custom sink like class RichMySqlSink extends RichSinkFunction and define my desired behavior for inserting data into a MySQL table. But using the DataSet API I can only find the output method for sinking data, and it accepts just the OutputFormat data type. In

Create Dynamic data type

2019-04-19 Thread Soheil Pourbafrani
Hi, using JDBCInputFormat I want to read data from a database, but the problem is that the table columns are dynamic according to the number of rows. In the schema, the first column is of type int, and of the remaining columns, the first half is String and the second half is double. So I need a way to

Sink data into a Java stream variable

2019-04-15 Thread Soheil Pourbafrani
Hi, in Flink stream processing, can we sink data into a Java stream array?

Event Trigger in Flink

2019-04-12 Thread Soheil Pourbafrani
Hi, in my problem I should process Kafka messages using Apache Flink, while some processing parameters should be read from CouchDB, so I have two questions: 1 - What is the Flink way to read data from CouchDB? 2 - I want to trigger Flink to load data from CouchDB if a new document was

Case When in Flink Table API

2019-01-29 Thread Soheil Pourbafrani
What is the correct way to use *Case When* in this example: myTlb.select( "o_orderdate.substring(0,4) as o_year, volume, (when(n_name.like('%BRAZIL%'),volume).else(0)) as case_volume" ) Flink errors on the line (when(n_name.like('%BRAZIL%'),volume).else(0)) as case_volume"
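If memory serves, the Java expression strings use the ternary `cond.?(trueValue, falseValue)` instead of when/else (whose names collide with Java keywords); a sketch:

    myTlb.select(
        "o_orderdate.substring(0,4) as o_year, " +
        "volume, " +
        "(n_name.like('%BRAZIL%')).?(volume, 0) as case_volume");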

How to save table with header

2019-01-29 Thread Soheil Pourbafrani
Hi, I can save tables to a CSV file like this: TableSink q6Sink = new CsvTableSink(SinkPath, ","); temp.writeToSink(q6Sink); but I want to save the table with the table header as the first line. Is this possible in Flink?

Select fields in Table API

2019-01-29 Thread Soheil Pourbafrani
Hi, I'm trying select some fields: lineitem .select( "l_returnflag," + "l_linestatus," + "l_quantity.sum as sum_qty," + "(l_extendedprice * (l_discount - 1)).sum as sum_disc_price," + "l_extendedprice.sum as

Filter Date type in Table API

2019-01-29 Thread Soheil Pourbafrani
Hi, I want to filter a field of type Date (java.sql.Date) like the following: filter("f_date <= '1998-10-02'") and filter("f_date <= '1998/10/02'") Expression 'f_date <= 1998/10/02 failed on input check: Comparison is only supported for numeric types and comparable types of same type, got Date

Flink Table API Sum method

2019-01-29 Thread Soheil Pourbafrani
Hi, how can I use the Flink Table API SUM function? For example, something like this: table.agg(sum("field1"))
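As far as I know, sum is expressed inside select (after an optional groupBy) rather than through an agg call; a sketch with illustrative field names:

    // global aggregate
    Table total = table.select("field1.sum as total");

    // or grouped
    Table perKey = table.groupBy("someKey").select("someKey, field1.sum as total");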

date format in Flink SQL

2019-01-29 Thread Soheil Pourbafrani
Hi, I want to convert a string in the format 1996-8-01 to a date, and at the end create a Table from the dataset of Tuple3. Since I want to apply SQL queries on the date field of the table, for example, "date_column < 1996-8-01", which Java date format is supported in Flink?
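For the SQL side, the standard literal form DATE 'yyyy-MM-dd' should work once the column is a java.sql.Date; a sketch, assuming a registered table named lineitem with a date column f_date:

    Table result = tableEnv.sqlQuery(
        "SELECT * FROM lineitem WHERE f_date < DATE '1996-08-01'");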

How to infer table schema from Avro file

2019-01-27 Thread Soheil Pourbafrani
Hi, I load an Avro file into a Flink Dataset: AvroInputFormat test = new AvroInputFormat( new Path("PathToAvroFile"), GenericRecord.class); DataSet DS = env.createInput(test); DS.print(); and here are the results of printing DS: {"N_NATIONKEY": 14, "N_NAME": "KENYA",

How to load Avro file in a Dataset

2019-01-27 Thread Soheil Pourbafrani
According to the Flink documentation, it's possible to load an Avro file like the following: AvroInputFormat users = new AvroInputFormat(in, User.class); DataSet usersDS = env.createInput(users); It's a bit confusing for me. I guess User is a predefined class. My question is: can Flink detect the Avro

How to create schema for Flink table

2019-01-26 Thread Soheil Pourbafrani
I want to do some transformation on raw data and then create a table from that. Is it possible to create a schema for the table (in advance) and then using that transformed dataset and schema create the table?

Re: Print table contents

2019-01-25 Thread Soheil Pourbafrani
cts/flink/flink-docs-master/dev/table/common.html#convert-a-table-into-a-dataset > > > On Sat, Jan 26, 2019 at 3:24 AM Soheil Pourbafrani > wrote: > >> Hi, >> >> Using Flink Table object how can we print table contents, something like >> Spark show() method? >

Print table contents

2019-01-25 Thread Soheil Pourbafrani
Hi, using a Flink Table object, how can we print the table contents, something like Spark's show() method? For example, in the following: tableEnv.registerDataSet("Orders", raw, "id, country, num, about"); Table results = tableEnv.sqlQuery("SELECT id FROM Orders WHERE id > 10"); How can I print the
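Per the docs linked in the reply above, converting the Table back to a DataSet and printing it is the closest analogue to Spark's show(); a sketch:

    DataSet<Row> rowDS = tableEnv.toDataSet(results, Row.class);
    rowDS.print();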

Count sliding window does not work as expected

2018-08-23 Thread Soheil Pourbafrani
Hi, I need a sliding windowing strategy that fills the window with a count of 400 and, for every 100 incoming records, processes the last 400 records. For example, suppose we have a data stream of count 16. For a count window of 400 and a slide of 100, I expect it output 1597 stream: 16 - 400

Re: Flink socketTextStream UDP connection

2018-08-20 Thread Soheil Pourbafrani
SourceFunction [1] > and adapt it to your needs. > > Best, Fabian > > [1] > https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/SocketTextStreamFunction.java > > 2018-08-13 10:04 GMT+02:00 Soheil Po

Flink socketTextStream UDP connection

2018-08-13 Thread Soheil Pourbafrani
Flink socketTextStream receives data using the TCP protocol. Is there any way to get data using the UDP protocol?
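Not out of the box, so UDP needs a custom source; a rough sketch with DatagramSocket (buffer size, charset, and timeout are arbitrary choices). It plugs in via env.addSource(new UdpSource(9999)):

    public class UdpSource extends RichSourceFunction<String> {
        private final int port;
        private volatile boolean running = true;

        public UdpSource(int port) {
            this.port = port;
        }

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            try (DatagramSocket socket = new DatagramSocket(port)) {
                socket.setSoTimeout(1000); // so cancel() is noticed promptly
                byte[] buf = new byte[4096];
                while (running) {
                    DatagramPacket packet = new DatagramPacket(buf, buf.length);
                    try {
                        socket.receive(packet);
                        ctx.collect(new String(packet.getData(), 0,
                                packet.getLength(), StandardCharsets.UTF_8));
                    } catch (SocketTimeoutException ignored) {
                        // loop back and re-check the running flag
                    }
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }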

Event time didn't advance because of some idle slots

2018-07-31 Thread Soheil Pourbafrani
In Flink event time mode, I use the periodic watermark to advance event time. Every slot extracts the event time from the incoming message and, to emit a watermark, subtracts a network delay from it, say 3000 ms. public Watermark getCurrentWatermark() { return new Watermark(MAX_TIMESTAMP - DELEY);

Detect late data in processing time

2018-07-30 Thread Soheil Pourbafrani
In event time, we can gather late data using an OutputTag, because in event time we have watermarks and can detect late data. But in processing time mode we don't have any watermark to detect late data. I want to know whether we can set a watermark (for example, according to the taskmanager's timestamp) and use

watermark VS window trigger

2018-07-30 Thread Soheil Pourbafrani
Suppose we have a time window of 10 milliseconds and we use EventTime. First, we determine how Flink extracts the time and watermark from incoming messages; after that, we set a key for the stream and set a time window. aggregatedTuple .assignTimestampsAndWatermarks(new

Flink wrong Watermark in Periodic watermark

2018-07-30 Thread Soheil Pourbafrani
Using the Flink EventTime feature, I implemented the AssignerWithPeriodicWatermarks class such that: public static class SampleTimestampExtractor implements AssignerWithPeriodicWatermarks> { private static final long serialVersionUID = 1L; private long MAX_TIMESTAMP; private final long DELEY

Why data didn't enter the time window in EventTime mode

2018-07-18 Thread Soheil Pourbafrani
Hi, in a datastream processing problem, the source generates data every 8 milliseconds, and the timestamp is a field of the data. With Flink's default time behavior, the data enters the time window, but when I set Flink time to EventTime, it outputs nothing! Here is the code:

clear method on Window Trigger

2018-07-17 Thread Soheil Pourbafrani
Hi, can someone elaborate on when the clear method of the Trigger class will be called and what its duty is? Also, I don't know the benefit of FIRE_AND_PURGE over FIRE and its use case. For example, in a scenario, if we have a count-of-3 window that also will trigger after a

Cannot get OutputTag datastream from windowing function

2018-07-17 Thread Soheil Pourbafrani
Hi, according to the documents I tried to get late data using side output. final OutputTag> lateOutputTag = new OutputTag>("late-data"){}; DataStream> res = aggregatedTuple .assignTimestampsAndWatermarks(new Bound())

Meaning of the methods of the Trigger class

2018-07-16 Thread Soheil Pourbafrani
Hi, in extending the Trigger class we have methods like onElement, onProcessingTime, and onEventTime. I know the onElement method will be called when an element is added to the window, but I have no idea when the other two methods, onProcessingTime and onEventTime, will be called.

How to customize trigger for Count Time Window

2018-07-14 Thread Soheil Pourbafrani
I want a time window to trigger data processing under the two following conditions: 1 - the window has 3 messages, 2 - or any number of messages (fewer than 3) is in the window and it reaches a timeout. I know one should extend the Trigger class: public static class MyWindowTrigger extends Trigger {
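A sketch of such a trigger — fire on a count of 3 or on a processing-time timeout measured from the first element, whichever comes first (state handling simplified; the timeout timer is not explicitly deleted here):

    public class CountOrTimeoutTrigger<W extends Window> extends Trigger<Object, W> {
        private final long maxCount;
        private final long timeoutMs;
        private final ValueStateDescriptor<Long> countDesc =
                new ValueStateDescriptor<>("count", Long.class);

        public CountOrTimeoutTrigger(long maxCount, long timeoutMs) {
            this.maxCount = maxCount;
            this.timeoutMs = timeoutMs;
        }

        @Override
        public TriggerResult onElement(Object element, long timestamp, W window,
                                       TriggerContext ctx) throws Exception {
            ValueState<Long> count = ctx.getPartitionedState(countDesc);
            long c = (count.value() == null ? 0L : count.value()) + 1;
            if (c == 1) {
                // first element in this pane: start the timeout clock
                ctx.registerProcessingTimeTimer(ctx.getCurrentProcessingTime() + timeoutMs);
            }
            if (c >= maxCount) {
                count.clear();
                return TriggerResult.FIRE_AND_PURGE;
            }
            count.update(c);
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) {
            // the timeout expired before the count was reached
            return TriggerResult.FIRE_AND_PURGE;
        }

        @Override
        public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
            return TriggerResult.CONTINUE;
        }

        @Override
        public void clear(W window, TriggerContext ctx) throws Exception {
            ctx.getPartitionedState(countDesc).clear();
        }
    }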

reduce a data stream to another type

2018-07-14 Thread Soheil Pourbafrani
Hi, I have a keyed datastream of type Tuple2. I want to reduce it and merge all of the byte[] values for a key (the first field (Long) is the key). So I need the reduce function to return the type Tuple2>, but the reduce function doesn't allow that! How can I do such a job in Flink?
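reduce has to preserve the element type, so the usual workaround is to map each element into the target type first and then reduce; a sketch, assuming a DataStream<Tuple2<Long, byte[]>> named stream:

    DataStream<Tuple2<Long, List<byte[]>>> merged = stream
        .map(new MapFunction<Tuple2<Long, byte[]>, Tuple2<Long, List<byte[]>>>() {
            @Override
            public Tuple2<Long, List<byte[]>> map(Tuple2<Long, byte[]> v) {
                List<byte[]> list = new ArrayList<>();
                list.add(v.f1);
                return Tuple2.of(v.f0, list); // wrap each element in the target type
            }
        })
        .keyBy(0)
        .reduce((a, b) -> {
            a.f1.addAll(b.f1); // now both sides have the same (list) type
            return a;
        });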

Filtering and mapping data after window operator

2018-07-14 Thread Soheil Pourbafrani
Hi, I'm getting a data stream from a source, and after gathering the data in a time window I want to do some operations like filtering and mapping on the windowed data, but the output of the time window operation only allows a reduce, aggregate, or ... function, and after that, I want to apply functions like filter or

TimeWindow doesn't trigger reduce function

2018-07-13 Thread Soheil Pourbafrani
Hi, my stream data is of type Tuple2, containing the timestamp (in seconds) and the data, respectively. The source generates 120 samples every second. Using the following code, I want to get the data every second and then apply the reduce function to it. temp.keyBy(

Pass a method as parameter

2018-07-07 Thread Soheil Pourbafrani
Is it possible in Flink to write a method A to which we can pass a written function B, so that method A applies function B to a DataStream?
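Yes — since MapFunction (and friends) are just interfaces, a method can take one as a parameter; an explicit TypeInformation sidesteps Java type erasure. A sketch (names are illustrative):

    public static <I, O> DataStream<O> applyTo(
            DataStream<I> in, MapFunction<I, O> fn, TypeInformation<O> outType) {
        return in.map(fn).returns(outType);
    }

    // usage (illustrative)
    DataStream<String> out = applyTo(input, v -> "x" + v, Types.STRING);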

Is it reasonable to use Flink on a local machine

2018-06-24 Thread Soheil Pourbafrani
Hi, my purpose is to generate data in one process and process that data in another process, something like stream processing, but all done on just one node, not a cluster! The rate of generating data is 12,000 samples per second. I want the processing phase to be done in parallel, so, as Flink uses node

What is the package in flink-cassandra-connector that includes the Cassandra DataStax core

2018-05-29 Thread Soheil Pourbafrani
Hi, I use the Flink Cassandra Connector dependency in my Maven project. Other components have a conflict with the cassandra-driver-core that is embedded in flink-cassandra-connector. I tried to exclude it in the pom.xml file like this:

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-cassandra_2.11</artifactId>
        <version>1.4.2</version>
    </dependency>

Submitting a Flink application on YARN: parameters

2018-04-28 Thread Soheil Pourbafrani
Hi, I have a Flink .jar file and I submit it to a YARN cluster using the command: flink run -m yarn-cluster -yn 5 -yjm 768 -ytm 1400 -ys 2 -yqu streamQ my_program.jar According to the submit command I expect: // it will create 5 containers > satisfied // each container should use 2 cores

Flink flatMap to pass a tuple and get multiple tuples

2018-04-27 Thread Soheil Pourbafrani
Hi, I want to use flatMap to pass a tuple to a function, namely 'parse', and it will return multiple tuples, each of which should be a record in the datastream object. Something like this: DataStream> res = stream.flatMap(new FlatMapFunction, Tuple2>()
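The Collector in flatMap is what turns one input into many output records; a sketch, assuming parse(...) is the user's method returning a collection of tuples:

    DataStream<Tuple2<String, Integer>> res = stream.flatMap(
        new FlatMapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(Tuple2<String, Integer> value,
                                Collector<Tuple2<String, Integer>> out) {
                // each collect() emits one record into the resulting stream
                for (Tuple2<String, Integer> t : parse(value)) {
                    out.collect(t);
                }
            }
        });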

Re: Insert data into Cassandra without Flink Cassandra connection

2018-04-26 Thread Soheil Pourbafrani
ln("\n*** New Message ***\n"); System.out.println("Row Number : " + i ++ ); System.out.println("Message: " + HexUtiles.bytesToHex(value)); Parser.parse(ByteBuffer.wrap(value), ConfigHashMap); } }); On Thu, Apr 26, 2018 at 5

Insert data into Cassandra without Flink Cassandra connection

2018-04-26 Thread Soheil Pourbafrani
I want to use a native Cassandra connection (not the Flink Cassandra connection) to insert some data into Cassandra. According to the design of the code, the connection to Cassandra is opened once at the start and all taskmanagers use it to write data. It's OK running in local mode. The problem is when

Different results when running Flink in local mode and on a YARN cluster

2018-04-25 Thread Soheil Pourbafrani
I run code using the *Flink* Java API that gets some bytes from *Kafka* and parses them, followed by inserting into the *Cassandra* database using another library's *static* method (both parsing and inserting are done by the library). Running the code locally in the IDE, I get the desired answer, but

Applying a void function to a DataStream

2018-04-19 Thread Soheil Pourbafrani
Hi, I have a void function that takes a String, parses it, and writes it into Cassandra (using pure Java, not the Flink Cassandra connector). Using the Apache Flink Kafka connector, I've got some data into a DataStream. Now I want to apply the parse function to each message in the DataStream, but as the parse function

Flink Kafka producer: Object of class is not serializable

2018-03-31 Thread Soheil Pourbafrani
I got an error using the Flink Kafka connector for producing data. I describe the problem here on Stack Overflow. Please help. Thanks!

Reading data from Cassandra

2018-03-31 Thread Soheil Pourbafrani
Does the Flink Cassandra Connector support reading data from Cassandra? If yes, please give an example.
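For the batch API there is a CassandraInputFormat in the connector (class and package usage as I recall them from the 1.x connector, so treat this as a sketch; keyspace, table, and columns are made up):

    ClusterBuilder cluster = new ClusterBuilder() {
        @Override
        protected Cluster buildCluster(Cluster.Builder builder) {
            return builder.addContactPoint("127.0.0.1").build();
        }
    };

    DataSet<Tuple2<String, Long>> rows = env.createInput(
        new CassandraInputFormat<Tuple2<String, Long>>(
            "SELECT name, cnt FROM ks.my_table;", cluster),
        TupleTypeInfo.getBasicTupleTypeInfo(String.class, Long.class));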

Re: Does Flink support Hadoop (HDFS) 2.9 ?

2018-03-01 Thread Soheil Pourbafrani
I mean Flink 1.4 On Thursday, March 1, 2018, Soheil Pourbafrani <soheil.i...@gmail.com> wrote: > ?

Does Flink support Hadoop (HDFS) 2.9 ?

2018-03-01 Thread Soheil Pourbafrani
?

Flink Yarn session

2018-01-27 Thread Soheil Pourbafrani
I've set up a Hadoop HA cluster and I want to run Flink applications on YARN. The problem is elaborated on Stack Overflow: here is the question. Please help!

Flink on YARN

2018-01-20 Thread Soheil Pourbafrani
Hi, I have a YARN cluster (containing no Flink installation) on which I want to run Flink applications. I was wondering whether it is necessary to install Flink on every node of the YARN cluster, or does it suffice to install Flink on the edge node (the node from which I submit jobs to the remote YARN cluster)?

Flink standalone scheduler

2018-01-17 Thread Soheil Pourbafrani
Does the Flink standalone scheduler support dynamic resource allocation, something like the YARN scheduler?

using a Yarn cluster for both Spark and Flink

2018-01-17 Thread Soheil Pourbafrani
Is it a standard approach to set up a YARN cluster for running both Spark and Flink applications?

cleaning yarn logs for long-running applications

2018-01-14 Thread Soheil Pourbafrani
Hi, I want to use YARN as the cluster manager for running Flink applications, but I'm worried about how Flink or YARN handles local logs on each machine. Do they clean aged logs for a long-running application? If not, it's possible the local storage gets full!

org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot load user class: FlinkStreaming

2017-12-16 Thread Soheil Pourbafrani
Hey, I have Flink code that passes all its dependencies through the ExecutionEnvironment object. I run my code remotely on a cluster and it errors: Exception in thread "main" org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job execution failed. at

Cannot load user classes

2017-12-12 Thread Soheil Pourbafrani
Hey, I wrote code using Flink and created a fat jar using Maven; I can run it without errors on a remote cluster. Trying to run it without creating a fat jar, directly from the IDE, I got the "Cannot load user class" error for non-Flink-core classes. For example `Cannot load user class:

using regular expression to specify Kafka topics

2017-12-02 Thread Soheil Pourbafrani
I use the Flink Kafka connector 10 to subscribe to topics and get data. Now I want to specify topics not using a String but a regular expression, just because it can recognize future topics added to Kafka and get their data. The Spark Kafka connector has a method named SubscribePattern

Creating a Flink byte[] deserializer

2017-11-27 Thread Soheil Pourbafrani
Hi, I want to read (consume) data from Kafka as a byte array, just like the Kafka byte array deserializer. In Flink I can only find SimpleStringSchema, and it is not suitable for my data. Is there any built-in byte array deserializer in Flink, or if not, how can I create a simple byte array deserializer?
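A pass-through DeserializationSchema is only a few lines; a sketch, which can then be handed to the Kafka consumer in place of SimpleStringSchema:

    public class ByteArrayDeserializationSchema implements DeserializationSchema<byte[]> {
        @Override
        public byte[] deserialize(byte[] message) {
            return message; // hand the raw Kafka bytes through untouched
        }

        @Override
        public boolean isEndOfStream(byte[] nextElement) {
            return false; // unbounded stream
        }

        @Override
        public TypeInformation<byte[]> getProducedType() {
            return PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO;
        }
    }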