Re: batch job OOM

2020-01-23 Thread Fanbin Bu
Jingsong, Do you have any suggestions to debug the above-mentioned IndexOutOfBoundsException error? Thanks, Fanbin On Wed, Jan 22, 2020 at 11:07 PM Fanbin Bu wrote: > I got the following error when running another job. Any suggestions? > > Caused by: java.lang.IndexOutOfBoundsException > at >

debug flink in intelliJ on EMR

2020-01-23 Thread Fanbin Bu
Hi, I'm following https://cwiki.apache.org/confluence/display/FLINK/Remote+Debugging+of+Flink+Clusters to debug a Flink program running on EMR. How do I specify the host in the `edit configurations` dialog if the terminal prompt on the EMR master is hadoop@ip-10-200-46-186? Thanks, Fanbin

Re: Apache Flink - Sharing state in processors

2020-01-23 Thread M Singh
Thanks Yun for your answers. By processor I did mean a user-defined processor function. Keeping that in view, do you have any advice on how the shared state - i.e., the parameters passed to the processor as mentioned above (not the keyed state or operator state) - will be affected in a distributed runt

Re: Flink ParquetAvroWriters Sink

2020-01-23 Thread Arvid Heise
The issue is that you are not providing any meaningful type information, so Flink has to resort to Kryo. You need to extract the schema during query compilation (in your main) and pass it to your deserialization schema. public TypeInformation getProducedType() { return (TypeInformation
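The pattern described in this reply can be sketched in plain Java. This is a hedged, dependency-free stand-in, not Flink's actual API: the class and method names below are illustrative. The key idea is that the schema is resolved once in main() and shipped to the schema class as a serializable string, so the task side can report a concrete type instead of falling back to a generic (Kryo) one.

```java
import java.io.Serializable;

// Illustrative stand-in for a deserialization schema; Flink's real interface
// is DeserializationSchema<T> with getProducedType() returning TypeInformation.
class SchemaAwareDeserializer implements Serializable {
    // Ship the schema as a String: the Avro Schema class itself is not Serializable.
    private final String schemaJson;

    SchemaAwareDeserializer(String schemaJson) {
        this.schemaJson = schemaJson;
    }

    // Stand-in for getProducedType(): with the schema at hand, a concrete
    // type description can be produced instead of a generic fallback.
    String producedTypeDescription() {
        return "avro-record:" + schemaJson;
    }
}

public class Main {
    public static void main(String[] args) {
        // In a real job this string would come from parsing the .avsc in main(),
        // i.e. during query compilation, before the job graph is shipped.
        String schema = "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[]}";
        SchemaAwareDeserializer deser = new SchemaAwareDeserializer(schema);
        System.out.println(deser.producedTypeDescription());
    }
}
```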

Re: Flink ParquetAvroWriters Sink

2020-01-23 Thread aj
Hi Arvid, I am not clear on this: "Note that I still recommend to just bundle the schema with your Flink application and not reinvent the wheel." Can you please help with some sample code on how it should be written? Or can we connect some way so that I can understand it with you. On Thu, Jan 2

Re: Usage of KafkaDeserializationSchema and KafkaSerializationSchema

2020-01-23 Thread Chesnay Schepler
That's a fair question; the interface is indeed weird in this regard and does have some issues. From what I can tell you have 2 options: a) have the user pass the topic to the serialization schema constructor, which in practice would be identical to the topic they pass to the producer. b) Addit
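Option (a) from this reply can be sketched in plain Java. This is a simplified stand-in without the real Flink/Kafka types (the actual interface is KafkaSerializationSchema, whose serialize method returns a ProducerRecord<byte[], byte[]> carrying the target topic); the class name here is illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// The topic is passed to the schema's constructor, so it necessarily matches
// the topic the user hands to the producer.
class TopicAwareSerializationSchema {
    private final String topic;

    TopicAwareSerializationSchema(String topic) {
        this.topic = topic;
    }

    // Stand-in for KafkaSerializationSchema#serialize: pairs the configured
    // topic with the serialized bytes of the element.
    Map.Entry<String, byte[]> serialize(String element) {
        return new SimpleEntry<>(topic, element.getBytes(StandardCharsets.UTF_8));
    }
}

public class Main {
    public static void main(String[] args) {
        TopicAwareSerializationSchema schema = new TopicAwareSerializationSchema("events");
        Map.Entry<String, byte[]> record = schema.serialize("hello");
        System.out.println(record.getKey() + ":"
                + new String(record.getValue(), StandardCharsets.UTF_8));
    }
}
```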

Re: Apache Flink - Sharing state in processors

2020-01-23 Thread Chesnay Schepler
1. a/b) No, they are deserialized into separate instances in any case and are independent afterwards. 2. a/b) No, see 1). 3. a/b) No, as individual tasks are isolated by different class-loaders. On 23/01/2020 09:25, M Singh wrote: Thanks Yun for your answers. By processor I did mean user def
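Point 1 above can be demonstrated with plain Java serialization, which is the mechanism behind shipping function objects to tasks: each task deserializes its own copy, so mutating one copy is invisible to the others. The names below are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for parameters captured by a user-defined processor function.
class Params implements Serializable {
    int threshold = 10;
}

public class Main {
    // Round-trip through Java serialization, mimicking what happens when a
    // function object is shipped to each parallel subtask.
    static Params copy(Params p) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            new ObjectOutputStream(bos).writeObject(p);
            return (Params) new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray())).readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Params shipped = new Params();
        Params taskA = copy(shipped); // subtask 0's private instance
        Params taskB = copy(shipped); // subtask 1's private instance
        taskA.threshold = 99;         // a mutation in one task...
        System.out.println(taskB.threshold); // ...does not leak to the other
    }
}
```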

Re: Old offsets consumed from Kafka after Flink upgrade to 1.9.1 (from 1.2.1)

2020-01-23 Thread Somya Maithani
Hey, Any ideas about this? We are blocked on the upgrade because we want async timer checkpointing. Regards, Somya Maithani Software Developer II Helpshift Pvt Ltd On Fri, Jan 17, 2020 at 10:37 AM Somya Maithani wrote: > Hey Team, > > *Problem* > Recently, we were trying to upgrade Flink inf

Re: Usage of KafkaDeserializationSchema and KafkaSerializationSchema

2020-01-23 Thread Aljoscha Krettek
Hi, the reason the new schema feels a bit weird is that it implements a new paradigm in a FlinkKafkaProducer that still follows a somewhat older paradigm. In the old paradigm, partitioning and topic were configured on the sink, which made them fixed for all produced records. The new schema all

Re: batch job OOM

2020-01-23 Thread Jingsong Li
Fanbin, I have no idea now; can you create a JIRA to track it? You can describe the complete SQL and some information about the data. Best, Jingsong Lee On Thu, Jan 23, 2020 at 4:13 PM Fanbin Bu wrote: > Jingsong, > > Do you have any suggestions to debug the above-mentioned > IndexOutOfBoundsException err

Re: Old offsets consumed from Kafka after Flink upgrade to 1.9.1 (from 1.2.1)

2020-01-23 Thread Chesnay Schepler
@gordon Do you remember whether we changed any behavior of the Kafka 0.10 consumer after 1.3.3? On 23/01/2020 12:02, Somya Maithani wrote: Hey, Any ideas about this? We are blocked on the upgrade because we want async timer checkpointing. Regards, Somya Maithani Software Developer II Helps

REST rescale with Flink on YARN

2020-01-23 Thread Vasily Melnik
Hi all. I'm using Flink 1.8 on YARN with CDH 5.12. When I try to perform a rescale request: curl -v -X PATCH '/proxy/application_1576854986116_0079/jobs/11dcfc3163936fc019e049fc841b075b/rescaling?parallelism=3

Re: REST rescale with Flink on YARN

2020-01-23 Thread Chesnay Schepler
Older versions of Jetty don't support PATCH requests. You will either have to update it or create a custom Flink version that uses POST for the rescale operation. On 23/01/2020 13:23, Vasily Melnik wrote: Hi all. I'm using Flink 1.8 on YARN with CDH 5.12. When I try to perform a rescale request:

Re: Old offsets consumed from Kafka after Flink upgrade to 1.9.1 (from 1.2.1)

2020-01-23 Thread Tzu-Li (Gordon) Tai
Hi Somya, I'll have to take a closer look at the JIRA history to refresh my memory on potential past changes that caused this. My first suspicion is this: it is expected that the Kafka consumer will *ignore* the configured startup position if the job was restored from a savepoint. It will always

Re: REST rescale with Flink on YARN

2020-01-23 Thread Vasily Melnik
Hi all, I've found a solution for this issue. The problem is that with the YARN ApplicationMaster URL we communicate with the JobManager via a proxy, which is implemented on Jetty 6 (for Hadoop 2.6). So to use the PATCH method we need to locate the original JobManager URL. Using the /jobmanager/config API we could get onl
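The workaround described above can be sketched as follows: read the JobManager host and REST port out of the /jobmanager/config response, compose the direct (non-proxied) rescale URL, and send PATCH to it, bypassing the Jetty 6 YARN proxy. This is a minimal sketch; the host, port, and job id values are illustrative (the job id is taken from the curl command earlier in the thread), and real code would parse them out of the JSON response.

```java
public class Main {
    // Build the direct REST URL for the rescaling endpoint from values
    // obtained via the /jobmanager/config endpoint.
    static String rescaleUrl(String host, int restPort, String jobId, int parallelism) {
        return String.format("http://%s:%d/jobs/%s/rescaling?parallelism=%d",
                host, restPort, jobId, parallelism);
    }

    public static void main(String[] args) {
        // A Java 11+ HttpClient can then send PATCH to this URL directly, e.g.:
        //   HttpRequest.newBuilder(URI.create(url))
        //       .method("PATCH", HttpRequest.BodyPublishers.noBody()).build()
        // (the legacy HttpURLConnection rejects PATCH as a method).
        String url = rescaleUrl("ip-10-200-46-186", 8081,
                "11dcfc3163936fc019e049fc841b075b", 3);
        System.out.println(url);
    }
}
```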

Re: where does flink store the intermediate results of a join and what is the key?

2020-01-23 Thread Benoît Paris
Hi all! @Jark, out of curiosity, would you be so kind as to expand a bit on "Query on the intermediate state is on the roadmap"? Are you referring to working on QueryableStateStream/QueryableStateClient [1], or around "FOR SYSTEM_TIME AS OF" [2], or on other APIs/concepts (is there a FLIP?)? Chee

PostgreSQL JDBC connection drops after inserting some records

2020-01-23 Thread Soheil Pourbafrani
Hi, I have a piece of Flink Streaming code that reads data from files and inserts them into a PostgreSQL table. After inserting 6 to 11 million records, I got the following errors: *Caused by: java.lang.RuntimeException: Execution of JDBC statement failed. at org.apache.fli

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-23 Thread Senthil Kumar
Could you tell us how to turn on debug-level logs? We attempted this (on the driver): sudo stop hadoop-yarn-resourcemanager, followed the instructions here https://stackoverflow.com/questions/27853974/how-to-set-debug-log-level-for-resourcemanager and sudo start hadoop-yarn-resourcemanager, but we s

Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)

2020-01-23 Thread Aaron Langford
When creating your cluster, you can provide configurations that EMR will find the right home for. Example for the aws cli: aws emr create-cluster ... --configurations '[{ "Classification": "flink-log4j", "Properties": { "log4j.rootLogger": "DEBUG,file" } },{ "Cl

FileStreamingSink is using the same counter for different files

2020-01-23 Thread Pawel Bartoszek
Hi, the Flink Streaming Sink is designed to use a global counter when creating files to avoid overwrites. I am running Flink 1.8.2 with Kinesis Analytics (managed Flink provided by AWS) with bulk writes (the rolling policy is hardcoded to roll over on checkpoint). My job is configured to checkpoint every m

How to debug a job stuck in a deployment/run loop?

2020-01-23 Thread Jason Kania
I am attempting to migrate from 1.7.1 to 1.9.1 and I have hit a problem where previously working jobs can no longer launch after being submitted. In the UI, the submitted jobs show up as deploying for a period, then go into a run state before returning to the deploy state and this repeats regula

BlinkPlanner limitation related clarification

2020-01-23 Thread RKandoji
Hi Team, I've been using the Blink Planner and just came across this page https://ci.apache.org/projects/flink/flink-docs-stable/release-notes/flink-1.9.html#known-shortcomings-or-limitations-for-new-features and saw the limitation below: Due to a bug with how transformations are not being cleared on exe

Re: BlinkPlanner limitation related clarification

2020-01-23 Thread Jingsong Li
Hi RKandoji, IMO yes, you cannot reuse the table env; you should create a new tEnv after executing. 1.9.1 still has this problem. The related issue is [1], fixed in 1.9.2 and 1.10. [1] https://issues.apache.org/jira/browse/FLINK-13708 Best, Jingsong Lee On Fri, Jan 24, 2020 at 11:14 AM RKandoji wrot
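The behavior behind FLINK-13708, and the workaround above, can be modeled in plain Java. This is a hedged toy model, not Flink code: BuggyTableEnv merely mimics a 1.9.0/1.9.1 table environment that accumulates transformations and never clears them on execute(), so a reused environment re-submits old work and the fix is to create a fresh environment per execution.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the 1.9.0/1.9.1 behavior: transformations accumulate and are
// not cleared when execute() runs (see FLINK-13708).
class BuggyTableEnv {
    private final List<String> transformations = new ArrayList<>();

    void sqlUpdate(String statement) {
        transformations.add(statement);
    }

    // Returns how many transformations would be submitted by this execution.
    int execute() {
        return transformations.size(); // bug: previously executed work is replayed
    }
}

public class Main {
    public static void main(String[] args) {
        BuggyTableEnv tEnv = new BuggyTableEnv();
        tEnv.sqlUpdate("INSERT INTO sink1 SELECT ...");
        tEnv.execute();                      // submits 1 transformation
        tEnv.sqlUpdate("INSERT INTO sink2 SELECT ...");
        System.out.println(tEnv.execute());  // submits 2: the old one is replayed

        // Workaround on 1.9.1: create a *fresh* environment per execution.
        BuggyTableEnv fresh = new BuggyTableEnv();
        fresh.sqlUpdate("INSERT INTO sink2 SELECT ...");
        System.out.println(fresh.execute()); // submits only the new transformation
    }
}
```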