Re: Experiencing different watermarking behaviour in local/standalone modes when reading from kafka topic with idle partitions

2020-10-01 Thread Salva Alcántara
Awesome David, thanks for clarifying!

Re: Poor performance with large keys using RocksDB and MapState

2020-10-01 Thread Yun Tang
Hi, The option 'setCacheIndexAndFilterBlocks' is used to ensure we can manage the memory usage of RocksDB. Could you share logs or more details on why setCacheIndexAndFilterBlocks seems to make the hash index not work properly? I guess this might be due to the index and filter being more likely…
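For context, a sketch of how this option is typically wired up in Flink (assuming Flink's `RocksDBOptionsFactory` interface, Flink 1.10+; the class name here is made up):

```scala
import java.util

import org.apache.flink.contrib.streaming.state.RocksDBOptionsFactory
import org.rocksdb.{BlockBasedTableConfig, ColumnFamilyOptions, DBOptions}

// Hypothetical options factory: pins index/filter blocks into the block
// cache so RocksDB's memory stays accountable to the managed budget.
class CacheIndexFilterOptionsFactory extends RocksDBOptionsFactory {

  override def createDBOptions(
      currentOptions: DBOptions,
      handlesToClose: util.Collection[AutoCloseable]): DBOptions =
    currentOptions

  override def createColumnOptions(
      currentOptions: ColumnFamilyOptions,
      handlesToClose: util.Collection[AutoCloseable]): ColumnFamilyOptions = {
    val tableConfig = new BlockBasedTableConfig()
      .setCacheIndexAndFilterBlocks(true)        // the option under discussion
      .setPinL0FilterAndIndexBlocksInCache(true) // keep hot L0 blocks resident
    currentOptions.setTableFormatConfig(tableConfig)
  }
}
```

Such a factory can be registered via `state.backend.rocksdb.options-factory` in flink-conf.yaml. Large keys make the index blocks themselves large, which may be why caching them changes lookup behaviour here.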

Re: [DISCUSS] FLIP-144: Native Kubernetes HA for Flink

2020-10-01 Thread Yang Wang
3. Makes sense to me. And we could add a new HA solution "StatefulSet + PV + FileSystem" at any time if we need it in the future. Since there are no more open questions, I will start the voting now. Thanks all for your comments and feedback. Feel free to continue the discussion if you get other concerns…

Flink 1.12 cannot handle large schema

2020-10-01 Thread Lian Jiang
Hi, I am using a Flink 1.12 snapshot built on my machine. My job throws an exception when calling writeUTF on a schema from the schema registry. Caused by: java.io.UTFDataFormatException: encoded string too long: 223502 bytes at java.io.DataOutputStream.writeUTF(DataOutputStream.java:364) at java.io.DataOutpu…
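The limit being hit is inherent to `writeUTF`: it encodes the string length as an unsigned 16-bit value, so any payload over 65535 encoded bytes fails, exactly as the 223502-byte schema does here. A self-contained sketch of the failure and of the usual length-prefixed workaround (variable names are illustrative):

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream, UTFDataFormatException}
import java.nio.charset.StandardCharsets

// A string whose modified-UTF-8 encoding exceeds writeUTF's 65535-byte cap.
val bigSchema = "x" * 100000
val out = new DataOutputStream(new ByteArrayOutputStream())

val failed =
  try { out.writeUTF(bigSchema); false }
  catch { case _: UTFDataFormatException => true }

// Workaround used by many serializers: write a 4-byte int length, then raw bytes.
val buffer = new ByteArrayOutputStream()
val data = new DataOutputStream(buffer)
val bytes = bigSchema.getBytes(StandardCharsets.UTF_8)
data.writeInt(bytes.length)
data.write(bytes)

println(failed)                           // true: writeUTF rejects > 64 KiB
println(buffer.size == 4 + bytes.length)  // true: length-prefixed form works
```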

Re: Flink 1.12 snapshot throws ClassNotFoundException

2020-10-01 Thread Lian Jiang
Thanks Till. Making the Scala version consistent using 2.11 solved the ClassNotFoundException. On Tue, Sep 29, 2020 at 11:58 PM Till Rohrmann wrote: > Hi Lian, > > I suspect that it is caused by an incompatible Akka version. Flink uses > Akka 2.5.21 instead of 2.5.12. Moreover, you are mixing F…
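A minimal sketch of what "consistent" looks like in an sbt build (hypothetical fragment; the artifact list is illustrative): the `%%` operator appends the Scala binary suffix (`_2.11`) automatically, so every Flink artifact resolves against one Scala version and one Flink version, and transitive dependencies such as Akka come from a single consistent tree.

```scala
// build.sbt (fragment, illustrative)
scalaVersion := "2.11.12"

val flinkVersion = "1.12-SNAPSHOT" // as in the thread; a release would be e.g. "1.12.0"

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % Provided,
  "org.apache.flink" %% "flink-clients"         % flinkVersion % Provided
)
```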

autoCommit for postgres jdbc streaming in Table/SQL API

2020-10-01 Thread Dylan Forciea
Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API. I see that the API has the scan.fetch-size option, but not scan.auto-commit per https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connec…
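For context, a hypothetical DDL using the JDBC connector's documented `scan.fetch-size` option (table and connection details are made up). The gap being raised: for Postgres, the driver only honors the fetch size and streams via a server-side cursor when autoCommit is disabled on the connection, and the connector exposed no option for that at the time.

```sql
CREATE TABLE my_pg_source (
  id BIGINT,
  payload STRING
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:postgresql://localhost:5432/mydb',
  'table-name' = 'events',
  'scan.fetch-size' = '1000'
);
```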

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-01 Thread Austin Cawley-Edwards
Hey Till, Just a quick question on time characteristics -- this should work for IngestionTime as well, correct? Is there anything special I need to do to have the CsvTableSource / toRetractStream call carry through the assigned timestamps, or do I have to re-assign timestamps during the conversi…

RE: Blobserver dying mid-application

2020-10-01 Thread Hailu, Andreas
@Chesnay: I see. I actually had a separate thread with Robert Metzger some time ago regarding connection-related issues we’re plagued with at higher parallelisms, and his guidance led us to look into our somaxconn config. We increased it from 128 to 1024 in early September. We use the same generic JAR fo…
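For readers unfamiliar with the knob mentioned above, this is the usual way to inspect and raise the Linux listen-backlog ceiling (a configuration sketch; 128 was the long-standing kernel default):

```shell
# Current value of the accept-queue ceiling
sysctl net.core.somaxconn

# Raise it for the running kernel
sudo sysctl -w net.core.somaxconn=1024

# Persist across reboots
echo "net.core.somaxconn = 1024" | sudo tee -a /etc/sysctl.conf
```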

Re: Experiencing different watermarking behaviour in local/standalone modes when reading from kafka topic with idle partitions

2020-10-01 Thread David Anderson
If you were to use per-partition watermarking, which you can do by calling assignTimestampsAndWatermarks directly on the Flink Kafka consumer [1], then I believe the idle partition(s) would consistently hold back the overall watermark. With per-partition watermarking, each Kafka source task will a…
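A sketch of the per-partition approach David describes (assuming Flink 1.11+ and the `FlinkKafkaConsumer`; topic, servers, and durations are placeholders): assigning the watermark strategy on the consumer itself gives each Kafka partition its own watermark, and `withIdleness` marks partitions idle after a timeout so they stop holding the overall watermark back.

```scala
import java.time.Duration
import java.util.Properties

import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

val props = new Properties()
props.setProperty("bootstrap.servers", "localhost:9092") // placeholder

val consumer = new FlinkKafkaConsumer[String]("my-topic", new SimpleStringSchema(), props)

// Watermarks are generated per Kafka partition inside the source.
consumer.assignTimestampsAndWatermarks(
  WatermarkStrategy
    .forBoundedOutOfOrderness[String](Duration.ofSeconds(5))
    .withIdleness(Duration.ofMinutes(1)) // without this, idle partitions hold back the watermark
)

val env = StreamExecutionEnvironment.getExecutionEnvironment
val stream = env.addSource(consumer)
```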

Re: Blobserver dying mid-application

2020-10-01 Thread Chesnay Schepler
All jobs running in a Flink session cluster talk to the same blob server. The time when tasks are submitted depends on the job; for streaming jobs all tasks are deployed when the job starts running; in the case of batch jobs the submission can be staggered. I'm only aware of 2 cases where we tran…

Re: Stateful Functions + ML model prediction

2020-10-01 Thread John Morrow
Hi Flink Users, I was watching Tzu-Li Tai's talk on stateful functions from Flink Forward (https://www.youtube.com/watch?v=tuSylBadNSo) which mentioned that Kafka & Kinesis are supported, and looking at https://repo.maven.apache.org/maven2/org/apache/flink/ I can see IO packages for those two:…

RE: Blobserver dying mid-application

2020-10-01 Thread Hailu, Andreas
Hi Chesnay, Till, thanks for responding. @Chesnay: Apologies, I said cores when I meant slots ☺ So a total of 26 Task Managers with 2 slots each, for a grand total parallelism of 52. @Till: For this application, we have a grand total of 78 jobs, with some of them demanding more parallelism than…

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-01 Thread Austin Cawley-Edwards
Perfect, thanks so much Till! On Thu, Oct 1, 2020 at 5:13 AM Till Rohrmann wrote: > Hi Austin, > > I believe that the problem is the processing time window. Unlike for event > time where we send a MAX_WATERMARK at the end of the stream to trigger all > remaining windows, this does not happen for…

Experiencing different watermarking behaviour in local/standalone modes when reading from kafka topic with idle partitions

2020-10-01 Thread Salva Alcántara
I am considering this watermarker: ```scala class MyWatermarker(val maxTimeLag: Long = 0) extends AssignerWithPeriodicWatermarks[MyEvent] { var maxTs: Long = 0 override def extractTimestamp(e: MyEvent, previousElementTimestamp: Long): Long = { val timestamp = e.timestamp maxTs = m…
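The preview cuts the snippet off mid-expression, but it appears to be the standard max-lag periodic assigner: track the largest timestamp seen and emit it minus a fixed lag. A reconstruction of that logic as plain Scala (Flink's `AssignerWithPeriodicWatermarks` interface and `Watermark` class are elided so the sketch is self-contained; the real class would return `new Watermark(maxTs - maxTimeLag)` from `getCurrentWatermark`):

```scala
// Reconstruction of the truncated watermarker's logic, standalone.
class MyWatermarker(val maxTimeLag: Long = 0L) {
  var maxTs: Long = 0L

  def extractTimestamp(timestamp: Long): Long = {
    maxTs = math.max(maxTs, timestamp)
    timestamp
  }

  // Stands in for getCurrentWatermark
  def currentWatermark: Long = maxTs - maxTimeLag
}

val wm = new MyWatermarker(maxTimeLag = 10L)
Seq(100L, 95L, 120L).foreach(wm.extractTimestamp)
println(wm.currentWatermark) // 110: max timestamp seen (120) minus lag (10)
```

Note that this watermark only advances when events arrive, which is exactly why idle partitions interact badly with it.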

Re: Blobserver dying mid-application

2020-10-01 Thread Chesnay Schepler
It would also be good to know how many slots you have on each task executor. On 10/1/2020 11:21 AM, Till Rohrmann wrote: Hi Andreas, do the logs of the JM contain any information? Theoretically, each task submission to a `TaskExecutor` can trigger a new connection to the BlobServer. This depe…

Re: Blobserver dying mid-application

2020-10-01 Thread Till Rohrmann
Hi Andreas, do the logs of the JM contain any information? Theoretically, each task submission to a `TaskExecutor` can trigger a new connection to the BlobServer. This depends a bit on how large your TaskInformation is and whether this information is being offloaded to the BlobServer. What you ca…

Re: Streaming SQL Job Switches to FINISHED before all records processed

2020-10-01 Thread Till Rohrmann
Hi Austin, I believe that the problem is the processing time window. Unlike for event time, where we send a MAX_WATERMARK at the end of the stream to trigger all remaining windows, this does not happen for processing time windows. Hence, if your stream ends and you still have an open processing tim…
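A sketch of the event-time variant Till describes (assuming the Flink 1.11+ DataStream Scala API; data and names are illustrative): on a bounded source, Flink emits a final MAX_WATERMARK when the input ends, so the last open event-time window fires before the job reaches FINISHED, whereas the same pipeline with a processing-time window has no such end-of-stream trigger.

```scala
import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

val env = StreamExecutionEnvironment.getExecutionEnvironment

env
  .fromElements((1000L, "a"), (2000L, "b"), (3000L, "a")) // bounded input: (event ts, key)
  .assignTimestampsAndWatermarks(
    WatermarkStrategy
      .forMonotonousTimestamps[(Long, String)]()
      .withTimestampAssigner(new SerializableTimestampAssigner[(Long, String)] {
        override def extractTimestamp(e: (Long, String), recordTs: Long): Long = e._1
      })
  )
  .keyBy(_._2)
  // Event-time window: flushed by the final MAX_WATERMARK at end of input.
  .window(TumblingEventTimeWindows.of(Time.seconds(10)))
  .sum(0)
  .print()

env.execute("event-time windows flush at end of bounded input")
```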

Re: [DISCUSS] FLIP-144: Native Kubernetes HA for Flink

2020-10-01 Thread Till Rohrmann
3. We could avoid force deletions from within Flink. If the user does it, then we don't give guarantees. I am fine with your current proposal. +1 for moving forward with it. Cheers, Till On Thu, Oct 1, 2020 at 2:32 AM Yang Wang wrote: > 2. Yes. This is exactly what I mean. Storing the HA infor…