Re: Regarding Flink Upgrades

2022-11-02 Thread Dawid Wysakowicz
Hi, What you linked to is what the community agreed to support. So far we've been able to support three versions at all times (e.g. currently we merge bugfixes to 1.17.x, 1.16.x, 1.15.x), which is one extra version than what is described in the docs. I don't think this will ever decrease.

Re: Avro deserialization issue

2022-04-14 Thread Dawid Wysakowicz
Hi Anitha, As far as I can tell the problem is with avro itself. We upgraded avro version we use underneath in Flink 1.12.0. In 1.11.x we used avro 1.8.2, while starting from 1.12.x we use avro 1.10.0. Maybe that's the problem. You could try to upgrading the avro version in your program. Just

Re: Interval join operator is not forwarding watermarks correctly

2022-03-17 Thread Dawid Wysakowicz
Hi Alexis, I tried looking into your example. First of all, so far, I've spent only a limited time looking at the WatermarkGenerator, and I have not thoroughly understood how it works. I'd discourage assigning watermarks anywhere in the middle of your pipeline. This is considered to be an

Re: Move savepoint to another s3 bucket

2022-03-08 Thread Dawid Wysakowicz
Hi Lukas, I am afraid you're hitting this bug: https://issues.apache.org/jira/browse/FLINK-25952 Best, Dawid On 08/03/2022 16:37, Lukáš Drbal wrote: Hello everyone, I'm trying to move savepoint to another s3 account but restore always failed with some weird 404 error. We are using lyft

Re: Could not stop job with a savepoint

2022-03-07 Thread Dawid Wysakowicz
Hi, From the exception it seems the job has been already done when you're triggering the savepoint. Best, Dawid On 07/03/2022 14:56, Vinicius Peracini wrote: Hello everyone, I have a Flink job (version 1.14.0 running on EMR) and I'm having this issue while trying to stop a job with a

Re: Question about Flink counters

2022-03-07 Thread Dawid Wysakowicz
Hi Shane, I don't think counters, or should I say metrics, are the right abstraction for the use case you described. Metrics are a way to get an insight into the running job and what is its current state. It is not a good mean to calculate results. Metrics are not stateful, they are not

Re: streaming mode with both finite and infinite input sources

2022-02-25 Thread Dawid Wysakowicz
This should be supported in 1.14 if you enable checkpointing with finished tasks[1], which has been added in 1.14. In 1.15 it will be enabled by default. Best, Dawid [1]

Re: CDC using Query

2022-02-04 Thread Dawid Wysakowicz
Hi Mohan, I don't know much about Kafka Connect, so I will not talk about its features and differences to Flink. Flink on its own does not have a capability to read a CDC stream directly from a DB. However there is the flink-cdc-connectors[1] projects which embeds the standalone Debezium

Re: Reading from Kafka kafkarecorddeserializationschema

2022-02-04 Thread Dawid Wysakowicz
Hi, You can use DeserializationSchema with KafkaSource as well. You can pass it to the KafkaSource.builder():     KafkaSource.<...>builder()     .setDeserializer(...) You can also take a look at the StateMachineExample[1], where KafkaSource is used.

Re: Queryable State Deprecation

2022-02-04 Thread Dawid Wysakowicz
Hi Karthik, The reason we deprecated it is because we lacked committers who could spend time on getting the Queryable state to a production ready state. I might be speaking for myself here, but I think the main use case for the queryable state is to have an insight into the current state of

Re: flink savepoints not (completely) relocatable ?

2022-02-03 Thread Dawid Wysakowicz
! Greetings, Frank On 03.02.22 16:38, Dawid Wysakowicz wrote: Hi Frank. Do you use entropy injection by chance? I am afraid savepoints are not relocatable in combination with entropy injection as described here[1]. Best, Dawid [1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops

Re: flink savepoints not (completely) relocatable ?

2022-02-03 Thread Dawid Wysakowicz
Hi Frank. Do you use entropy injection by chance? I am afraid savepoints are not relocatable in combination with entropy injection as described here[1]. Best, Dawid [1] https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#triggering-savepoints On 03/02/2022

Re: create savepoint on bounded source in streaming mode

2022-01-25 Thread Dawid Wysakowicz
Hi Shawn, You could also take a look at the hybrid source[1] Best, Dawid [1]https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/hybridsource/ On 26/01/2022 08:39, Guowei Ma wrote: Hi Shawn Currently Flink can not trigger the sp at the end of the input. An

Re: Examples / Documentation for Flink ML 2

2022-01-17 Thread Dawid Wysakowicz
I am adding a couple of people who worked on it. Hopefully, they will be able to answer you. On 17/01/2022 13:39, Bonino Dario wrote: > > Dear List, > > We are in the process of evaluating Flink ML version 2.0 in the > context of some ML task mainly concerned with classification and > clustering.

Re: [statefun] upgrade path - shared cluster use

2022-01-17 Thread Dawid Wysakowicz
I am pretty confident the goal is to be able to run on the newest Flink version. However, as the release cycle is decoupled for both modules it might take a bit. I added Igal to the conversation, who I hope will be able to give you an idea when you can expect that to happen. Best, Dawid On

Re: Flink per-job cluster HbaseSinkFunction fails before starting - Configuration issue

2022-01-17 Thread Dawid Wysakowicz
Hey Kamil, Have you followed this guide to setup kerberos authentication[1]? Best, Dawid [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/security/security-kerberos/ On 14/01/2022 17:09, Kamil ty wrote: > Hello all, > I have a flink job that is using the

Re: Sorting/grouping keys and State management in BATCH mode

2022-01-12 Thread Dawid Wysakowicz
Hey Krzysztof, Re 1. I believe you are asking where the state is kept. It is stored in memory, but bear in mind there is only ever state kept for the current key. Once all records for a key are processed the corresponding state is discarded as it won't be needed anymore. Re 2. The sorting

Re: GenericWriteAheadSink, declined checkpoint for a finished source

2021-12-06 Thread Dawid Wysakowicz
Hi, Sorry to hear it's hard to find the option. It is part of the 1.14 release[1]. It is also documented how to enable it[2]. Happy to hear how we can improve the situation here. As for the exception. Are you seeing this exception occur repeatedly for the same task? I can imagine a situation

Re: Upgrade from 1.13.1 to 1.13.2/1.13.3 failing

2021-11-09 Thread Dawid Wysakowicz
Hey Sweta, Sorry I did not get back to you earlier. Could you explain how do you do the upgrade? Do you try to upgrade your cluster through HA services (e.g. zookeeper)? Meaning you bring down the 1.13.1 cluster down and start a 1.13.2/3 cluster which you intend to pick up the job automatically

Re: Duplicate Calls to Cep Filter

2021-11-01 Thread Dawid Wysakowicz
Hey Puneet, The reason for multiple calls to the condition is as mentioned before, because once it is evaluated for the TAKE and the second time for the IGNORE edge. The reason is that every edge is evaluated independently. There is no joint context or caching of conditions. I agree from a

Re: [DISCUSS] Creating an external connector repository

2021-10-19 Thread Dawid Wysakowicz
Hey all, I don't have much to add to the general discussion. Just a single comment on: that we could adjust the bylaws for the connectors such that we need fewer PMCs to approve a release. Would it be enough to have one PMC vote per connector release? I think it's not an option.

Re: Issue with Flink UI for Flink 1.14.0

2021-10-14 Thread Dawid Wysakowicz
I am afraid it is a bug in flink 1.14. I created a ticket for it FLINK-24550[1]. I believe we should pick it up soonish. Thanks for reporting the issue! Best, Dawid [1] https://issues.apache.org/jira/browse/FLINK-24550 On 13/10/2021 20:32, Peter Westermann wrote: > > Hello, > >   > > I just

Re: a question about flink table catalog.

2021-10-14 Thread Dawid Wysakowicz
k-packages, but I am not aware of any plans. (cc Timo) Best, Dawid On 14/10/2021 15:05, Yuepeng Pan wrote: > Dawid Wysakowicz > >    Thanks for your reply.  Will community to plan to implement this > feature ?  > > > > Best,  > Roc > > > > At 2021-10-14

Re: a question about flink table catalog.

2021-10-14 Thread Dawid Wysakowicz
If I understand your question correctly, you're asking if you can somehow persist the GenericInMemoryCatalog. I am afraid it is not possible. The idea of the GenericInMemoryCatalog is that it is transient and is stored purely in memory. Best, Dawid On 14/10/2021 13:44, Yuepeng Pan wrote: > Hi, 

Re: Exception: SequenceNumber is treated as a generic type

2021-10-14 Thread Dawid Wysakowicz
Hey Ori, As for the SequenceNumber issue, I'd say yes, it can be seen as a bug. In the current state one can not use kinesis consumer with the pipeline.generic-types=false. The problem is because we use the TypeInformation.of(SequenceNumber.class) method, which will in this case always fallback

Re:

2021-10-14 Thread Dawid Wysakowicz
I hope Rui (in cc) will be able to help you. Best, Dawid On 12/10/2021 15:32, Andrew Otto wrote: > Hello, > > I'm trying to use HiveCatalog with Kerberos.  Our Hadoop cluster, our > Hive Metastore, and our Hive Server are kerberized.  I can > successfully submit Flink jobs to Yarn authenticated

Re: Issues while upgrading from 1.12.1 to 1.14.0

2021-10-06 Thread Dawid Wysakowicz
Hi Parag, When you restore from a savepoint do you see a line like: "Restoring job {} from {}" in jobmanagers logs? Is the entire state lost or just part of it? Could you explain a bit what does your job look like and how do you check that the state is lost? Sorry if too obvious, but what are

Re: Error: Timeout of 60000ms expired before the position for partition

2021-10-04 Thread Dawid Wysakowicz
Hi, Do you mean that you fail to start Kafka? Or do you get the exception from Flink. Could you please share the full stack trace of the error? Best, Dawid On 02/10/2021 16:58, Dipanjan Mazumder wrote: > Hi, > >   I am getting below error while starting the flink as a standalone > single jvm

Re: Exception thrown during batch job execution on YARN even though job succeeded

2021-10-04 Thread Dawid Wysakowicz
Hey Ken, Regarding Rufus, I know he might be a bit eager in changing lines ;) If you want to ignore his changes in git blame, please take a look here[1]. For the main issue, do you mind creating a ticket? I hope someone will be able to pick it up. Best, Dawid [1]

Re: Flink application mode with no ui , how to start job using k8s ?

2021-10-04 Thread Dawid Wysakowicz
Hi Dhiru, For the question about auto scaling I'd recommend you this[1] blogpost from my colleague. I believe he explains it quite well how to do it. Besides that I am not sure what is your other question. Are you asking how to start the jobmanager without the UI? Can't you just simply not

[ANNOUNCE] Apache Flink 1.14.0 released

2021-09-29 Thread Dawid Wysakowicz
The Apache Flink community is very happy to announce the release of Apache Flink 1.14.0.   Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.   The release is available for download at:

Re: User defined function (UDF) not sent when submitting job to session cluster

2021-09-07 Thread Dawid Wysakowicz
Huh, of course. Actually I was too quick with my answer. Even if it is serialized with the JobGraph, the class is necessary on TMs to be deserialized. That's how java serialization works after all. So the actual answer, it is serialized with the JobGraph. The class is mandatory for

Re: User defined function (UDF) not sent when submitting job to session cluster

2021-09-07 Thread Dawid Wysakowicz
Hi Joel, There is a few uncertainties in my reply so I am including Timo who should be able to confirm or deny what I wrote. Generally speaking once a JobGraph is created it should contain the UDF. I might be wrong here, but the UDF must be available while the JobGraph is created. I believe

Re: Broadcast data to all keyed streams

2021-09-07 Thread Dawid Wysakowicz
Hi James, Can you elaborate why the "Broadcast State Pattern"[1] does not work for you? I'd definitely recommend that approach. I highly discourage this usage, but if you insist you could copy over the ConnectedStreams#transform method and remove the check that guards both sides of the operator

Re: De/Serialization API to tear-down user code

2021-09-02 Thread Dawid Wysakowicz
Hi Sergio, You can find the explanation why we haven't added the close method in the corresponding JIRA ticke[1]: When adding close() method to both DeserializationSchema and SerializationSchema with a default implementation, it breaks source compatibility if a user's class

Re: Watermark UI after checkpoint failure

2021-07-19 Thread Dawid Wysakowicz
Do you mean a failed checkpoint, or do you mean that it happens after a restore from a checkpoint? If it is the latter then this is kind of expected, as watermarks are not checkpointed and they need to be repopulated again. Best, Dawid On 19/07/2021 07:41, Dan Hill wrote: > After my dev flink

Re: Apache Flink - Reading Avro messages from Kafka with schema in schema registry

2021-07-15 Thread Dawid Wysakowicz
rom > the registry play ?  Does it have to do with the "normalization" > you've mentioned ? > > Thanks again for your time. > > Mans > > On Tuesday, July 13, 2021, 10:22:32 AM EDT, Dawid Wysakowicz > wrote: > > > Hi, > > Yes, you are right the

Re: My batch source doesn't emit MAX_WATERMARK when it finishes - why?

2021-07-08 Thread Dawid Wysakowicz
Hi, Your example does not show what watermarks are flowing through the program. It prints the watermark at the point a record is being emitted. As the cited text states, the final watermark is emitted after all records are emitted. You can test it e.g. with the newly added writeWatermark method

Re: Flink cep checkpoint size

2021-07-08 Thread Dawid Wysakowicz
Hi, Sorry for the late reply. Indeed I found a couple of problems with clearing the state for short lived keys. I created a JIRA[1] issue to track it and opened a PR (which needs test coverage before it can be merged) with fixes for those. Best, Dawid [1]

Re: Flink Metric Reporting from Job Manager

2021-07-08 Thread Dawid Wysakowicz
Hi, I think that is not directly supported. After all, the main method can also be executed outside of a JobManager and there you don't have any Flink context/connections/components set up. Best, Dawid On 08/07/2021 00:12, Mason Chen wrote: > Hi all, > > Does Flink support reporting metrics

Re: Question about POJO rules - why fields should be public or have public setter/getter?

2021-07-08 Thread Dawid Wysakowicz
Hi Naehee, Short answer would be for historic reasons and compatibility reasons. It was implemented that way back in the days and we don't want to change the default type extraction logic. Otherwise user jobs that rely on the default type extraction logic for state storing would end up with a

Re: Write Kafka message header using FlinkKafkaProducer

2021-06-21 Thread Dawid Wysakowicz
Hi, You can use KafkaSerializationSchema[1] which can create a ProducerRecord with Headers. Best, Dawid [1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/api/java/org/apache/flink/streaming/connectors/kafka/KafkaSerializationSchema.html On 21/06/2021 12:58, Tan, Min wrote: > >

Re: unsubscribe

2021-06-21 Thread Dawid Wysakowicz
You should send a message to user-unsubscr...@flink.apache.org if you want to unsubscribe. On 20/06/2021 00:08, SANDEEP PUNIYA wrote: OpenPGP_signature Description: OpenPGP digital signature

Re: unsubscribe

2021-06-21 Thread Dawid Wysakowicz
You should send a message to user-unsubscr...@flink.apache.org if you want to unsubscribe. On 19/06/2021 18:04, 林俊良 wrote: > OpenPGP_signature Description: OpenPGP digital signature

Re:

2021-06-21 Thread Dawid Wysakowicz
You should send a message to user-unsubscr...@flink.apache.org if you want to unsubscribe. On 21/06/2021 04:25, 张万新 wrote: > unsubscribe OpenPGP_signature Description: OpenPGP digital signature

Re: DSL for Flink CEP

2021-06-03 Thread Dawid Wysakowicz
Hi, Just to add on top to what Fabian said. The only community supported CEP library is the one that comes with Flink[1]. It is also used internally to support the MATCH_RECOGNIZE clause in Flink SQL[2]. Therefore there is a both programmatic library native DSL for defining patterns. Moreover

Re: [ANNOUNCE] Apache Flink 1.13.1 released

2021-05-31 Thread Dawid Wysakowicz
m/_/flink > > Hope it'll be available soon. > > Thanks, > Youngwoo > > > On Sat, May 29, 2021 at 1:49 AM Dawid Wysakowicz > wrote: > >> The Apache Flink community is very happy to announce the release of Apache >> Flink 1.13.1, which is the first bugfix releas

Re: Running multiple CEP pattern rules

2021-05-31 Thread Dawid Wysakowicz
I am afraid there is no much of an active development going on in the CEP library. I would not expect new features there in the nearest future. On 28/05/2021 22:00, Tejas wrote: > Hi Dawid, > Do you have any plans to bring this functionality in flink CEP in future ? > > > > -- > Sent from: >

Re: Running multiple CEP pattern rules

2021-05-28 Thread Dawid Wysakowicz
Hi Tejas, It will not work that way. Bear in mind that every application of CEP.pattern creates a new operator in the graph. The exceptions you are seeing most probably result from calculating that huge graph and sending that over. You are reaching the timeout on submitting that huge graph. You

[ANNOUNCE] Apache Flink 1.13.1 released

2021-05-28 Thread Dawid Wysakowicz
/05/28/release-1.13.1.html|   |The full release notes are available in Jira:| |https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12350058|   |We would like to thank all contributors of the Apache Flink community who made this release possible!|   |Regards,| |Dawid Wysa

Re: [VOTE] Release 1.13.1, release candidate #1

2021-05-28 Thread Dawid Wysakowicz
> - built from source code > > - check apache-flink source/wheel package content > > - run python udf job > > > > Best, > > Xingbo > > > > Dawid Wysakowicz <mailto:dwysakow...@apache.org> <mailto:dwysakow...@a

Re: [VOTE] Release 1.13.1, release candidate #1

2021-05-27 Thread Dawid Wysakowicz
gt; - did a visual check of the release blog post > - started cluster and ran jobs (WindowJoin and WordCount); nothing > suspicious found in the logs > - verified change FLINK-22866 manually whether the issue is fixed > > Best, > Matthias > > On Tue, May 25, 2021 at 3:33 PM Dawid Wysakow

Re: Customer operator in BATCH execution mode

2021-05-26 Thread Dawid Wysakowicz
Hi, No there is no API in the operator to know which mode it works in. We aim to have separate operators for both modes if required. You can check e.g. how we do it in KeyedBroadcastStateTransformationTranslator[1]. Yes, it should be possible to register a timer for Long.MAX_WATERMARK if you

[VOTE] Release 1.13.1, release candidate #1

2021-05-25 Thread Dawid Wysakowicz
|Hi everyone,| |Please review and vote on the release candidate #1 for the version 1.13.1, as follows:| |[ ] +1, Approve the release| |[ ] -1, Do not approve the release (please provide specific comments)|     |The complete staging area is available for your review, which includes:| |* JIRA

Re: Is it possible to leverage the sort order in DataStream Batch Execution Mode?

2021-05-21 Thread Dawid Wysakowicz
I am afraid it is not possible to leverage the sorting for business logic. The sorting is applied on binary representation of the key as it is not necessary sorting per se, but rather grouping by the same keys. You can find more information in the FLIP of this feature e.g. here[1] Best, Dawid

Re: Possible way to avoid unnecessary serialization calls.

2021-05-12 Thread Dawid Wysakowicz
o 1. > Is it a bug or something else ? > > Best regards, > Alexander > > > пн, 10 мая 2021 г. в 15:08, Dawid Wysakowicz <mailto:dwysakow...@apache.org>>: > > Hi Alex, > > If you are sure that the operations in between do not change the > p

Re: Flink: Clarification required

2021-05-10 Thread Dawid Wysakowicz
Hi Jessy, I have a stateless Flink application where the source and sink are two different Kafka topics. Is there any benefit in adding checkpointing for this application?. will it help in some way for the rewind and replays while restarting from the failure? If you do want to

Re: Possible way to avoid unnecessary serialization calls.

2021-05-10 Thread Dawid Wysakowicz
Hi Alex, If you are sure that the operations in between do not change the partitioning of the data and keep the key constant for the whole pipeline you could use the reinterpretAsKeyedStream[1]. I guess this answers your questions 1 & 2. As for the third question, first of all you should look

Re: About the windowOperator and Watermark

2021-05-10 Thread Dawid Wysakowicz
Hi, When a Watermark arrives the window operator will emit all windows that are considered finished at the time of the Watermark. In your example (assuming both windows are finished) they will both be emitted. Best, Dawid On 08/05/2021 08:03, 曲洋 wrote: > Hi Experts, > > Given that a window in

Re: What does enableObjectReuse exactly do?

2021-05-10 Thread Dawid Wysakowicz
Hi, In the streaming API, the biggest difference is that if you do not disable object reuse, records will be duplicated/copied when forwarding from an operator to the downstream one. If you are sure you work with immutable objects, I'd highly recommend enabling object reuse. Best, Dawid On

Re: Read kafka offsets from checkpoint - state processor

2021-05-10 Thread Dawid Wysakowicz
Hi, You would need to look into the internals of FlinkKafkaConsumerBase. In the current master the state for offsets is initialized in here:

Re: Unsubscribe

2021-05-10 Thread Dawid Wysakowicz
Hi all, Before reaching out to the INFRA team. May I ask all of you to make sure that you follow the two-step process? After sending the initial mail to the user-unsubscr...@flink.apache.org you should receive a request for a confirmation. If you did

Re: callback by using process function

2021-05-10 Thread Dawid Wysakowicz
Hi, I am sorry, but I think I don't fully get your question. Could you try to rephrase it? Maybe an example could help. Generally speaking the KeyedProcessFunction is scoped to a single key. Whenever you access a state (MapState, ValueState, ... ) it keeps the current value of that state for the

Re: some questions about data skew

2021-05-10 Thread Dawid Wysakowicz
Hi, What you could do to improve processing of a skewed data is to introduce an artificial preaggregation. You could add some artificial uniformly distributed secondary key and calculate your aggregates on (original key, secondary uniform key) and then do the final aggregation in an additional

Re: Guaranteeing event order in a KeyedProcessFunction

2021-05-10 Thread Dawid Wysakowicz
Hey Miguel, I think you could take a look at the CepOperator which does pretty much what you are describing. As for more direct answers for your questions. If you use KeyedProcessFunction it is always scoped to a single Key. There is no way to process events from other keys. If you want to have

[ANNOUNCE] Apache Flink 1.13.0 released

2021-05-03 Thread Dawid Wysakowicz
|The Apache Flink community is very happy to announce the release of Apache Flink 1.13.0.|   |Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications.|   |The release is available for download at:|

Re: Contiguity in SQL vs CEP

2021-04-26 Thread Dawid Wysakowicz
Hi, MATCH_RECOGNIZE clause in SQL standard does not support different contiguities. The MATCH_RECOGNIZE always uses the strict contiguity. Best, Dawid On 21/04/2021 00:02, tbud wrote: > There's 3 different types of Contiguity defined in the CEP documentation [1] > looping + non-looping --

Re: Contiguity and state storage in CEP library

2021-04-26 Thread Dawid Wysakowicz
Hi, Yes you are correct that if an event can not match any pattern it won't be stored in state. If you process your records in event time it might be stored for a little while before processing in order to sort the incoming records based on time. Once a Watermark with a higher timestamp comes it

Re: Python Integration with Ververica Platform

2021-04-13 Thread Dawid Wysakowicz
I'd recommend reaching out directly to Ververica. Ververica platform is not part of the open-source Apache Flink project. I can connect you with Konstantin who I am sure will be happy to answer your question ;) Best, Dawid On 12/04/2021 15:40, Robert Cullen wrote: > I've been using the

Re: NPE when aggregate window.

2021-04-13 Thread Dawid Wysakowicz
Hi, Could you check that your grouping key has a stable hashcode and equals? It is very likely caused by an unstable hashcode and that a record with an incorrect key ends up on a wrong task manager. Best, Dawid On 13/04/2021 08:47, Si-li Liu wrote: > Hi,  > > I encounter a weird NPE when try

[ANNOUNCE] Release 1.13.0, release candidate #0

2021-04-01 Thread Dawid Wysakowicz
[3],| |* source code tag "release-1.2.3-rc3" [4],|   |Your help testing the release will be greatly appreciated! |   |Thanks,| |Dawid Wysakowicz |   |[1] https://dist.apache.org/repos/dist/dev/flink/flink-1.13.0-rc0/ | |[2] https://dist.apache.org/repos/dist/release/flink/K

Re: [DISCUSS] Feature freeze date for 1.13

2021-04-01 Thread Dawid Wysakowicz
ing the "Rebase and merge" button. I find that > merge option > > useful, > > > > especially for small simple changes and for > backports. The following > > > should > &g

Re: PyFlink Table API: Interpret datetime field from Kafka as event time

2021-03-30 Thread Dawid Wysakowicz
Hey, I am not sure which format you use, but if you work with JSON maybe this option[1] could help you. Best, Dawid [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/connectors/formats/json.html#json-timestamp-format-standard On 30/03/2021 06:45, Sumeet Malhotra

Re: pipeline.auto-watermark-interval vs setAutoWatermarkInterval

2021-03-23 Thread Dawid Wysakowicz
Hey, I would like to double check this with Jark and/or Timo. As far as DataStream is concerned the javadoc is correct. Moreover the pipeline.auto-watermak-interval and setAutoWatermarkInterval are effectively the same setting/option. However I am not sure if Table API interprets it in the same

[DISCUSS] Feature freeze date for 1.13

2021-03-23 Thread Dawid Wysakowicz
Hi devs, users! 1. *Feature freeze date* We are approaching the end of March which we agreed would be the time for a Feature Freeze. From the knowledge I've gather so far it still seems to be a viable plan. I think it is a good time to agree on a particular date, when it should happen. We

Re: Eliminating Shuffling Under FlinkSQL

2021-03-19 Thread Dawid Wysakowicz
Your understanding of a group by is correct. It is equivalent to a key by. I agree it would be a great feature to keep the Source's partitioning but unfortunately as of now it is not yet supported. Best, Dawid On 18/03/2021 18:28, Aeden Jameson wrote: > It's my understanding that a group by is

Re: Understanding Max Parallelism

2021-03-19 Thread Dawid Wysakowicz
Hi Aeden, The maxParallelism option defines the number of key groups that will be created within the keyed state and thus define the maximum parallelism that a Flink keyed job can scale up to as each key group must be atomically assigned to a single task. You can read more on how the rescaling

Re: Parameter to config read frequency in Kafka SQL connector

2021-03-19 Thread Dawid Wysakowicz
Hi, Unfortunately I have no experience with this. You can pass any Kafka client parameters through the properties.* option[1] and see if the setting works for you. Best, Dawid [1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/connectors/kafka.html#properties On

Re: Flink minimum resource recommendation on k8s cluster

2021-03-19 Thread Dawid Wysakowicz
I'd say no. It depends on your job. You can refer to a very good presentation from Robert on how to calculate resource requirements[1]. [1] https://www.youtube.com/watch?v=8l8dCKMMWkw On 18/03/2021 11:37, Amit Bhatia wrote: > Hi, > > Is there any minimum resource ( CPU & Memory) recommendation

Re: The Role of TimerService in ProcessFunction

2021-03-19 Thread Dawid Wysakowicz
Hi Chirag, I agree it might be a little bit confusing. Let me try to explain the reasoning. To do that I'll first try to rephrase the reasoning from FLINK-8560 for introducing the KeyedProcessFunction. It was introduced so that users have a typed access to the current key via Context and

Re: Production Readiness of File Source

2021-03-18 Thread Dawid Wysakowicz
Hi, As for the issue of production readiness of the File Source(and other components) I'd recommend having a look at the PR, which is close to being merged where we express our opinion how we see certain components: https://github.com/apache/flink-web/pull/426 I am also cc'ing Stephan who wrote

Re: ClassCastException after upgrading Flink application to 1.11.2

2021-03-18 Thread Dawid Wysakowicz
Could you share a full stacktrace with us? Could you check the stack trace also in the task managers logs? As a side note, make sure you are using the same version of all Flink dependencies. Best, Dawid On 17/03/2021 06:26, soumoks wrote: > Hi, > > We have upgraded an application originally

Re: How to advance Kafka offsets manually with checkpointing enabled on Flink (TableAPI)

2021-03-18 Thread Dawid Wysakowicz
Hi Rex, The approach you described is definitely possible in the DataStream API. You could replace the uid of your Kafka source and start your job with your checkpoint with the allowNonRestoredState option enabled[1]. I am afraid though it is not possible to change the uid in Table API/SQL

Re: custom metrics within a Trigger

2021-03-18 Thread Dawid Wysakowicz
Do you mind sharing the code how do you register your metrics with the TriggerContext? It could help us identify where does name collisions come from. As far as I am aware it should be fine to use the TriggerContext for registering metrics. Best, Dawid On 16/03/2021 17:35, Aleksander Sumowski

Re: DataStream in batch mode - handling (un)ordered bounded data

2021-03-12 Thread Dawid Wysakowicz
Hi Alexis, As of now there is no such feature in the DataStream API. The Batch mode in DataStream API is a new feature and we would be interested to hear about the use cases people want to use it for to identify potential areas to improve. What you are suggesting generally make sense so I think

Re: Gradually increasing checkpoint size

2021-03-11 Thread Dawid Wysakowicz
Hey Dan, I think the logic should be correct. Mind that in the processElement we are using *relative*Upper/LowerBound, which are inverted global bound: relativeUpperBound = upperBound for left and -lowerBound for right relativeLowerBound = lowerBound for left and -upperBound for right

Re: Flink + Hive + Compaction + Parquet?

2021-03-04 Thread Dawid Wysakowicz
Hi, I know Jingsong worked on Flink/Hive filesystem integration in the Table/SQL API. Maybe he can shed some light on your questions. Best, Dawid On 02/03/2021 21:03, Theo Diefenthal wrote: > Hi there, > > Currently, I have a Flink 1.11 job which writes parquet files via the >

Re: State Schema Evolution within SQL API

2021-03-04 Thread Dawid Wysakowicz
Hi Jan, As of now Flink does not give any guarantees for Table/SQL API savepoint compatibility if you change the query or Flink version. Flink Table/SQL API uses an optimizer that can apply different optimizations or operations reordering based on the queried fields or computations that can

Re: timeWindow()s and queryable state

2021-03-04 Thread Dawid Wysakowicz
Hey Ron, I am pretty sure the queryable state will not do any pruning. It will keep the state for all windows seen so far. The allowedLateness applies to the window computation not the queryable state part. The `asQueryableState` will create a downstream operator that will keep updating a state

Re: Producer Configuration

2021-03-04 Thread Dawid Wysakowicz
Hey Claude. Alexey is right about the page. The page from your screenshot shows only the entries passed via StreamExecutionEnvironment#getConfig#setGlobalJobParameters. Configuration for individual connectors or other operators is not displayed there. If you need help debugging your time out

Re: Flink CEP: can't process PatternStream (v 1.12, EventTime mode)

2021-02-26 Thread Dawid Wysakowicz
Hi, What is exactly the problem? Is it that no patterns are being generated? Usually the problem is in idle parallel instances[1]. You need to have data flowing in each of the parallel instances for a watermark to progress. You can also read about it in the aspect of Kafka's partitions[2].

Re: Flink CEP: can't process PatternStream

2021-02-26 Thread Dawid Wysakowicz
Hi Yuri, Which Flink version are you using? Is it 1.12? In 1.12 we changed the default TimeCharacteristic to EventTime. Therefore you need watermarks and timestamp[1] for your program to work correctly. If you want to apply your pattern in ProcessingTime you can do: PatternStream patternStream =

[UPDATE] Release 1.13 feature freeze

2021-02-24 Thread Dawid Wysakowicz
Hi all, The agreed date of a feature freeze is due in about a month. Therefore we thought it would be a good time to give an update of the current progress. From the information we gathered there are currently no known obstacles or foreseeable delays. We are still aiming for the end of March as

Re: 回复: DataStream problem

2021-02-17 Thread Dawid Wysakowicz
t; > --?0?2?0?2------ > *??:* "Dawid Wysakowicz" ; > *:*?0?22021??2??15??(??) 6:59 > *??:*?0?2"?g???U?[";"user"; > *:*?0?2Re: DataStream problem > > Hi Jiazhi, > > Could you e

Re: Failed to register Protobuf Kryo serialization

2021-02-15 Thread Dawid Wysakowicz
> Svend > > > > On Mon, Feb 15, 2021 at 10:03 AM Dawid Wysakowicz > mailto:dwysakow...@apache.org>> wrote: > > Hey, > > Why do you say the way you did it, does not work? The logs you > posted say the classes cannot be handled by Flink's built-

Re: DataStream problem

2021-02-15 Thread Dawid Wysakowicz
Hi Jiazhi, Could you elaborate what exactly do you want to achieve? What have you tried so far? Best, Dawid On 15/02/2021 11:11, ?g???U?[ wrote: > Hi all > ?0?2 ?0?2 ?0?2 ?0?2Using DataStream, How to implement a message and the same > message appears again 10 minutes later? > Thanks, >

Re: Performance issues when RocksDB block cache is full

2021-02-15 Thread Dawid Wysakowicz
Hey Yaroslav, Unfortunately I don't have enough knowledge to give you an educated reply. The first part certainly does make sense to me, but I am not sure how to mitigate the issue. I am ccing Yun Tang who worked more on the RocksDB state backend (It might take him a while to answer though, as he

Re: Failed to register Protobuf Kryo serialization

2021-02-15 Thread Dawid Wysakowicz
Hey, Why do you say the way you did it, does not work? The logs you posted say the classes cannot be handled by Flink's built-in mechanism for serializing POJOs and it falls back to a GenericType which is serialized with Kryo and should go through your registered serializer. Best, Dawid On

Re: Generated SinkConversion code ignores incoming StreamRecord timestamp

2021-02-14 Thread Dawid Wysakowicz
Hi Dawid, > Yes, looks like it. Thanks! > > Is there an ETA on 1.12.2 yet? > > On Mon, Feb 15, 2021 at 9:48 AM Dawid Wysakowicz > mailto:dwysakow...@apache.org>> wrote: > > Hey Yuval, > > Could it be that you are hitting this bug[1], which has been fixed >

Re: Generated SinkConversion code ignores incoming StreamRecord timestamp

2021-02-14 Thread Dawid Wysakowicz
Hey Yuval, Could it be that you are hitting this bug[1], which has been fixed recently? Best, Dawid [1] https://issues.apache.org/jira/browse/FLINK-21013 On 15/02/2021 08:20, Yuval Itzchakov wrote: > Hi, > > I have a source that generates events with timestamps. These flow > nicely, until

  1   2   3   4   5   >