[ANNOUNCE] Apache Flink Kafka Connectors 3.0.2 released

2023-12-01 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink Kafka Connectors 3.0.2. This release is compatible with the Apache Flink 1.17.x and 1.18.x release series. Apache Flink® is an open-source stream processing framework for distributed, high-performing,

Re: dependency error with latest Kafka connector

2023-11-25 Thread Tzu-Li (Gordon) Tai
new version I can add the dependency > "org.apache.flink" % "flink-connector-kafka" % "3.0.2-1.18", > > > and compile it without any errors. > > Günter > > > On 25.11.23 17:40, Tzu-Li (Gordon) Tai wrote: > > Hi Günter, > >

Re: dependency error with latest Kafka connector

2023-11-25 Thread Tzu-Li (Gordon) Tai
:flink-connector-kafka:3.0.2-18 > [error] Not found > [error] Not found > [error] not found: > > /home/swissbib/.ivy2/local/org.apache.flink/flink-connector-kafka/3.0.2-18/ivys/ivy.xml > [error] not found: > > https://repo1.maven.org/maven2/org/apache/flink/flink-conne

Re: dependency error with latest Kafka connector

2023-11-24 Thread Tzu-Li (Gordon) Tai
Hi all, I've cherry-picked FLINK-30400 onto v3.0 branch of flink-connector-kafka. Treating this thread as justification to start a vote for 3.0.2 RC #1 immediately so we can get out a new release ASAP. Please see the vote thread here [1]. @guenterh.lists Would you be able to test this RC and

Re: dependency error with latest Kafka connector

2023-11-23 Thread Tzu-Li (Gordon) Tai
-impl:jar:2.17.1:runtime [INFO] +- org.apache.logging.log4j:log4j-api:jar:2.17.1:runtime [INFO] \- org.apache.logging.log4j:log4j-core:jar:2.17.1:runtime ``` On Thu, Nov 23, 2023 at 11:48 AM Tzu-Li (Gordon) Tai wrote: > Hi all, > > There seems to be an issue with the connector release scr

Re: dependency error with latest Kafka connector

2023-11-23 Thread Tzu-Li (Gordon) Tai
Hi all, There seems to be an issue with the connector release scripts used in the release process that doesn't correctly overwrite the flink.version property in POMs. I'll kick off a new release for 3.0.2 shortly to address this. Sorry for overlooking this during the previous release. Best,

[ANNOUNCE] Apache Flink Kafka Connectors 3.0.1 released

2023-10-31 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink Kafka Connectors 3.0.1. This release is compatible with the Apache Flink 1.17.x and 1.18.x release series. Apache Flink® is an open-source stream processing framework for distributed, high-performing,

Re: Which Flink engine versions do Connectors support?

2023-10-27 Thread Tzu-Li (Gordon) Tai
Hi Xianxun, You can find the list of supported Flink versions for each connector here: https://flink.apache.org/downloads/#apache-flink-connectors Specifically for the Kafka connector, we're in the process of releasing a new version for the connector that works with Flink 1.18. The release

Re: Kafka Sink and Kafka transaction timeout

2023-10-02 Thread Tzu-Li (Gordon) Tai
Hi Lorenzo, The main failure scenario that recommendation is addressing is when the Flink job fails right after a checkpoint successfully completes, but before the KafkaSink subtasks receive from the JM the checkpoint completed RPC notification to commit the transactions. It is possible that
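
The timing argument behind that scenario can be sketched in a few lines of Python. This is only an illustration: it assumes the commonly documented rule of thumb that the producer's `transaction.timeout.ms` should exceed the maximum checkpoint duration plus the maximum restart duration, and the function name and the numbers are hypothetical, not part of Flink or Kafka.

```python
def transaction_outlives_recovery(transaction_timeout_ms,
                                  max_checkpoint_duration_ms,
                                  max_restart_duration_ms):
    """Return True if a pre-committed transaction should still be open
    when the restarted job gets around to committing it."""
    # Worst case: the transaction stays open for a full checkpoint and
    # then for the whole time the job needs to come back up.
    worst_case_open_ms = max_checkpoint_duration_ms + max_restart_duration_ms
    return transaction_timeout_ms > worst_case_open_ms


# Hypothetical numbers: 1 minute checkpoints, up to 10 minutes to restart.
print(transaction_outlives_recovery(900_000, 60_000, 600_000))  # 15 min timeout
print(transaction_outlives_recovery(300_000, 60_000, 600_000))  # 5 min timeout
```

With the shorter timeout the broker may expire the transaction before the restored job can commit it, which is the data-loss case the thread is concerned with.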

Re: [DISCUSS] Status of Statefun Project

2023-08-18 Thread Tzu-Li (Gordon) Tai
solve the copying problem by requiring the two projects to be > siblings in the file system and by pre-copying the local build artifacts > into a location accessible by the existing Docker contexts. This would > still leave us with the need to have two PRs and releases instead of one, > t

Re: [DISCUSS] Status of Statefun Project

2023-08-18 Thread Tzu-Li (Gordon) Tai
default vs. local mode and what versions of Flink and Statefun > should be referenced, and then you can build and run the local examples > without any additional steps. Does that sound like a reasonable approach? > > > On Fri, Aug 18, 2023 at 2:17 PM Tzu-Li (Gordon) Tai > wrote: > >

Re: [DISCUSS] Status of Statefun Project

2023-08-18 Thread Tzu-Li (Gordon) Tai
elp with Statefun releases. >>>> >>>> Best regards, >>>> >>>> Martijn >>>> >>>> On Tue, Jun 20, 2023 at 2:21 PM Galen Warren >>>> wrote: >>>> >>>>> Thanks. >>>>> >>>>> Marti

Re: Flink Kafka source getting marked as Idle

2023-06-17 Thread Tzu-Li (Gordon) Tai
Hi Anirban, > But this happened only once and now it is not getting reproduced at all. This does make it sound a lot like https://issues.apache.org/jira/browse/FLINK-31632. > 1. What is the default watermarking strategy used in Flink. Can I quickly check the default parameters being used by

Re: [DISCUSS] Status of Statefun Project

2023-06-17 Thread Tzu-Li (Gordon) Tai
> Perhaps he could weigh in on whether the combination of automated tests plus those smoke tests should be sufficient for testing with new Flink versions What we usually did at the bare minimum for new StateFun releases was the following: 1. Build tests (including the smoke tests in the e2e

Re: KafkaSource consumer group

2023-03-30 Thread Tzu-Li (Gordon) Tai
Sorry, I meant to say "Hi Ben" :-) On Thu, Mar 30, 2023 at 9:52 AM Tzu-Li (Gordon) Tai wrote: > Hi Robert, > > This is a design choice. Flink's KafkaSource doesn't rely on consumer > groups for assigning partitions / rebalancing / offset tracking. It > manually as

Re: KafkaSource consumer group

2023-03-30 Thread Tzu-Li (Gordon) Tai
Hi Robert, This is a design choice. Flink's KafkaSource doesn't rely on consumer groups for assigning partitions / rebalancing / offset tracking. It manually assigns whatever partitions are in the specified topic across its consumer instances, and rebalances only when the Flink job / KafkaSource is
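
A toy model of static partition assignment, as opposed to consumer-group rebalancing, might look like the following. It is a sketch under stated assumptions: the round-robin rule and the `assign_partitions` name are illustrative and are not the connector's actual assignment algorithm.

```python
from collections import defaultdict


def assign_partitions(partitions, parallelism):
    """Statically spread partition ids over reader subtasks (round-robin).

    No broker-side group coordinator is involved: the mapping is a pure
    function of the partition list and the parallelism.
    """
    assignment = defaultdict(list)
    for partition in sorted(partitions):
        assignment[partition % parallelism].append(partition)
    return dict(assignment)


print(assign_partitions(range(6), 4))  # {0: [0, 4], 1: [1, 5], 2: [2], 3: [3]}
# The mapping only changes when the inputs change, e.g. after rescaling.
print(assign_partitions(range(6), 2))  # {0: [0, 2, 4], 1: [1, 3, 5]}
```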

Re: State Processor API - VoidNamespaceSerializer must be compatible with the old namespace serializer LongSerializer

2022-10-26 Thread Tzu-Li (Gordon) Tai
Hi Filip, I think what you are seeing is expected. The State Processor API was intended to allow access only to commonly used user-facing state structures, while Stateful Functions uses quite a bit of Flink internal features, including for its state maintenance. The list state in question in

Re: Ignoring state's offset when restoring checkpoints

2022-07-08 Thread Tzu-Li (Gordon) Tai
Hi Robin, Apart from what Alexander suggested, I think you could also try the following first: Let the job use a "new" Kafka source, which you can achieve by simply assigning a different operator ID than before. If you previously did not set an ID, then previously by default it would have been a

Re: [ANNOUNCE] Apache Flink Stateful Functions 3.1.0 released

2021-08-31 Thread Tzu-Li (Gordon) Tai
Congrats on the release! And thank you for driving this release, Igal. Cheers Gordon On Tue, Aug 31, 2021, 23:13 Igal Shilman wrote: > The Apache Flink community is very happy to announce the release of Apache > Flink Stateful Functions (StateFun) 3.1.0. > > StateFun is a cross-platform stack

Re: Configure Kafka ingress through property files in Stateful function 3.0.0

2021-05-28 Thread Tzu-Li (Gordon) Tai
Hi Jessy, I assume "consumer.properties" is a file you have included in your StateFun application's image? The ingress.spec.properties field in the module YAML specification file expects a list of key value pairs, not a properties file. See for example [1]. I think it could make sense to
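
If the settings already live in a properties file, one option is to convert them into key-value pairs before pasting them into the module YAML. The helper below is a minimal sketch: `properties_to_pairs` is a hypothetical name, it only understands simple `key=value` lines, and the list-of-single-key-mappings shape it returns is an assumption about how the pairs would be written, so the example linked in the message remains the reference.

```python
def properties_to_pairs(text):
    """Parse simple Java-style .properties text into a list of single-key dicts."""
    pairs = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        pairs.append({key.strip(): value.strip()})
    return pairs


sample = """
# consumer settings
bootstrap.servers=localhost:9092
group.id = demo
"""
print(properties_to_pairs(sample))
```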

Re: Statefun 2.2.2 Checkpoint restore NPE

2021-05-28 Thread Tzu-Li (Gordon) Tai
Hi Timothy, It would indeed be hard to figure this out without any stack traces. Have you tried changing to debug level logs? Maybe you can also try using the StateFun Harness to restore and run your job in the IDE - in that case you should be able to see which code exactly is throwing this

Re: Stateful Functions, Kinesis, and ConsumerConfigConstants

2021-04-29 Thread Tzu-Li (Gordon) Tai
Hi Ammon, Unfortunately you're right. I think the Flink Kinesis Consumer specific configs, e.g. keys in the ConsumerConfigConstants class, were overlooked in the initial design. One way to work around this is to use the `SourceFunctionSpec` [1]. Using that spec, you can use any Flink

[ANNOUNCE] Apache Flink Stateful Functions 3.0.0 released

2021-04-15 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink Stateful Functions (StateFun) 3.0.0. StateFun is a cross-platform stack for building Stateful Serverless applications, making it radically simpler to develop scalable, consistent, and elastic distributed

Re: How to know if task-local recovery kicked in for some nodes?

2021-04-06 Thread Tzu-Li (Gordon) Tai
Hi Sonam, Pulling in Till (cc'ed), I believe he would likely be able to help you here. Cheers, Gordon On Fri, Apr 2, 2021 at 8:18 AM Sonam Mandal wrote: > Hello, > > We are experimenting with task local recovery and I wanted to know whether > there is a way to validate that some tasks of the

Re: Why is Hive dependency flink-sql-connector-hive not available on Maven Central?

2021-04-06 Thread Tzu-Li (Gordon) Tai
Hi, I'm pulling in Rui Li (cc'ed) who might be able to help you here as he actively maintains the hive connectors. Cheers, Gordon On Fri, Apr 2, 2021 at 11:36 AM Yik San Chan wrote: > The question is cross-posted in StackOverflow >

Re: How to specific key serializer

2021-03-31 Thread Tzu-Li (Gordon) Tai
Hi CZ, The issue here is that the Scala DataStream API uses Scala macros to decide the serializer to be used. Since that recognizes Scala case classes, the CaseClassSerializer will be used. However, in the State Processor API, those Scala macros do not come into play, and therefore it directly

Re: Support for sending generic class

2021-03-30 Thread Tzu-Li (Gordon) Tai
Hi Le, Thanks for reaching out with this question! It's actually a good segue to allow me to introduce you to StateFun 3.0.0 :) StateFun 3.0+ comes with a new type system that would eliminate this hassle. You can take a sneak peek here [1]. This is part 1 of a series of tutorials on fundamentals

Re: StateFun examples in scala

2021-03-30 Thread Tzu-Li (Gordon) Tai
Hi Jose! For Scala, we would suggest waiting until StateFun 3.0.0 is released, which is actually happening very soon (likely within 1-2 weeks) as there is an ongoing release candidate vote [1]. The reason for this is that version 3.0 adds a remote SDK for Java, which you should be able to use

Re: Relation between Two Phase Commit and Kafka's transaction aware producer

2021-03-15 Thread Tzu-Li (Gordon) Tai
checkpoint status and conservatively rollback to previous > checkpoint and replay all data > > On Thu, Mar 11, 2021 at 7:44 AM Tzu-Li (Gordon) Tai > wrote: > >> Hi Kevin, >> >> Perhaps the easiest way to answer your question, is to go through how the >> exactly

Re: EOFException on attempt to scale up job with RocksDB state backend

2021-03-15 Thread Tzu-Li (Gordon) Tai
Hi, Could you provide info on the Flink version used? Cheers, Gordon -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: uniqueness of name when constructing a StateDescriptor

2021-03-15 Thread Tzu-Li (Gordon) Tai
Hi, The scope is per individual operator, i.e. a single KeyedProcessFunction instance cannot have multiple registered state with the same name. Cheers, Gordon -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: [Statefun] Interaction Protocol for Statefun

2021-03-15 Thread Tzu-Li (Gordon) Tai
Hi, Interesting idea! Just some initial thoughts and questions, maybe others can chime in as well. In general I think the idea of supporting more high-level protocols on top of the existing StateFun messaging primitives is good. For example, what probably could be categorized under this effort

Re: Extracting state keys for a very large RocksDB savepoint

2021-03-15 Thread Tzu-Li (Gordon) Tai
Hi Andrey, Perhaps the functionality you described is worth adding to the State Processor API. Your observation on how the library currently works is correct; basically it tries to restore the state backends as is. In your current implementation, do you see it worthwhile to try to add this?

Re: Relation between Two Phase Commit and Kafka's transaction aware producer

2021-03-10 Thread Tzu-Li (Gordon) Tai
Hi Kevin, Perhaps the easiest way to answer your question is to go through how the exactly-once FlinkKafkaProducer works, using a 2PC implementation on top of Flink's checkpointing mechanism. The phases can be broken down as follows (simplified assuming max 1 concurrent checkpoint and that checkpoint
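
The general shape of a checkpoint-driven two-phase commit can be sketched as a toy model. This is not the FlinkKafkaProducer implementation: the class and method names are hypothetical, there is no real Kafka transaction behind it, and it assumes at most one concurrent checkpoint and that every snapshot taken also completes.

```python
class ToyTransactionalSink:
    """Toy two-phase-commit sink driven by checkpoint events."""

    def __init__(self):
        self.open_txn = []        # records written since the last snapshot
        self.pre_committed = {}   # checkpoint id -> flushed, uncommitted records
        self.committed = []       # records visible to downstream readers

    def write(self, record):
        self.open_txn.append(record)

    def snapshot(self, checkpoint_id):
        # Phase 1 (pre-commit): close the current transaction, start a new one.
        self.pre_committed[checkpoint_id] = self.open_txn
        self.open_txn = []

    def notify_checkpoint_complete(self, checkpoint_id):
        # Phase 2 (commit): publish every transaction up to this checkpoint.
        for cid in sorted(c for c in self.pre_committed if c <= checkpoint_id):
            self.committed.extend(self.pre_committed.pop(cid))

    def fail_and_restore(self):
        # Records not covered by a snapshot are dropped; the source replays them.
        # Pre-committed transactions are kept so they can be committed later.
        self.open_txn = []


sink = ToyTransactionalSink()
sink.write("a")
sink.write("b")
sink.snapshot(1)                    # "a" and "b" are pre-committed
sink.write("c")                     # belongs to the next transaction
sink.fail_and_restore()             # failure before the commit notification
sink.notify_checkpoint_complete(1)  # commit still goes through after recovery
print(sink.committed)               # ['a', 'b']
```

The failure between `snapshot` and `notify_checkpoint_complete` is the window in which a pre-committed transaction has to stay open on the broker side.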

Re: Best practices for complex state manipulation

2021-03-10 Thread Tzu-Li (Gordon) Tai
Hi Dan, For a deeper dive into state backends and how they manage state, or performance critical aspects such as state serialization and choosing appropriate state structures, I highly recommend starting from this webinar done by my colleague Seth Weismann:

Re: Job downgrade

2021-03-07 Thread Tzu-Li (Gordon) Tai
> Alexey > > ------ > *From:* Tzu-Li (Gordon) Tai > *Sent:* Thursday, March 4, 2021 12:58:01 AM > *To:* Alexey Trenikhun > *Cc:* Piotr Nowojski ; Flink User Mail List < > user@flink.apache.org> > *Subject:* Re: Job downgrade > > Hi Alexey, > > Are you using the heap bac

Re: Job downgrade

2021-03-04 Thread Tzu-Li (Gordon) Tai
Hi Alexey, Are you using the heap backend? If that's the case, then for whatever state was registered at the time of a savepoint, Flink will attempt to restore it to the heap backends. This essentially means that state "B" will be read as well, that would explain why Flink is trying to locate

Re: Flink Statefun TTL

2021-02-24 Thread Tzu-Li (Gordon) Tai
b.com/apache/flink-statefun/tree/master/statefun-sdk-java > Thanks, > > Tim > > On Wed, Feb 24, 2021 at 11:49 AM Tzu-Li (Gordon) Tai > wrote: > >> Hi Timothy, >> >> Starting from StateFun 2.2.x, in the module.yaml file, you can set for >> each indivi

Re: Flink Statefun TTL

2021-02-24 Thread Tzu-Li (Gordon) Tai
Hi Timothy, Starting from StateFun 2.2.x, in the module.yaml file, you can set for each individual state of a function an "expireMode" field, whose value can be either "after-invoke" or "after-write". For example: ``` - function: meta: ... spec: states: - name:

Re: Run the code in the UI

2021-02-21 Thread Tzu-Li (Gordon) Tai
Hi, Could you re-elaborate what exactly you mean? If you wish to run a Flink job within the IDE, but also have the web UI running for it, you can use `StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(Configuration)` to create the execution environment. The default port 8081 will be

Re: [Statefun] Dynamic behavior

2021-02-21 Thread Tzu-Li (Gordon) Tai
Hi, FWIW, there is this JIRA that is tracking a pubsub / broadcast messaging primitive in StateFun: https://issues.apache.org/jira/browse/FLINK-16319 This is probably what you are looking for. And I do agree, in the case that the control stream (which updates the application logic) is high

Re: Flink 1.11.3 not able to restore with savepoint taken on Flink 1.9.3

2021-02-19 Thread Tzu-Li (Gordon) Tai
Hi, I'm not aware of any breaking changes in the savepoint formats from 1.9.3 to 1.11.3. Let's first try to rule out any obvious causes of this: - Were any data types / classes that were used in state changed across the restores? Remember that key types are also written as part of state

Re: Statefun: cancel "sendAfter"

2021-02-05 Thread Tzu-Li (Gordon) Tai
there is a delay we need to know. But if the > customer confirms in time we want to cleanup to keep the state small. > > > > I dug a little bit into the code. May I create an issue to discuss my > ideas? > > > > Cheers, > > Stephan > > > > > > *Von:* Tzu

Re: Statefun: cancel "sendAfter"

2021-02-02 Thread Tzu-Li (Gordon) Tai
Hi, You are right, currently StateFun does not support deleting a scheduled delayed message. StateFun supports delayed messages by building on top of two Flink constructs: 1) registering processing time timers, and 2) buffering the message payload to be sent in state. The delayed messages are
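
The two building blocks named here, a timer plus a buffered payload, can be modelled roughly as follows. This is a toy sketch, not StateFun code: the class, its methods, and the use of a plain integer clock are all assumptions for illustration. It deliberately has no cancel operation, mirroring the limitation described in the message.

```python
import heapq


class ToyDelayedMessages:
    """Timers in a min-heap, payloads buffered in a dict keyed by sequence number."""

    def __init__(self):
        self.timers = []     # min-heap of (fire_time, sequence number)
        self.buffered = {}   # sequence number -> (target, payload)
        self.next_seq = 0

    def send_after(self, now, delay, target, payload):
        seq = self.next_seq
        self.next_seq += 1
        heapq.heappush(self.timers, (now + delay, seq))  # register the timer
        self.buffered[seq] = (target, payload)           # buffer the payload

    def advance_to(self, now):
        """Fire all timers due at or before `now`; return the delivered messages."""
        delivered = []
        while self.timers and self.timers[0][0] <= now:
            _, seq = heapq.heappop(self.timers)
            delivered.append(self.buffered.pop(seq))
        return delivered


delayed = ToyDelayedMessages()
delayed.send_after(now=0, delay=10, target="f", payload="x")
delayed.send_after(now=0, delay=5, target="g", payload="y")
print(delayed.advance_to(7))   # [('g', 'y')]
print(delayed.advance_to(10))  # [('f', 'x')]
```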

Re: Question on Flink and Rest API

2021-02-02 Thread Tzu-Li (Gordon) Tai
Hi, There is no out-of-box Flink source/sink connector for this, but it isn't unheard of that users have implemented something to support what you outlined. One way to possibly achieve this is: in terms of a Flink streaming job graph, what you would need to do is co-locate the source (which

Re: Question a possible use can for Iterative Streams.

2021-02-02 Thread Tzu-Li (Gordon) Tai
Hi Marco, In the ideal setup, enrichment data existing in external databases is bootstrapped into the streaming job via Flink's State Processor API, and any follow-up changes to the enrichment data are streamed into the job as a second union input on the enrichment operator. For this solution to

Re: [Stateful Functions] Problems with Protobuf Versions

2021-02-01 Thread Tzu-Li (Gordon) Tai
Hi, This hints at an incompatibility between the Protobuf classes generated by the `protoc` compiler and the Protobuf runtime dependency used by the code. Could you try to make sure the `protoc` compiler version matches the Protobuf version in your code? Cheers, Gordon On Fri, Jan 29, 2021 at 6:07 AM Jan Brusch wrote: >

Re: Stateful Functions - accessing the state aside of normal processing

2021-01-27 Thread Tzu-Li (Gordon) Tai
Hi Stephan, Great to hear about your experience with StateFun so far! I think what you are looking for is a way to read StateFun checkpoints, which are basically an immutable consistent point-in-time snapshot of all the states across all your functions, and run some computation or simply to

[ANNOUNCE] Stateful Functions Docker images are now hosted on Dockerhub at apache/flink-statefun

2021-01-18 Thread Tzu-Li (Gordon) Tai
Hi, We have created an "apache/flink-statefun" Dockerhub repository managed by the Flink PMC, at: https://hub.docker.com/r/apache/flink-statefun The images for the latest stable StateFun release, 2.2.2, have already been pushed there. Going forward, it will be part of the release process to make

Re: Statefun with RabbitMQ consumes message but does not run statefun

2021-01-12 Thread Tzu-Li (Gordon) Tai
Hi, There is no lock-step of releasing a new StateFun release when a new Flink release goes out. StateFun and Flink have individual releasing schemes and schedules. Usually, for new major StateFun version releases, we will upgrade its Flink dependency to the latest available version. We are

[ANNOUNCE] Apache Flink Stateful Functions 2.2.2 released

2021-01-01 Thread Tzu-Li (Gordon) Tai
The Apache Flink community released the second bugfix release of the Stateful Functions (StateFun) 2.2 series, version 2.2.2. *We strongly recommend all users to upgrade to this version.* *Please check out the release announcement:*

Re: Problems with FlinkKafkaProducer - closing after timeoutMillis = 9223372036854775807 ms.

2020-11-19 Thread Tzu-Li (Gordon) Tai
Hi, One thing to clarify first: I think the "Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms" log doesn't necessarily mean that a producer was closed due to timeout (Long.MAX_VALUE). I guess that is just a Kafka log message that is logged when a Kafka producer is closed
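
As a quick sanity check on the number in that log line, the value is exactly Java's `Long.MAX_VALUE`, so it reads as "no timeout" rather than as a timeout that elapsed. The constant name below is just a label chosen for this sketch.

```python
LONG_MAX_VALUE = 2**63 - 1  # Java's Long.MAX_VALUE

print(LONG_MAX_VALUE == 9223372036854775807)  # True

# Expressed in whole years, to show the timeout is effectively "wait forever".
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365
print(LONG_MAX_VALUE // MS_PER_YEAR)
```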

Re: Flink State Processor API - Bootstrap One state

2020-11-16 Thread Tzu-Li (Gordon) Tai
Hi, Using the State Processor API, modifying the state in an existing savepoint results in a new savepoint (new directory) with the new modified state. The original savepoint remains intact. The API allows you to only touch certain operators, without having to touch any other state and have them

Re: split avro kafka field

2020-11-16 Thread Tzu-Li (Gordon) Tai
Hi, 1. You'd have to configure your Kafka connector source to use a DeserializationSchema that deserializes the Kafka record byte to your generated Avro type. You can use the shipped `AvroDeserializationSchema` for that. 2. After your Kafka connector source, you can use a flatMap transformation

Re: Kafka SQL table Re-partition via Flink SQL

2020-11-16 Thread Tzu-Li (Gordon) Tai
Hi, I'm pulling in some Flink SQL experts (in CC) to help you with this one :) Cheers, Gordon On Tue, Nov 17, 2020 at 7:30 AM Slim Bouguerra wrote: > Hi, > I am trying to author a SQL job that does repartitioning a Kafka SQL table > into another Kafka SQL table. > as example input/output

Re: Flink 1.10 -> Savepoints referencing to checkpoints or not

2020-11-16 Thread Tzu-Li (Gordon) Tai
Hi, Both the data and metadata are stored in the savepoint directory, since Flink 1.3. The metadata in the savepoint directory does not reference any checkpoint data files. In 1.11, what was changed was that the savepoint metadata uses relative paths to point to the data files in the

[ANNOUNCE] Apache Flink Stateful Functions 2.2.1 released

2020-11-11 Thread Tzu-Li (Gordon) Tai
The Apache Flink community released the first bugfix release of the Stateful Functions (StateFun) 2.2 series, version 2.2.1. This release fixes a critical bug that causes restoring a Stateful Functions cluster from snapshots (checkpoints or savepoints) to fail under certain conditions. *We

Re: debug statefun

2020-11-10 Thread Tzu-Li (Gordon) Tai
On Wed, Nov 11, 2020 at 1:44 PM Tzu-Li (Gordon) Tai wrote: > Hi Lian, > > Sorry, I didn't realize that the issue you were bumping into was caused by > the module not being discovered. > You're right, the harness utility would not help here. > Actually, scratch this comment. T

Re: debug statefun

2020-11-10 Thread Tzu-Li (Gordon) Tai
ttps://github.com/apache/flink-statefun/blob/master/pom.xml#L85,L89 >> [4] >> https://github.com/apache/flink-statefun/blob/master/statefun-examples/statefun-python-greeter-example/Dockerfile#L20 >> >> On Tue, Nov 10, 2020 at 4:47 PM Tzu-Li (Gordon) Tai >> wrote: >

Re: debug statefun

2020-11-10 Thread Tzu-Li (Gordon) Tai
Hi, StateFun provides a Harness utility exactly for that, allowing you to test a StateFun application in the IDE / setting breakpoints etc. You can take a look at this example on how to use the harness:

Re: cannot pull statefun docker image

2020-11-06 Thread Tzu-Li (Gordon) Tai
Hi, The Dockerfiles in the examples in the flink-statefun repo currently work against images built from snapshot development branches. Ververica has been hosting StateFun base images for released versions: https://hub.docker.com/r/ververica/flink-statefun You can change `FROM flink-statefun:*`

Re: Remote Stateful Function Scalability

2020-10-17 Thread Tzu-Li (Gordon) Tai
Hi Elias, On Sun, Oct 18, 2020 at 6:16 AM Elias Levy wrote: > After reading the Stateful Functions documentation, I am left wondering > how remote stateful functions scale. > > The documentation mentions that the use of remote functions allows the > state and compute tiers to scale

Re: Stateful function and large state applications

2020-10-13 Thread Tzu-Li (Gordon) Tai
Hi, The StateFun runtime is built directly on top of Apache Flink, so RocksDB as the state backend is supported as well as all the features for large state such as checkpointing and local task recovery. Cheers, Gordon On Wed, Oct 14, 2020 at 11:49 AM Lian Jiang wrote: > Hi, > > I am learning

Re: Native State in Python Stateful Functions?

2020-10-09 Thread Tzu-Li (Gordon) Tai
ining-functions [2] https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.2/concepts/distributed_architecture.html#co-located-functions On a separate topic, is anyone using StateFun in production? > > > > Thanks, > > Dan > > > > *From: *"Tzu-Li

Re: Native State in Python Stateful Functions?

2020-10-08 Thread Tzu-Li (Gordon) Tai
Hi, Nice to hear that you are trying out StateFun! It is by design that function state is attached to each HTTP invocation request, as defined by StateFun's remote invocation request-reply protocol. This decision was made with typical application cloud-native architectures in mind - having
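
The idea that state travels with each invocation can be illustrated with a small stand-in. This is a sketch only: the real request-reply protocol exchanges Protobuf messages over HTTP, and the dictionary fields and function names used here (`handle_invocation`, `runtime_invoke`, `state_mutations`) are hypothetical.

```python
def handle_invocation(request):
    """Stateless handler: all state arrives with the request and leaves with the response."""
    seen = request["state"].get("seen_count", 0) + 1
    return {
        "state_mutations": {"seen_count": seen},
        "outgoing": [{"to": request["caller"], "payload": f"hello #{seen}"}],
    }


def runtime_invoke(state_store, address, caller):
    """Stand-in for the runtime: ship the state out, then apply the returned mutations."""
    request = {"state": dict(state_store.get(address, {})), "caller": caller}
    response = handle_invocation(request)
    state_store.setdefault(address, {}).update(response["state_mutations"])
    return response["outgoing"]


store = {}
runtime_invoke(store, "greeter/alice", "user/1")
second = runtime_invoke(store, "greeter/alice", "user/1")
print(second[0]["payload"])  # hello #2
print(store)                 # {'greeter/alice': {'seen_count': 2}}
```

Because the handler keeps nothing between calls, any replica of it can serve any request, which is what lets the compute side scale independently of where the state is stored.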

Re: Stateful Functions + ML model prediction

2020-10-07 Thread Tzu-Li (Gordon) Tai
Kafka so that users can define them textually in `module.yaml` definition files, but the approach you pointed out definitely works for the time being. Cheers, Gordon > > Cheers, > John. > > ------ > *From:* Tzu-Li (Gordon) Tai > *Sent:* Monday 5 October

Re: Statefun + Confluent Fully-managed Kafka

2020-10-07 Thread Tzu-Li (Gordon) Tai
Hi Hezekiah, I've confirmed that the Kafka properties set in the module specification file (module.yaml) are indeed correctly being parsed and used to construct the internal Kafka clients. StateFun / Flink does not alter or modify the properties. So, this should be something wrong with your

Re: Stateful Functions + ML model prediction

2020-10-05 Thread Tzu-Li (Gordon) Tai
Hi John, It is definitely possible to use Apache Pulsar with StateFun. Could you open a JIRA ticket for that? It would be nice to see how much interest we can gather on adding that as a new IO module, and consider adding native support for Pulsar in future releases. If you are already using

[ANNOUNCE] Apache Flink Stateful Functions 2.2.0 released

2020-09-28 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink Stateful Functions 2.2.0. Stateful Functions is an API that simplifies the building of distributed stateful applications with a runtime built for serverless architectures. It's based on functions with persistent

Re: Flink Stateful Functions API

2020-09-14 Thread Tzu-Li (Gordon) Tai
Hi! Dawid is right, there currently is no developer documentation for the remote request-reply protocol. One reason for this is that the protocol isn't considered a fully stable user-facing interface yet, and thus not yet properly advertised in the documentation. However, there are plans to

Re: State Storage Questions

2020-09-08 Thread Tzu-Li (Gordon) Tai
> > On Fri, Sep 4, 2020 at 1:20 AM Tzu-Li (Gordon) Tai > wrote: > >> Hi, >> >> On Fri, Sep 4, 2020 at 1:37 PM Rex Fenley wrote: >> >>> Hello! >>> >>> I've been digging into State Storage documentation, but it's left me >>

Re: State Storage Questions

2020-09-04 Thread Tzu-Li (Gordon) Tai
Hi, On Fri, Sep 4, 2020 at 1:37 PM Rex Fenley wrote: > Hello! > > I've been digging into State Storage documentation, but it's left me > scratching my head with a few questions. Any help will be much appreciated. > > Qs: > 1. Is there a way to use RocksDB state backend for Flink on AWS EMR? >

Re: FLINK YARN SHIP from S3 Directory

2020-09-04 Thread Tzu-Li (Gordon) Tai
Hi, As far as I can tell from a recent change [1], this seems to be possible now starting from Flink 1.11.x. Have you already tried this with the latest Flink version? Also including Klou in this email, who might be able to confirm this. Cheers, Gordon [1]

Re: Unit Test for KeyedProcessFunction with out-of-core state

2020-09-04 Thread Tzu-Li (Gordon) Tai
Hi Alexey, Is there a specific reason why you want to test against RocksDB? Otherwise, in Flink tests we use a `KeyedOneInputStreamOperatorTestHarness` [1] that allows you to wrap a user function and eliminate the need to worry about setting up heavy runtime context / dependencies such as the

Re: Updating kafka connector with state

2020-08-10 Thread Tzu-Li (Gordon) Tai
Hi Nikola, If I remember correctly, state is not compatible between flink-connector-kafka-0.11 and the universal flink-connector-kafka. Piotr (cc'ed) would probably know what's going on here. Cheers, Gordon On Mon, Aug 10, 2020 at 1:07 PM Nikola Hrusov wrote: > Hello, > > We are trying to

Re: MaxConnections understanding on FlinkKinesisProducer via KPL

2020-07-23 Thread Tzu-Li (Gordon) Tai
kpressuring by restricting the size > of the internal queue: > > // 200 Bytes per record, 1 shard > kinesis.setQueueLimit(500); > > > On Tue, Jul 21, 2020 at 8:00 PM Tzu-Li (Gordon) Tai > wrote: > >> Hi Vijay, >> >> I'm not entirely sure of the semantics between

Re: MaxConnections understanding on FlinkKinesisProducer via KPL

2020-07-21 Thread Tzu-Li (Gordon) Tai
Hi Vijay, I'm not entirely sure of the semantics between ThreadPoolSize and MaxConnections since they are all KPL configurations (this specific question would probably be better directed to AWS), but my guess would be that the number of concurrent requests to the KPL backend is capped by

Re: FlinkKinesisProducer blocking ?

2020-07-21 Thread Tzu-Li (Gordon) Tai
er second per >> shard are buffered in an unbounded queue and dropped when their RecordTtl >> expires. >> >> To avoid data loss, you can enable backpressuring by restricting the size >> of the internal queue: >> >> // 200 Bytes per record, 1 shard >> kine
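
The queue size in the quoted comment ("200 Bytes per record, 1 shard") can be reproduced with a little arithmetic. The sketch assumes a buffer of roughly 100 kB per shard, which is the heuristic I recall from the Flink Kinesis connector documentation for the default buffering time; treat that constant and the `queue_limit` helper as assumptions rather than connector API.

```python
def queue_limit(num_shards, record_size_bytes, queue_bytes_per_shard=100_000):
    """Number of records to allow in the producer queue before backpressure applies."""
    return (num_shards * queue_bytes_per_shard) // record_size_bytes


print(queue_limit(num_shards=1, record_size_bytes=200))  # 500
print(queue_limit(num_shards=8, record_size_bytes=200))  # 4000
```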

Re: Validating my understanding of SHARD_DISCOVERY_INTERVAL_MILLIS

2020-07-21 Thread Tzu-Li (Gordon) Tai
Hi Vijay, Your assumption is correct that the discovery interval does not affect the interval of fetching records. As a side note, you can actually disable shard discovery, by setting the value to -1. The FlinkKinesisConsumer would then only call ListShards once at job startup. Cheers, Gordon

Re: Chaining the creation of a WatermarkStrategy doesn't work?

2020-07-08 Thread Tzu-Li (Gordon) Tai
Ah, didn't realize Chesnay has it answered already, sorry for the concurrent reply :) -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Chaining the creation of a WatermarkStrategy doesn't work?

2020-07-08 Thread Tzu-Li (Gordon) Tai
Hi, This would be more of a Java question. In short, type inference of generic types does not work for chained invocations, and therefore type information has to be explicitly included. If you'd like to chain the calls, this would work: WatermarkStrategy watermarkStrategy = WatermarkStrategy

Re: TaskManager docker image for Beam WordCount failing with ClassNotFound Exception

2020-07-08 Thread Tzu-Li (Gordon) Tai
Hi, Assuming that the job jar bundles all the required dependencies (including the Beam dependencies), making them available under `/opt/flink/usrlib/` in the container either by mounting or directly adding the job artifacts should work. AFAIK it is also the recommended way, as opposed to adding

Re: FlinkKinesisProducer blocking ?

2020-07-08 Thread Tzu-Li (Gordon) Tai
Hi Vijay, The FlinkKinesisProducer does not use blocking calls to the AWS KDS API. It does however apply backpressure (therefore effectively blocking all upstream operators) when the number of outstanding records accumulated exceeds a set limit, configured using the

Re: Any python example with json data from Kafka using flink-statefun

2020-06-16 Thread Tzu-Li (Gordon) Tai
(forwarding this to user@ as it is more suited to be located there) Hi Sunil, With remote functions (using the Python SDK), messages sent to / from them must be Protobuf messages. This is a requirement since remote functions can be written in any language, and we use Protobuf as a means for

Re: How To subscribe a Kinesis Stream using enhance fanout?

2020-06-09 Thread Tzu-Li (Gordon) Tai
RA ticket. > > Thanks > > > > ‐‐‐ Original Message ‐‐‐ > On Thursday, 14 May 2020 11:34, Xiaolong Wang > wrote: > > Thanks, I'll check it out. > > On Thu, May 14, 2020 at 6:26 PM Tzu-Li (Gordon) Tai > wrote: > >> Hi Xiaolong, >> >> You are right

Re: KeyedStream and keyedProcessFunction

2020-06-09 Thread Tzu-Li (Gordon) Tai
Hi, Records with the same key will be processed by the same partition. Note there isn't an instance of a keyed process function for each key. There is a single instance per partition, and all keys that are distributed to the same partition will get processed by the same keyed process function
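The point about one function instance per partition, rather than one per key, can be pictured with a small simulation in plain Java. This is a minimal sketch and not Flink code: `CountingFunction`, `partitionOf` and `run` are hypothetical names, and the modulo-hash routing is a deliberate simplification of how a real runtime assigns keys to partitions.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyedPartitionSketch {

    // Hypothetical stand-in for ONE parallel instance of a keyed process function.
    // It serves many keys, so its state is a map scoped by key.
    static final class CountingFunction {
        private final Map<String, Integer> countPerKey = new HashMap<>();

        // Processes one record and returns the updated count for that record's key.
        int process(String key) {
            return countPerKey.merge(key, 1, Integer::sum);
        }

        int distinctKeys() {
            return countPerKey.size();
        }
    }

    // Simplified routing (an assumption for illustration): hash modulo parallelism.
    static int partitionOf(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Creates one function instance per partition and routes every record by its key.
    static CountingFunction[] run(String[] keys, int parallelism) {
        CountingFunction[] instances = new CountingFunction[parallelism];
        for (int i = 0; i < parallelism; i++) {
            instances[i] = new CountingFunction();
        }
        for (String key : keys) {
            instances[partitionOf(key, parallelism)].process(key);
        }
        return instances;
    }

    public static void main(String[] args) {
        CountingFunction[] instances = run(new String[] {"a", "b", "a", "c", "a"}, 2);
        for (int i = 0; i < instances.length; i++) {
            System.out.println("partition " + i + " handles " + instances[i].distinctKeys() + " key(s)");
        }
    }
}
```

Records with the same key always reach the same instance, and that single instance keeps separate state for every key routed to it.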

[ANNOUNCE] Apache Flink Stateful Functions 2.1.0 released

2020-06-09 Thread Tzu-Li (Gordon) Tai
The Apache Flink community is very happy to announce the release of Apache Flink Stateful Functions 2.1.0. Stateful Functions is an API that simplifies building distributed stateful applications. It's based on functions with persistent state that can interact dynamically with strong consistency

Re: Suggestions for using both broadcast sync and conditional async-io

2020-06-04 Thread Tzu-Li (Gordon) Tai
Hi, For the initial DB fetch and state bootstrapping: That's exactly what the State Processor API is for, have you looked at that already? It currently does support bootstrapping broadcast state [1], so that should be good news for you. As a side note, I may be missing something, is broadcast

Re: StateFun remote/embedded polyglot example

2020-05-31 Thread Tzu-Li (Gordon) Tai
Hi, On Mon, Jun 1, 2020 at 5:47 AM Omid Bakhshandeh wrote: > Hi, > > I'm very confused about StateFun 2.0 new architecture. > > Is it possible to have both remote and embedded functions in the same > deployment? > Yes that is possible. Embedded functions simply run within the Flink StateFun

Re: Stateful functions Harness

2020-05-27 Thread Tzu-Li (Gordon) Tai
ly add a Flink source function as the ingress. If you want to use that directly in a normal application (not just execution in IDE with the Harness), you can define your ingresses/egresses by binding SourceFunctionSpec / SinkFunctionSpec. Please see how they are being used in the Harness class for example

Re: Stateful functions Harness

2020-05-27 Thread Tzu-Li (Gordon) Tai
Intellij > > Any work arounds? > > > > > On May 22, 2020, at 12:03 AM, Tzu-Li (Gordon) Tai > wrote: > > Hi, > > Sorry, I need to correct my comment on using the Kafka ingress / egress > with the Harness. > > That is actually doable, by adding an extra depen

Re: stateful-fun2.0 checkpointing

2020-05-25 Thread Tzu-Li (Gordon) Tai
Hi, I replied to your question on this in your other email thread. Let us know if you have other questions! Cheers, Gordon On Sun, May 24, 2020, 1:01 AM C DINESH wrote: > Hi Team, > > 1. How can we enable checkpointing in stateful-fun2.0 > 2. How to set parallelism > > Thanks, > Dinesh. > >

Re: Stateful-fun-Basic-Hello

2020-05-25 Thread Tzu-Li (Gordon) Tai
Hi, You're right, maybe the documentation needs a bit more direction there, especially for people who are newer to Flink. 1. How to increase parallelism There are two ways to do this. Either set `parallelism.default` in the flink-conf.yaml, or use the -p command line option when

Re: Stateful-fun-Basic-Hello

2020-05-25 Thread Tzu-Li (Gordon) Tai
Hi, It seems like you are trying to package your Stateful Functions app as a Flink job, and submit that to an existing cluster. If that indeed is the case, Stateful Functions apps have some required configurations that need to be set via the flink-conf.yaml file for your existing cluster. Please

Re: Stateful functions Harness

2020-05-21 Thread Tzu-Li (Gordon) Tai
, such as the source / sink providers and Flink Kafka connectors. Cheers, Gordon On Fri, May 22, 2020 at 12:04 PM Tzu-Li (Gordon) Tai wrote: > Are you getting an exception from running the Harness? > The Harness should already have the required configurations, such as the > par

Re: Flink Window with multiple trigger condition

2020-05-21 Thread Tzu-Li (Gordon) Tai
Hi, To achieve what you have in mind, I think you need to use a processing time window of 30 mins, and have a custom trigger that matches the "start" event in the `onElement` method and returns TriggerResult.FIRE_AND_PURGE. That way, the window fires either when the processing time
