Need Help

2022-03-28 Thread Himanshu Sareen
Team,

Is it possible for two independent flink-statefun applications to communicate via an HTTP REST API?
In other words, does flink-statefun support HTTP REST as an ingress?

We are using the Python SDK for our statefun application.

Regards,
Himanshu


HTTP REST API as Ingress/Egress

2022-04-23 Thread Himanshu Sareen
Team,

Does flink-statefun support HTTP REST as an ingress (like Kafka and Kinesis)?

I'm looking for a fault-tolerant solution where an external API can invoke a stateful function, access state, and return a response.

We are using the Python SDK for our statefun application.

Regards,
Himanshu



Flink Stateful Function - Regex Match in State Key

2022-05-19 Thread Himanshu Sareen
Hi All,

My understanding is that Flink uses an exact match on the key to fetch/load state in a stateful function.

But is it possible to use a regular expression as the target-id, or as a key to a stateful function, and thereby fetch/load all matching states?

Regards,
Himanshu


Re: HTTP REST API as Ingress/Egress

2022-05-19 Thread Himanshu Sareen
Hi All,


It would be of great help if someone could share their views.

The application design requires synchronous access to a stateful function:

  1.  The application invokes a stateful function via an HTTP call.
  2.  The application waits for a response.
  3.  Once the stateful function completes execution, it returns the response to the application.

Regards
Himanshu




Flink DataStream and remote Stateful Functions interoperability

2022-05-24 Thread Himanshu Sareen
Team,

I'm working on a POC where our existing (remote) stateful functions interact with the DataStream API:
https://nightlies.apache.org/flink/flink-statefun-docs-release-3.2/docs/sdk/flink-datastream/

I started the Flink cluster with ./bin/start-cluster.sh and then submitted the .jar to Flink.

However, on submission only the embedded function is called by the DataStream code.

I'm unable to invoke the remote stateful functions because module.yaml is not loaded.

Can someone help me understand how we can deploy the stateful function code (module.yaml) and the DataStream API code in parallel on a Flink cluster?


Regards
Himanshu



Re: Flink DataStream and remote Stateful Functions interoperability

2022-05-31 Thread Himanshu Sareen
Thanks, Tymur, for the pointers.

I followed the GitHub link and now understand how to define and configure remote functions with the DataStream API.

However, I need help understanding the following:

1. I didn't find the stateful function definition and server code.
2. How should we deploy the stateful function code?

If you can, please help with pointers on deploying the stateful function code and on the GitHub example.

Regards
Himanshu

From: Tymur Yarosh 
Sent: Wednesday, May 25, 2022 9:53:20 PM
To: user@flink.apache.org ; Himanshu Sareen 

Subject: Re: Flink DataStream and remote Stateful Functions interoperability

Hi Himanshu,

The short answer is you should configure Stateful Functions in your job. Here 
is an example 
https://github.com/f1xmAn/era-locator/blob/34dc4f77539195876124fe604cf64c61ced4e5da/src/main/java/com/github/f1xman/era/StreamingJob.java#L68.
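
In that setup the job itself binds the remote function endpoint, so module.yaml is not used at all. A rough sketch of the relevant wiring (assuming the statefun-flink-datastream SDK from the StateFun docs; the function type, endpoint URI and input stream are placeholders, and the builder method names should be double-checked against the example above):

import java.net.URI;
import java.time.Duration;

import org.apache.flink.statefun.flink.datastream.RequestReplyFunctionBuilder;
import org.apache.flink.statefun.flink.datastream.RoutableMessage;
import org.apache.flink.statefun.flink.datastream.RoutableMessageBuilder;
import org.apache.flink.statefun.flink.datastream.StatefulFunctionDataStreamBuilder;
import org.apache.flink.statefun.sdk.FunctionType;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RemoteFnJobSketch {

  // Placeholder identifier of the remote (e.g. Python) function.
  static final FunctionType REMOTE_FN = new FunctionType("example", "my-remote-fn");

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Placeholder input; in the real job this is the Kafka-backed DataStream.
    DataStream<String> input = env.fromElements("a", "b", "c");

    // Wrap every element as a message routed to the remote function
    // (for remote functions the body may need to be wrapped as a TypedValue; see the docs).
    DataStream<RoutableMessage> ingress =
        input.map(value ->
            RoutableMessageBuilder.builder()
                .withTargetAddress(REMOTE_FN, value) // the value doubles as the function id here
                .withMessageBody(value)
                .build());

    // Register the remote endpoint directly in the job: this replaces module.yaml.
    StatefulFunctionDataStreamBuilder.builder("example")
        .withDataStreamAsIngress(ingress)
        .withRequestReplyRemoteFunction(
            RequestReplyFunctionBuilder.requestReplyFunctionBuilder(
                    REMOTE_FN, URI.create("http://python-worker:8000/statefun")) // placeholder URL
                .withMaxRequestDuration(Duration.ofSeconds(15))
                .withMaxNumBatchRequests(500))
        .build(env);

    env.execute("remote-statefun-datastream-sketch");
  }
}

The remote Python worker is then deployed and scaled separately from the Flink cluster; it only has to be reachable at the configured URI.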

Check out this article on Flink DataStream and Stateful Functions 
interoperability 
https://medium.com/devoops-and-universe/realtime-detection-of-russian-crypto-phone-era-with-flink-datastream-and-stateful-functions-e77794fedc2a.

Best,
Tymur Yarosh



Flink Remote Stateful - Crontab Scheduler

2022-06-08 Thread Himanshu Sareen
Team,


We are aware of Sending Delayed Messages, which we can use to trigger delayed invocations of stateful functions.

Example Use Case

Invoke a remote stateful function every day at 11 AM.


We are using the Python API and didn't come across out-of-the-box support for crontab-like scheduling.
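
The delayed-message route would boil down to a function that re-arms itself on every invocation. A minimal sketch of that pattern (shown with the Java remote SDK's sendAfter; the Python SDK's send_after is the equivalent call, and the type name, id and payload below are placeholders):

import java.time.Duration;
import java.util.concurrent.CompletableFuture;

import org.apache.flink.statefun.sdk.java.Context;
import org.apache.flink.statefun.sdk.java.StatefulFunction;
import org.apache.flink.statefun.sdk.java.TypeName;
import org.apache.flink.statefun.sdk.java.message.Message;
import org.apache.flink.statefun.sdk.java.message.MessageBuilder;

// A self-rescheduling "cron" function: each invocation does its daily work and then
// sends itself a delayed message so that it fires again later.
public class DailyTriggerFn implements StatefulFunction {

  static final TypeName TYPE = TypeName.typeNameFromString("example/daily-trigger"); // placeholder

  @Override
  public CompletableFuture<Void> apply(Context context, Message message) {
    // ... do the daily work here ...

    // Re-arm the trigger. A fixed 24h delay drifts with processing time; to hit 11 AM
    // exactly, compute the delay until the next 11 AM instead of using a constant.
    context.sendAfter(
        Duration.ofHours(24),
        MessageBuilder.forAddress(TYPE, "scheduler") // send back to this same function instance
            .withValue("tick")
            .build());
    return context.done();
  }
}

The function still has to be registered with the handler/endpoint as usual and kicked off once with an initial message; delayed messages are durable in StateFun, so the schedule should survive restarts.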

If anyone can share views on what the best/most efficient solution would be, it will be helpful.

Regards,
Himanshu


Apache Flink - Reading data from Scylla DB

2022-06-13 Thread Himanshu Sareen
Team,

I'm looking for a solution to consume/read data from Scylla DB into Apache Flink.

If anyone can guide me or share pointers, it will be helpful.
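
One option, since Scylla speaks the Cassandra protocol, is either Flink's Cassandra connector or a plain source that uses the Cassandra/Scylla Java driver. A minimal sketch of the latter (host, keyspace, table and column names are placeholders; this simple source is neither parallel nor fault-tolerant, it just illustrates the wiring):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

public class ScyllaSourceSketch {

  // A one-shot source that runs a CQL query against Scylla and emits one String per row.
  static class ScyllaQuerySource extends RichSourceFunction<String> {
    private transient Cluster cluster;
    private transient Session session;
    private volatile boolean running = true;

    @Override
    public void open(Configuration parameters) {
      cluster = Cluster.builder().addContactPoint("scylla-host").build(); // placeholder host
      session = cluster.connect();
    }

    @Override
    public void run(SourceContext<String> ctx) {
      ResultSet rs = session.execute("SELECT id, payload FROM my_keyspace.my_table"); // placeholder CQL
      for (Row row : rs) {
        if (!running) {
          break;
        }
        ctx.collect(row.getString("id") + "," + row.getString("payload"));
      }
    }

    @Override
    public void cancel() {
      running = false;
    }

    @Override
    public void close() {
      if (session != null) { session.close(); }
      if (cluster != null) { cluster.close(); }
    }
  }

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.addSource(new ScyllaQuerySource()).print();
    env.execute("scylla-read-sketch");
  }
}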

Regards,
Himanshu


Flink-Statefun : Design Solution advice ( ASYNC IO)

2022-08-16 Thread Himanshu Sareen
Team,

I'm solving a use case and need advice/suggestions on whether Flink is the right choice for the solution.

1. Ingress - Kafka events/messages consist of multiple IDs.
2. Task - For each ID in a Kafka message, query Cassandra DB (asynchronously) to fetch records, then prepare multiple small messages out of the records received.
3. Egress - Emit all the small messages/events to Kafka.

An ingress event may contain more than 500 IDs.
Each ID may result in millions of records from Cassandra DB.


My design approach:

1. A remote stateful function listens to the ingress.
2. It invokes an embedded function (which implements async I/O) for each ID.
3. Once records are available from the embedded function, it processes them and emits multiple events to Kafka.
4. It sends back an acknowledgement to the calling remote function.

Please share suggestions or advice.

Note - I'm unable to find a good example of embedded and remote function interaction.

Regards


Re: Flink-Statefun : Design Solution advice ( ASYNC IO)

2022-08-16 Thread Himanshu Sareen
Hi Tymur,

1. Why do you want a remote function to call an embedded function in this case?

To implement an async I/O call:
https://nightlies.apache.org/flink/flink-statefun-docs-master/docs/sdk/flink-datastream/#completing-async-requests
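
For reference, the pattern from that section boils down to registering a CompletableFuture with the embedded function's context and handling the completion on a later invocation. A rough sketch (assuming the embedded Java SDK's registerAsyncOperation and AsyncOperationResult; the exact method names here are from memory and should be verified against the linked section):

import java.util.concurrent.CompletableFuture;

import org.apache.flink.statefun.sdk.AsyncOperationResult;
import org.apache.flink.statefun.sdk.Context;
import org.apache.flink.statefun.sdk.StatefulFunction;

// Embedded function that issues a non-blocking lookup and reacts to its result later.
public class AsyncLookupFn implements StatefulFunction {

  @Override
  public void invoke(Context context, Object input) {
    if (input instanceof String) {
      String id = (String) input;
      // Placeholder for the real non-blocking call, e.g. the Cassandra driver's executeAsync(...).
      CompletableFuture<String> records = CompletableFuture.completedFuture("records-for-" + id);
      // Register the future; the runtime re-invokes this function once it completes.
      context.registerAsyncOperation(id, records);
      return;
    }
    if (input instanceof AsyncOperationResult) {
      @SuppressWarnings("unchecked")
      AsyncOperationResult<String, String> result = (AsyncOperationResult<String, String>) input;
      if (result.successful()) {
        // result.metadata() is the id registered above, result.value() the fetched records:
        // build the small messages here and send them to the Kafka egress / back to the caller.
      } else if (result.unknown()) {
        // Restored from a checkpoint while the call was in flight; outcome unknown, so retry.
      } else {
        // Failure: decide whether to retry or give up.
      }
    }
  }
}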

2. Do you have separate outgoing Kafka messages or join them per record?

One DB query will fetch millions of records, which will be aggregated into multiple separate events/messages to be published to Kafka.

3. What is explicit acknowledgment at the end for?

One ingress event may have more than 500 IDs. Once the entire event has been processed (i.e. the DB query has completed for all 500 IDs), we have to send a trigger event for downstream consumers.


Also, if there is a better way to implement this, do share some pointers.

Regards,
Himanshu

From: Tymur Yarosh 
Sent: Tuesday, August 16, 2022 2:37 PM
To: user@flink.apache.org ; Himanshu Sareen 

Subject: Re: Flink-Statefun : Design Solution advice ( ASYNC IO)

Hi,
Just a few questions:
1. Why do you want a remote function to call an embedded function in this case?
2. Do you have separate outgoing Kafka messages or join them per record?
3. What is explicit acknowledgment at the end for?

Best,
Tymur Yarosh


Re: Flink-Statefun : Design Solution advice ( ASYNC IO)

2022-08-16 Thread Himanshu Sareen
Thanks, Tymur. I think a DataStream API job will also work for the use case; I'll try a quick POC.

Further, can you suggest how we should fetch data from an external API/DB from a remote stateful function (Python SDK)?
We have a flink-statefun application which listens to Kafka streams and populates/hydrates the events with data hosted on external DBs/servers.

Presently we are making the API/DB call from within the stateful function. But how should we handle the following scenarios:

  1.  The API/DB service is down
  2.  Timeouts/retries


Regards,
Himanshu




From: Tymur Yarosh 
Sent: Tuesday, August 16, 2022 5:22:23 PM
To: user@flink.apache.org ; Himanshu Sareen 

Subject: Re: Flink-Statefun : Design Solution advice ( ASYNC IO)

Ok, I don't see it as a good case for Stateful Functions. Statefun is a fantastic tool when you treat each function as an object with its data and behavior, and these objects communicate with each other in arbitrary directions. Also, the Stateful Functions remote API is asynchronous, so you can use it to enrich some data.

While I have a limited understanding of your problem, it looks like a classic DAG that can be implemented using the pure Flink DataStream API.

For instance:
1. Kafka source ->
2. keyBy your id ->
3. FlatMap to split each Kafka message into one message per id, also attaching the total count of messages to each downstream message ->
4. RichAsyncFunction to enrich the id and count with the data from Cassandra ->
5. process with a function that:
5.1. Sends incoming messages downstream and increases the count of seen messages
5.2. Compares the count of seen messages to the total count; if they are equal, sends an ack to the side output and clears the state
5.3. If more messages are expected but nothing has arrived within some threshold duration, sends an error to another side output
6. Three sinks: one for enriched outgoing messages, one for acks from the side output, and one for errors.

Or you can even combine acks and errors into the same sink (a rough skeleton of this wiring follows below).
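
A self-contained sketch of that pipeline (the Kafka source, the Cassandra lookup and the sinks are replaced by stand-ins; class and field names are illustrative; step 5.3's timeout/error path is omitted, it would be a processing-time timer plus a second OutputTag):

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class EnrichmentJobSketch {

  // Side output for acks (step 5.2); errors (step 5.3) would get a second tag.
  static final OutputTag<String> ACKS = new OutputTag<String>("acks") {};

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // 1. Stand-in for the Kafka source: each element is one ingress event listing its IDs.
    DataStream<Tuple3<String, String, Integer>> perId =
        env.fromElements("id-1,id-2,id-3")
            // 3. Split each ingress event into one element per ID and attach the total count.
            .flatMap((String batch, Collector<Tuple3<String, String, Integer>> out) -> {
              String[] ids = batch.split(",");
              for (String id : ids) {
                out.collect(Tuple3.of(batch, id, ids.length)); // (batchId, id, totalIds)
              }
            })
            .returns(Types.TUPLE(Types.STRING, Types.STRING, Types.INT));

    // 4. Async enrichment; a real job would query Cassandra/Scylla here.
    DataStream<Tuple3<String, String, Integer>> enriched =
        AsyncDataStream.unorderedWait(perId, new FakeLookup(), 30, TimeUnit.SECONDS, 100);

    // 2./5. Key by the batch id and track completion; acks go to the side output.
    SingleOutputStreamOperator<String> mainOut =
        enriched.keyBy(t -> t.f0, Types.STRING).process(new CompletionTracker());

    // 6. Stand-in sinks for enriched records and acks.
    mainOut.print("enriched");
    mainOut.getSideOutput(ACKS).print("ack");

    env.execute("enrichment-sketch");
  }

  // Step 4: placeholder for an asynchronous Cassandra lookup.
  static class FakeLookup
      extends RichAsyncFunction<Tuple3<String, String, Integer>, Tuple3<String, String, Integer>> {
    @Override
    public void asyncInvoke(Tuple3<String, String, Integer> in,
                            ResultFuture<Tuple3<String, String, Integer>> resultFuture) {
      // A real implementation would issue a non-blocking query (e.g. the driver's executeAsync)
      // and complete the future from its callback.
      CompletableFuture
          .supplyAsync(() -> Tuple3.of(in.f0, "enriched-" + in.f1, in.f2))
          .thenAccept(v -> resultFuture.complete(Collections.singleton(v)));
    }
  }

  // Step 5: counts enriched records per batch and emits an ack once all IDs have been seen.
  static class CompletionTracker
      extends KeyedProcessFunction<String, Tuple3<String, String, Integer>, String> {

    private transient ValueState<Integer> seen;

    @Override
    public void open(Configuration parameters) {
      seen = getRuntimeContext().getState(new ValueStateDescriptor<>("seen", Types.INT));
    }

    @Override
    public void processElement(Tuple3<String, String, Integer> record, Context ctx,
                               Collector<String> out) throws Exception {
      out.collect(record.f1); // forward the enriched record downstream (step 5.1)
      int count = (seen.value() == null ? 0 : seen.value()) + 1;
      if (count >= record.f2) {
        ctx.output(ACKS, "batch " + ctx.getCurrentKey() + " complete"); // step 5.2
        seen.clear();
      } else {
        seen.update(count);
      }
    }
  }
}

The completion tracking relies on keying by the batch id, so each batch's counter lives in its own keyed state.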

Works for you?

Best,
Tymur Yarosh


Query/Issues - Statefun - Flink

2022-09-23 Thread Himanshu Sareen
Team,


I'm looking for Flink documentation which compares remote functions, embedded functions, and the DataStream API.

I'm presently exploring Flink remote statefun functions and looking for clarity on the following:


  1.  send_egress - Is there a way to register a callback that reports success/failure of publishing a message to Kafka?
  2.  Does statefun support exactly-once for the Kafka egress? (I got duplicate records.)
  3.  If Kafka (the egress) is down/unreachable, how is this handled by Flink? (Retries, or are messages dropped?)
  4.  When backpressure occurs, Flink restarts as checkpoints start to fail. I tried unaligned checkpoints but got: "Invalid configuration: execution.checkpointing.unaligned; StateFun currently does not support unaligned checkpointing".
  5.  Also, will the DataStream API or embedded functions have similar issues?

Regards,
Himanshu