Fwd: Flink Checkpoint & Offset Commit

2024-03-07 Thread Jacob Rollings
Hello,

I am implementing proof-of-concept Flink real-time streaming solutions.

I came across the lines below in the out-of-the-box Flink Kafka connector documentation.


https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/kafka/


Consumer Offset Committing

"Kafka source commits the current consuming offset when checkpoints are
completed, for ensuring the consistency between Flink's checkpoint state
and committed offsets on Kafka brokers."


How is Flink able to control the callbacks from checkpointing? Is there a
way to hook into this in my own implementations? I have multiple upstream
sources to connect to, depending on the business model, which are not Kafka.
Given the criticality of the system and publisher dependencies, we cannot
switch these to Kafka. So I was hoping to do the same thing the Kafka
connector does.
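For context, the Kafka connector gets that callback through Flink's CheckpointListener interface: notifyCheckpointComplete(checkpointId) is invoked on a task after a checkpoint has completed, and the connector commits its offsets there. A custom (non-Kafka) source function can implement the same interface. A minimal sketch, where MyUpstreamClient and commitPosition() are hypothetical stand-ins for your own system's client (this will not compile without the Flink dependencies on the classpath):

```java
import org.apache.flink.api.common.state.CheckpointListener;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

// Sketch only: a custom source that mimics the Kafka connector's
// commit-on-checkpoint behaviour for a non-Kafka upstream system.
public class MyCustomSource extends RichSourceFunction<String>
        implements CheckpointListener {

    private transient MyUpstreamClient client; // hypothetical client
    private volatile long currentPosition;
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            // Read from the upstream system, emit via ctx.collect(...),
            // and advance currentPosition.
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) throws Exception {
        // Invoked after checkpoint `checkpointId` has completed.
        // Acknowledge/commit the consumed position to the external system
        // here, just as the Kafka connector commits offsets.
        client.commitPosition(currentPosition); // hypothetical API
    }
}
```

Note that in recent Flink versions CheckpointListener lives in org.apache.flink.api.common.state (older versions have it in org.apache.flink.runtime.state), and the newer Source API offers analogous hooks; check the docs for your version. As with Kafka offsets, a commit made here is best-effort: a crash between checkpoint completion and the commit can leave the external position behind the checkpoint, so correctness should rest on the checkpoint state, not the committed position.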


Cheers,

JR


Global connection open and close

2024-03-21 Thread Jacob Rollings
Hello,

Is there a way in Flink to instantiate or open connections (to a cache/DB)
at a global level, so that they can be reused across many process functions
rather than being opened in each operator's open()? Along with opening, I
also wanted to know if there is a way to close them when the job stops, such
that they are closed at the very end, after each operator's close() method
is complete. Basically, the idea is to maintain a single instance at a
global level and close its session as the last step after every operator's
close() is complete.
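One common pattern for this is a reference-counted, JVM-wide holder: each operator acquires the shared connection in open() and releases it in close(), so the underlying connection is created on the first acquire and torn down only after the last operator's close() has finished. A minimal sketch, where Connection is a stand-in for your cache/DB client:

```java
import java.util.function.Supplier;

// Reference-counted JVM-wide connection holder. Operators call acquire()
// in open() and release() in close(); the real connection is created on
// the first acquire and closed only after the last release, i.e. after
// every operator sharing it has finished its close().
final class SharedConnection {
    // Stand-in for your cache/DB client; functional so a lambda can back it.
    interface Connection extends AutoCloseable {
        @Override
        void close(); // no checked exception, to keep call sites simple
    }

    private static Connection instance;
    private static int refCount;

    private SharedConnection() {}

    static synchronized Connection acquire(Supplier<Connection> factory) {
        if (refCount++ == 0) {
            instance = factory.get(); // first user creates the connection
        }
        return instance;
    }

    static synchronized void release() {
        if (--refCount == 0 && instance != null) {
            instance.close(); // last user closes it
            instance = null;
        }
    }
}
```

One caveat: a static singleton is global only per TaskManager JVM, not across the cluster; each parallel subtask running in the same JVM shares it, but subtasks on other TaskManagers each get their own. The close-last ordering comes from the reference counting, not from any Flink guarantee.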


Regards,
JB


Async code inside Flink Sink

2024-04-17 Thread Jacob Rollings
Hello,

I have a use case where I need to delete a cache file after a successful
sink operation (writing to a DB). My Flink pipeline is built using Java. I
am contemplating using Java's CompletableFuture.runAsync() to perform the
file deletion. I am wondering what issues this might cause in terms of
thread management and next-event processing.

Also, related to the same use case, in another Flink pipeline I might need
to do this cache file deletion in a timed fashion. For example, every five
minutes I have to check for cache files that are older than the currently
open cache file serving data into the sink function. All old cache files in
closed status need to be deleted in a timely manner.

All this deletion has to happen asynchronously, without blocking the flow
of events from the Flink source to the sink. Also, per the requirements, I
cannot make the whole sink operation async; only the file-based cache
deletion inside the sink function can be async.

Does Flink support timers or async blocks?
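For what it's worth, the CompletableFuture.runAsync() idea can be kept from interfering with the sink's threads by handing deletions to a small dedicated executor owned by the sink, so the write path only enqueues work and never blocks. A minimal sketch (the cache-file layout and error handling are placeholders):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Fire-and-forget deletion of closed cache files on one dedicated daemon
// thread, so the sink's write path never blocks on file I/O.
class AsyncCacheCleaner implements AutoCloseable {
    // Single thread bounds resource usage and keeps deletions ordered.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "cache-cleaner");
        t.setDaemon(true);
        return t;
    });

    CompletableFuture<Void> deleteAsync(Path file) {
        return CompletableFuture.runAsync(() -> {
            try {
                Files.deleteIfExists(file);
            } catch (Exception e) {
                // Log and swallow: a failed delete must not fail the sink.
                System.err.println("cache delete failed: " + file + ": " + e);
            }
        }, executor);
    }

    @Override
    public void close() {
        // Call from the sink's close() so pending deletions can drain.
        executor.shutdown();
    }
}
```

For the five-minute scan, the same executor pattern works with a ScheduledExecutorService and scheduleAtFixedRate(). On the Flink side, processing-time timers are available in a KeyedProcessFunction, and (depending on your version) the newer Sink V2 writer can obtain a ProcessingTimeService from its init context; check the docs for the API level you are on.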

Any input will be highly helpful.

Thanks,
JB


Checkpointing

2024-05-08 Thread Jacob Rollings
Hello,

I'm curious about how Flink checkpointing would aid in recovering data if
the data source is not Kafka but another system. I understand that
checkpoint snapshots are taken at regular time intervals.

What happens to the data that was read after the previous successful
checkpoint if the system crashes before the next checkpoint interval,
especially for data sources that are not Kafka and don't support offset
management?
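In general, recovery only works if the source can replay: the source stores its read position in checkpointed state, Flink restores that state on restart, and the source re-reads from the restored position, so records consumed after the last successful checkpoint are simply read again. If the upstream system cannot replay (e.g. a fire-and-forget feed), those records are lost unless something buffers them. A minimal sketch using CheckpointedFunction, where readAt() is a hypothetical stand-in for your system's positioned read (it will not compile without the Flink dependencies):

```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

// Sketch only: a replayable source that snapshots its read position so
// recovery can resume from the last completed checkpoint.
public class PositionedSource extends RichSourceFunction<String>
        implements CheckpointedFunction {

    private volatile long position; // e.g. byte offset / sequence number
    private volatile boolean running = true;
    private transient ListState<Long> positionState;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        while (running) {
            String record = readAt(position); // hypothetical positioned read
            // Emit and advance atomically w.r.t. checkpoints, under the
            // checkpoint lock, so the snapshot never splits the two.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(record);
                position++;
            }
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        positionState.update(java.util.Collections.singletonList(position));
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        positionState = context.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("position", Long.class));
        for (Long p : positionState.get()) {
            position = p; // on recovery, re-read from the checkpointed position
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    private String readAt(long pos) { /* placeholder */ return null; }
}
```

This gives the same at-least-once/exactly-once story Kafka offsets give: the checkpoint, not the external system, is the source of truth for where to resume.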

-
JB