Hi, Beam community!

We are working on a Splittable DoFn (SDF) that reads from an unbounded data 
source, a Spark Streaming custom receiver [1]. The receiver in question [2] 
does not support offsets. This introduces a constraint for the Splittable DoFn 
approach: we cannot read from multiple receivers in a worker, because they 
would all read the same data.
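
To illustrate the problem, here is a minimal Python simulation. It is only a 
sketch and uses no Beam or Spark APIs: OffsetlessReceiver, 
read_with_parallel_receivers, and the record names are hypothetical stand-ins, 
assuming a source that always starts delivering from the same point.

```python
from typing import Iterator, List


class OffsetlessReceiver:
    """Hypothetical stand-in for a receiver that has no offset support."""

    def __init__(self, records: List[str]) -> None:
        self._records = records

    def stream(self) -> Iterator[str]:
        # Every new receiver starts from the same point; there is no
        # seek(offset) that would let two receivers take disjoint ranges.
        return iter(self._records)


def read_with_parallel_receivers(records: List[str], num_receivers: int) -> List[str]:
    """Simulates starting several receivers against the same source."""
    output: List[str] = []
    for _ in range(num_receivers):
        output.extend(OffsetlessReceiver(records).stream())
    return output


source = ["contact-1", "contact-2", "contact-3"]  # made-up records
single = read_with_parallel_receivers(source, 1)
double = read_with_parallel_receivers(source, 2)  # every record appears twice
```

With one receiver each record is read once; with two receivers each record is 
read twice, which is the duplication we would like to avoid.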


What are the recommended practices for implementing a read via Splittable DoFn 
when the streaming source does not support offsets? Any thoughts or 
recommendations would be much appreciated.

Thank you,
Elizaveta


[1] Spark Streaming Receiver – 
https://spark.apache.org/docs/latest/streaming-custom-receivers.html 

[2] HubspotReceiver – 
https://github.com/data-integrations/hubspot/blob/develop/src/main/java/io/cdap/plugin/hubspot/source/streaming/HubspotReceiver.java
