Hi,

We are following [*Pattern: Slowly-changing lookup cache*] from
https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1

We have a use case where we need to read slowly-changing bounded data as a
PCollection, alongside the main (windowed) PCollection from Kafka, and use
both in a BeamSql query.
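
For concreteness, the shape we are aiming for is roughly the sketch below.
The query, tuple tags, and field names are placeholders, and both inputs are
assumed to already carry Row schemas:

import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TupleTag;

class LookupJoin {
  // kafkaRows: the windowed PCollection<Row> from Kafka;
  // lookupRows: the slowly-changing bounded PCollection<Row>.
  static PCollection<Row> joinWithLookup(
      PCollection<Row> kafkaRows, PCollection<Row> lookupRows) {
    return PCollectionTuple.of(new TupleTag<>("main_stream"), kafkaRows)
        .and(new TupleTag<>("lookup_table"), lookupRows)
        .apply(SqlTransform.query(
            "SELECT m.k, l.val FROM main_stream m "
                + "JOIN lookup_table l ON m.k = l.k"));
  }
}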

Is it possible to design such a use case with the Beam Java SDK?

Approaches we tried, without success:

1) GenerateSequence => GlobalWindow with a data-driven trigger => composite
transform (which applies a Beam I/O read to the pipeline obtained via
PCollection.getPipeline()) => convert the resulting PCollection to
PCollection<Row> => apply BeamSql. (A sketch of this pipeline follows below.)
Comments: The Beam I/O reads the data only once, even though GenerateSequence
emits a long value periodically. Our expectation was that each time a long
value is generated, the Beam I/O would re-read the latest data. Is this
because of optimizations in the DAG? Can those optimizations be overridden?
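
Here is a minimal sketch of approach 1, assuming for illustration that the
lookup source is a CSV file read with TextIO (the path, schema, and refresh
rate are placeholders; our actual I/O differs):

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.GlobalWindows;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.joda.time.Duration;

public class SlowlyChangingLookup {

  static final Schema LOOKUP_SCHEMA =
      Schema.builder().addStringField("k").addStringField("val").build();

  // Composite transform: the Beam I/O read is applied to the pipeline inside
  // expand(), i.e. at graph-construction time, which seems to be why it runs
  // only once no matter how often GenerateSequence fires.
  static class ReadLookup extends PTransform<PCollection<Long>, PCollection<Row>> {
    @Override
    public PCollection<Row> expand(PCollection<Long> ticks) {
      return ticks.getPipeline()
          .apply("ReadLookup", TextIO.read().from("/path/to/lookup.csv"))
          .apply("ToRow",
              MapElements.into(TypeDescriptor.of(Row.class))
                  .via((String line) -> {
                    String[] f = line.split(",", 2);
                    return Row.withSchema(LOOKUP_SCHEMA)
                        .addValues(f[0], f[1]).build();
                  }))
          .setRowSchema(LOOKUP_SCHEMA);
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();
    PCollection<Row> lookup = p
        // Emit one long every 5 minutes, forever.
        .apply(GenerateSequence.from(0).withRate(1, Duration.standardMinutes(5)))
        .apply(Window.<Long>into(new GlobalWindows())
            .triggering(Repeatedly.forever(
                AfterProcessingTime.pastFirstElementInPane()))
            .discardingFiredPanes())
        .apply(new ReadLookup());
    // lookup would then be joined with the Kafka PCollection<Row> via SqlTransform.
    p.run().waitUntilFinish();
  }
}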

2) The pipeline is the same as in approach 1, but instead of a composite
transform, a DoFn is used in which a for loop emits each Row of the data
(sketched below).
Comments: The output PCollection is unbounded, but we need a bounded
PCollection because it is JOINed with the PCollection of each window from
Kafka. How can we convert an unbounded PCollection to a bounded PCollection
inside a DoFn?
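
A rough sketch of the DoFn variant; readAllRecords() is a hypothetical
stand-in for a plain client read (e.g. a JDBC query), since a Beam I/O cannot
be applied from inside a DoFn:

import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.Row;

// Re-reads the lookup data on every tick from GenerateSequence and emits one
// Row per record. Because the ticks come from an unbounded source, the output
// PCollection is unbounded as well.
class ReadLookupFn extends DoFn<Long, Row> {

  static final Schema LOOKUP_SCHEMA =
      Schema.builder().addStringField("k").addStringField("val").build();

  @ProcessElement
  public void processElement(ProcessContext c) {
    for (KV<String, String> record : readAllRecords()) {
      c.output(Row.withSchema(LOOKUP_SCHEMA)
          .addValues(record.getKey(), record.getValue())
          .build());
    }
  }

  // Hypothetical stand-in for the actual client call fetching the latest data.
  private List<KV<String, String>> readAllRecords() {
    return Arrays.asList(KV.of("k1", "v1"), KV.of("k2", "v2"));
  }
}

This is applied after the same GenerateSequence/Window steps as in approach 1,
via ParDo.of(new ReadLookupFn()), followed by setRowSchema(LOOKUP_SCHEMA).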

Are there any better approaches?

Regards,
Rahul
