Staging a PCollection in Beam | Dataflow Runner

Ravi Kapoor Tue, 18 Oct 2022 23:44:39 -0700

Hi Team,
Can we stage PCollection<TableRow> or PCollection<Row> data? Let's say we
want to avoid repeating the expensive operations between two complex BQ
tables time and again by materializing the result in a temporary view that
is deleted after the session.


Is it possible to do that in a Beam pipeline?
We could then read the data from that temp view in another pipeline and do
further processing.

More generally, I would like to know whether a PCollection is ever staged.
Let's say I want to create another instance of the same job, which involves
complex processing.
Does the pipeline re-perform the computation, or does it pick up the data
already processed by the previous instance, which would have to be staged
somewhere?

For comparison, Spark has createOrReplaceTempView, which creates a temporary
view from a Spark DataFrame or Dataset.

Please advise.

-- 
Thanks,
Ravi Kapoor
+91-9818764564