Re: Staging a PCollection in Beam | Dataflow Runner

Reuven Lax via dev Wed, 19 Oct 2022 08:24:01 -0700

PCollections are usually persistent within a pipeline, so you can reuse
them in other parts of the same pipeline with no problem.


There is no notion of state across pipelines - every pipeline is
independent. If you want state across pipelines, you can write the
PCollection out to a set of files, which are then read back in by the new
pipeline.

On Tue, Oct 18, 2022 at 11:45 PM Ravi Kapoor <[email protected]> wrote:

> Hi Team,
> Can we stage PCollection<TableRow> or PCollection<Row> data? Let's say,
> to avoid repeating the expensive operations between two complex BQ tables
> time and again, we materialize the result in some temp view which will be
> deleted after the session.
>
> Is it possible to do that in the Beam Pipeline?
> We can later use the temp view in another pipeline to read the data from
> and do processing.
>
> Or, in general, I would like to know: do we ever stage the PCollection?
> Let's say I want to create another instance of the same job which has
> complex processing.
> Does the pipeline re-perform the computation, or would it pick up the already
> processed data from the previous instance, which must be staged somewhere?
>
> In Spark we have createOrReplaceTempView, which is used
> to create a temp table from a Spark DataFrame or Dataset.
>
> Please advise.
>
> --
> Thanks,
> Ravi Kapoor
> +91-9818764564
> [email protected]
>
