Re: Error handling for GCP Pub/Sub on Dataflow using Python

2024-05-25 Thread XQ Hu via user
I do not suggest handling this inside beam.io.WriteToPubSub. Instead, you
could add a transform to your pipeline that checks the message size before
the write step. If a message exceeds 10 MB, route it to another sink (such
as Cloud Storage) or process it to reduce its size.
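
For example, here is a rough sketch of that approach using a DoFn with a
tagged output: messages under the limit go to WriteToPubSub as before, and
oversized ones are windowed and written to Cloud Storage instead. The
project, topic, subscription, bucket path, and window size are placeholders,
not values from this thread.

import apache_beam as beam
from apache_beam.io import fileio
from apache_beam.io.gcp.pubsub import ReadFromPubSub, WriteToPubSub
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

# Pub/Sub rejects messages larger than 10 MB; leave headroom if you also
# attach attributes.
MAX_BYTES = 10 * 1024 * 1024


class SplitBySize(beam.DoFn):
    """Emits normal payloads on the main output, oversized ones on a tag."""
    TOO_LARGE = 'too_large'

    def process(self, element: bytes):
        if len(element) > MAX_BYTES:
            yield beam.pvalue.TaggedOutput(self.TOO_LARGE, element)
        else:
            yield element


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        # Placeholder source; keep whatever produces your bytes payloads today.
        messages = p | 'Read' >> ReadFromPubSub(
            subscription='projects/my-project/subscriptions/my-sub')

        split = messages | 'SplitBySize' >> beam.ParDo(
            SplitBySize()).with_outputs(SplitBySize.TOO_LARGE, main='ok')

        # Messages within the limit are published exactly as before.
        split.ok | 'Publish' >> WriteToPubSub(
            topic='projects/my-project/topics/my-topic')

        # Oversized messages go to GCS. WriteToFiles needs non-global windows
        # in streaming, and its default text sink expects strings, so decode
        # here (or supply a custom FileSink for binary payloads).
        (split[SplitBySize.TOO_LARGE]
         | 'Window' >> beam.WindowInto(window.FixedWindows(60))
         | 'ToText' >> beam.Map(lambda b: b.decode('utf-8', errors='replace'))
         | 'WriteToGCS' >> fileio.WriteToFiles(path='gs://my-bucket/oversized/'))


if __name__ == '__main__':
    run()

Splitting before the write keeps the Pub/Sub sink itself unchanged; only the
routing of oversized payloads differs.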

On Fri, May 24, 2024 at 3:46 AM Nimrod Shory  wrote:

> Hello group,
> I am pretty new to Dataflow and Beam.
> I have deployed a Dataflow streaming job using Beam with Python.
> The final step of my pipeline is publishing a message to Pub/Sub.
> In certain cases the message can become too large for Pub/Sub (beyond the
> allowed 10 MB), and when that happens the publish simply retries
> indefinitely, eventually causing the job to stop processing new data.
>
> My question is: is there a way to handle failures in beam.io.WriteToPubSub,
> or should I implement similar handling myself?
>
> Ideally, I would like to write the oversized messages to a file on Cloud
> Storage.
>
> Any ideas will be appreciated.
>
> Thanks in advance for your help!
>
>


Error handling for GCP Pub/Sub on Dataflow using Python

2024-05-24 Thread Nimrod Shory
Hello group,
I am pretty new to Dataflow and Beam.
I have deployed a Dataflow streaming job using Beam with Python.
The final step of my pipeline is publishing a message to Pub/Sub.
In certain cases the message can become too large for Pub/Sub (beyond the
allowed 10 MB), and when that happens the publish simply retries
indefinitely, eventually causing the job to stop processing new data.

My question is: is there a way to handle failures in beam.io.WriteToPubSub,
or should I implement similar handling myself?

Ideally, I would like to write the oversized messages to a file on Cloud
Storage.

Any ideas will be appreciated.

Thanks in advance for your help!