Hello Beam list!

We are looking at adopting some more advanced use cases with Beam code at
its core, including automated testing and data dependency tracking.

Specifically, I'm interested in things like making sure data changes don't
break pipelines or the things that depend on pipeline output, especially if
the Beam code isn't managed by the same team that produces the data or
runs the systems that consume the Beam output.

This becomes more complex if you consider certain runners with non-zero
replacement time doing a rolling or staged restart/upgrade/replacement while
depending on data producers that ALSO have non-zero replacement time. Are
there any best practices for Beam code management / data dependency
management when the code in /master is not necessarily what is running live
in your production systems? Is it all just "pretend all data is bad and try
to be backwards compatible", or are there any Beam features that help with
this?

Thanks,
Charles Allen
