Hi Chad! I've been meaning to review this, I've just not carved up the time. I'll try to get back to you this week with some thoughts! Thanks! -P.
On Wed, Dec 2, 2020 at 10:31 AM Chad Dombrova <chad...@gmail.com> wrote: > Hi everyone, > Beam's niche is low latency, high throughput workloads, but Beam has > incredible promise as an orchestrator of long running work that gets sent > to a scheduler. We've created a modified version of Beam that allows the > python SDK worker to outsource tasks to a scheduler, like Kubernetes > batch jobs[1], Argo[2], or Google's own OpenCue[3]. > > The basic idea is that any element in a stream can be tagged to be > executed outside of the normal SdkWorker as an atomic "task". A task is > one invocation of a stage, composed of one or more DoFns, against one a > slice of the data stream, composed of one or more tagged elements. The > upshot is that we're able to slice up the processing of a stream across > potentially *many* workers, with the trade-off being the added overhead > of starting up a worker process for each task. > > For more info on how we use our modified version of Beam to make visual > effects for feature films, check out the talk[4] I gave at the Beam Summit. > > Here's our design doc: > > https://docs.google.com/document/d/1GrAvDWwnR1QAmFX7lnNA7I_mQBC2G1V2jE2CZOc6rlw/edit?usp=sharing > > And here's the github branch: > https://github.com/LumaPictures/beam/tree/taskworker_public > > Looking forward to your feedback! > -chad > > > [1] https://kubernetes.io/docs/concepts/workloads/controllers/job/ > [2] https://argoproj.github.io/ > [3] https://cloud.google.com/opencue > [4] https://www.youtube.com/watch?v=gvbQI3I03a8&ab_channel=ApacheBeam > >