Sorry all, I messed up the link/reference numbers. Here is the same e-mail with the reference numbers fixed.
I build off of the work performed by Eugene et al. within Breaking the fusion barrier[2] and propose[1] a way of how to support splitting of bundles (primarily for SplittableDoFn) within the portability layer. This also builds off of a lot of past work[3, 4, 5, 6, 7] related to splitting. Note that this proposal[1] discusses the portability API changes and "control" flow needed. It also discusses implementation details recommended during implementation by SDKs and runners. Interestingly, I believe there is a way to have a limited form of dynamic work rebalancing for all runners[8] that exist today that should be easily extensible by Runners to provide a meaningful solution but until implemented and tried out, hard to say what gains if any there could be. Note that follow-up proposals/discussions about any SplittableDoFn API changes specific to each language implementation should follow by those interested in getting SplittableDoFn working with portability. There are a few that are needed to support backlog reporting/splitting at backlog[7] and also bundle finalization[9]. This topic has a lot of historical context so I apologize upfront for the complicated read, but feel free to comment on the doc or this thread. 1: https://s.apache.org/beam-checkpoint-and-split-bundles 2: https://s.apache.org/splittable-do-fn 3: https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html 4: http://s.apache.org/beam-breaking-fusion 5: http://s.apache.org/textio-sdf 6: http://s.apache.org/splittable-do-fn-python-sdk 7: https://s.apache.org/beam-bundles-backlog-splitting 8: https://docs.google.com/document/d/1cKOB9ToasfYs1kLWQgffzvIbJx2Smy4svlodPRhFrk4/edit#heading=h.wkwslng744mv 9: https://s.apache.org/beam-finalizing-bundles On Fri, Oct 26, 2018 at 3:07 PM Lukasz Cwik <[email protected]> wrote: > I build off of the work performed by Eugene et al. within Breaking the > fusion barrier[2] and propose[1] a way of how to support splitting of > bundles (primarily for SplittableDoFn) within the portability layer. This > also builds off of a lot of past work[3, 4, 5, 6, 7] related to splitting. > > Note that this proposal[1] discusses the portability API changes and > "control" flow needed. It also discusses implementation details recommended > during implementation by SDKs and runners. Interestingly, I believe there > is a way to have a limited form of dynamic work rebalancing for all > runners[8] that exist today that should be easily extensible by Runners to > provide a meaningful solution but until implemented and tried out, hard to > say what gains if any there could be. > > Note that follow-up proposals/discussions about any SplittableDoFn API > changes specific to each language implementation should follow by those > interested in getting SplittableDoFn working with portability. There are a > few that are needed to support backlog reporting/splitting at backlog[6] > and also bundle finalization[9]. > > This topic has a lot of historical context so I apologize upfront for the > complicated read, but feel free to comment on the doc or this thread. > > 1: https://s.apache.org/splittable-do-fn > 2: https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html > 3: http://s.apache.org/beam-breaking-fusion > 4: http://s.apache.org/textio-sdf > 5: http://s.apache.org/splittable-do-fn-python-sdk > 6: https://s.apache.org/beam-bundles-backlog-splitting > 7: https://s.apache.org/beam-checkpoint-and-split-bundles > 8: > https://docs.google.com/document/d/1cKOB9ToasfYs1kLWQgffzvIbJx2Smy4svlodPRhFrk4/edit#heading=h.wkwslng744mv > 9: https://s.apache.org/beam-finalizing-bundles >
