Thanks all for your feedback. I've been looking into the Batched DoFns, and will have a follow-up on how we can best interact with them.
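
In the meantime, here is a rough sketch of the direction I'm exploring, along the lines of Brian's suggestion below. It leans on the NumpyArray typehint and the N batch-dimension placeholder from the batch typehints module he linked ([1] in his mail); the DoFn itself and its shapes are purely illustrative, not the actual RunInference implementation:

    from typing import Iterator

    import numpy as np
    import apache_beam as beam
    # Assumes N (batch-dimension placeholder) and NumpyArray are exposed by the
    # batch typehints module Brian links below; adjust the import if the names differ.
    from apache_beam.typehints.batch import N, NumpyArray


    class BatchedModelDoFn(beam.DoFn):
      """Toy stand-in for an inference DoFn that consumes whole (N, 10) batches."""

      def process_batch(
          self, batch: NumpyArray[np.int64, (N, 10)]
      ) -> Iterator[NumpyArray[np.int64, (N, 10)]]:
        # The BatchConverter stacks the individual (10,) elements into a single
        # (N, 10) array before this method is called, so the stacking policy is
        # expressed through the typehint rather than a user-supplied function.
        yield batch * 2  # placeholder for a real model call

      def infer_output_type(self, input_element_type):
        # Declare the element-wise output type (same as the input here).
        return input_element_type

A pytorch version would presumably need its own size-parameterized typehints, as Brian notes, since torch types aren't parameterized by shape today. More in the follow-up.
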
On Mon, Aug 15, 2022 at 7:16 PM Robert Bradshaw <rober...@google.com> wrote:
> Thanks. I added some comments to the doc.
>
> I agree with Brian that it makes sense to figure out how this
> interacts with batched DoFns, as we'll want to migrate to that.
> (Perhaps they're already ready to migrate to as a first step?)
>
> On Fri, Aug 12, 2022 at 1:03 PM Brian Hulette via dev
> <dev@beam.apache.org> wrote:
> >
> > Hi Andy,
> >
> > Thanks for writing this up! This seems like something that Batched DoFns
> > could help with. Could we make a BatchConverter [1] that represents the
> > necessary transformations here, and define RunInference as a Batched DoFn?
> > Note that the Numpy BatchConverter already enables users to specify a batch
> > dimension using a custom typehint, like NumpyArray[np.int64, (N, 10)] (the
> > N identifies the batch dimension) [2]. I think we could do something
> > similar, but with pytorch types. It's likely we'd need to define our own
> > typehints though, I suspect pytorch typehints aren't already parameterized
> > by size.
> >
> > Brian
> >
> >
> > [1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/batch.py
> > [2] https://github.com/apache/beam/blob/3173b503beaf30c4d32a4a39c709fd81e8161907/sdks/python/apache_beam/typehints/batch_test.py#L42
> >
> > On Fri, Aug 12, 2022 at 12:36 PM Andy Ye via dev <dev@beam.apache.org> wrote:
> >>
> >> Hi everyone,
> >>
> >> I've written up a design doc [1] on controlling batching in
> >> RunInference. I'd appreciate any feedback. Thanks!
> >>
> >> Summary:
> >> Add a custom stacking function to RunInference to enable users to
> >> control how they want their data to be stacked. This addresses issues
> >> regarding data that have existing batching dimensions, or different sizes.
> >>
> >> Best,
> >> Andy
> >>
> >> [1] https://docs.google.com/document/d/1l40rOTOEqrQAkto3r_AYq8S_L06dDgoZu-4RLKAE6bo/edit#