TheNeuralBit commented on issue #21440:
URL: https://github.com/apache/beam/issues/21440#issuecomment-1239777760

   Spoke with @yeandy about this today. We discussed how to implement a pytorch 
BatchConverter (which should mostly be a copy-paste job from the numpy one) and 
how to port RunInference over to using Batched DoFns while maintaining backward 
compatibility guarantees.
   
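   For reference, a minimal, untested sketch of what that converter could look 
like, modeled on the numpy converter in `apache_beam/typehints/batch.py`. The 
method names follow the `BatchConverter` base class; everything pytorch-specific 
here is an assumption about how a torch converter *could* behave, and the 
registration hook is omitted since its exact shape may differ across SDK 
versions:

```python
# Sketch only, not the actual implementation.
import torch

from apache_beam.typehints.batch import BatchConverter


class PytorchTensorBatchConverter(BatchConverter):
  def produce_batch(self, elements):
    # Stack per-element tensors along a new leading batch dimension.
    return torch.stack(list(elements))

  def explode_batch(self, batch):
    # Yield the individual elements back out of a batch.
    yield from torch.unbind(batch, dim=0)

  def combine_batches(self, batches):
    # Merge batches along the existing batch dimension (no new dim).
    return torch.cat(batches, dim=0)

  def get_length(self, batch):
    return batch.size(0)

  def estimate_byte_size(self, batch):
    return batch.element_size() * batch.nelement()
```
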
   One backward compatibility concern is each ModelHandler's public API. Each 
ModelHandler will likely need to add arguments that let users specify 
input/output typehints. We could make these new arguments backwards compatible 
by defining default values that preserve the existing behavior (e.g. in 
pytorch, the default batch input type would preserve the current behavior of 
stacking individual tensors into a single batched `torch.Tensor`).
   
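   A hedged sketch of that, using the pytorch handler as an example. The 
`input_batch_type` parameter name is hypothetical, not part of the actual 
ModelHandler API; the point is only that a default can preserve today's 
behavior:

```python
# Sketch only: `input_batch_type` is a hypothetical parameter.
import torch


class PytorchModelHandlerTensor:  # stand-in for the real handler
  def __init__(
      self,
      state_dict_path,
      model_class,
      model_params,
      input_batch_type=torch.Tensor):
    # With the default, batches are a single stacked torch.Tensor, as today;
    # existing callers are unaffected, and users could opt in to other batch
    # typehints by passing this argument explicitly.
    self._input_batch_type = input_batch_type
```
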
   Another backward compatibility concern is changing the RunInference DoFn to 
implement process_batch while still supporting existing ModelHandler 
implementations. We could likely do this by augmenting the ModelHandler API 
such that the base RunInference can use either the conventional approach 
(BatchElements + process) or the Batched DoFn approach (process_batch with 
dynamic typehints); see the sketch below.
   
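   A rough sketch of that dual-path idea. Only `load_model` and `run_inference` 
are existing `ModelHandler` methods; how the runner decides which path to take 
(and how the handler advertises its batch typehints) is the part that still 
needs design:

```python
# Sketch only: the dispatch between the two paths is hand-waved here; in
# practice it would hinge on the dynamic typehint machinery mentioned above.
import apache_beam as beam


class _RunInferenceDoFn(beam.DoFn):
  def __init__(self, model_handler):
    self._model_handler = model_handler
    self._model = None

  def setup(self):
    # load_model() is part of the existing ModelHandler API.
    self._model = self._model_handler.load_model()

  def process(self, batched_elements):
    # Conventional path: an upstream BatchElements has already grouped
    # elements into a list, exactly as RunInference works today.
    yield from self._model_handler.run_inference(batched_elements, self._model)

  def process_batch(self, batch):
    # Batched DoFn path: `batch` arrives as a framework-level batch (e.g. a
    # stacked torch.Tensor) produced by a BatchConverter, with no
    # BatchElements step in the pipeline.
    yield from self._model_handler.run_inference(batch, self._model)
```
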
   Another potential feature Andy raised in his [dev@ thread and 
doc](https://lists.apache.org/thread/rrjb4h451oyhygln87j6oq51hjy2r1tv) is 
support for merging inputs that are already batched (e.g. np.concatenate rather 
than np.stack; the latter creates a new dimension, whereas the former 
concatenates across an existing one). Ultimately we should be able to leverage 
`combine_batches` for this:
   
https://github.com/apache/beam/blob/0d937d4cd725965572d4720811fa2d6efaa8edf8/sdks/python/apache_beam/typehints/batch.py#L212-L213
   
   but some work still needs to be done there (e.g. we need a way for users to 
declare how big they'd like their batches to be).
   
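   For concreteness, the shape difference between the two numpy operations:

```python
import numpy as np

elements = [np.zeros((4, 3)), np.zeros((4, 3))]

np.stack(elements).shape        # (2, 4, 3): adds a new leading batch axis
np.concatenate(elements).shape  # (8, 3): merges along the existing axis 0
```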

