TheNeuralBit commented on issue #21440:
URL: https://github.com/apache/beam/issues/21440#issuecomment-1239777760

   Spoke with @yeandy about this today. We discussed how to implement a pytorch 
BatchConverter (which should mostly be a copy-paste job from the numpy one) and 
how to port RunInference over to using Batched DoFns while maintaining backward 
compatibility guarantees.
   
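   For reference, a minimal, untested sketch of what that converter could look 
like, modeled on the numpy converter in `apache_beam/typehints/batch.py`. The 
method names follow the `BatchConverter` base class; everything pytorch-specific 
here is an assumption about how a torch converter *could* behave, and the 
registration hook is omitted since its exact shape may differ across SDK 
versions:

```python
# Sketch only, not the actual implementation.
import torch

from apache_beam.typehints.batch import BatchConverter


class PytorchTensorBatchConverter(BatchConverter):
  def produce_batch(self, elements):
    # Stack per-element tensors along a new leading batch dimension.
    return torch.stack(list(elements))

  def explode_batch(self, batch):
    # Yield the individual elements back out of a batch.
    yield from torch.unbind(batch, dim=0)

  def combine_batches(self, batches):
    # Merge batches along the existing batch dimension (no new dim).
    return torch.cat(batches, dim=0)

  def get_length(self, batch):
    return batch.size(0)

  def estimate_byte_size(self, batch):
    return batch.element_size() * batch.nelement()
```
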
   One backward compatibility concern is each ModelHandler's public API. Each 
ModelHandler will likely need to add arguments that let users specify 
input/output typehints. We could make these new arguments backwards compatible 
by defining default values that preserve the existing behavior (e.g. in 
pytorch, the default batch input type would preserve the current behavior of 
stacking individual tensors into a single batched `torch.Tensor`).
   
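   A hedged sketch of that, using the pytorch handler as an example. The 
`input_batch_type` parameter name is hypothetical, not part of the actual 
ModelHandler API; the point is only that a default can preserve today's 
behavior:

```python
# Sketch only: `input_batch_type` is a hypothetical parameter.
import torch


class PytorchModelHandlerTensor:  # stand-in for the real handler
  def __init__(
      self,
      state_dict_path,
      model_class,
      model_params,
      input_batch_type=torch.Tensor):
    # With the default, batches are a single stacked torch.Tensor, as today;
    # existing callers are unaffected, and users could opt in to other batch
    # typehints by passing this argument explicitly.
    self._input_batch_type = input_batch_type
```
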
   Another backward compatibility concern is changing the RunInference DoFn to 
implement process_batch while still supporting existing ModelHandler 
implementations. We could likely do this by augmenting the ModelHandler API 
such that the base RunInference can use either the conventional approach 
(BatchElements + process) or the Batched DoFn approach (process_batch with 
dynamic typehints); see the sketch below.
   
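   A rough sketch of that dual-path idea. Only `load_model` and `run_inference` 
are existing `ModelHandler` methods; how the runner decides which path to take 
(and how the handler advertises its batch typehints) is the part that still 
needs design:

```python
# Sketch only: the dispatch between the two paths is hand-waved here; in
# practice it would hinge on the dynamic typehint machinery mentioned above.
import apache_beam as beam


class _RunInferenceDoFn(beam.DoFn):
  def __init__(self, model_handler):
    self._model_handler = model_handler
    self._model = None

  def setup(self):
    # load_model() is part of the existing ModelHandler API.
    self._model = self._model_handler.load_model()

  def process(self, batched_elements):
    # Conventional path: an upstream BatchElements has already grouped
    # elements into a list, exactly as RunInference works today.
    yield from self._model_handler.run_inference(batched_elements, self._model)

  def process_batch(self, batch):
    # Batched DoFn path: `batch` arrives as a framework-level batch (e.g. a
    # stacked torch.Tensor) produced by a BatchConverter, with no
    # BatchElements step in the pipeline.
    yield from self._model_handler.run_inference(batch, self._model)
```
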
   Another potential feature Andy raised in his [dev@ thread and 
doc](https://lists.apache.org/thread/rrjb4h451oyhygln87j6oq51hjy2r1tv) is 
support for merging inputs that are already batched (e.g. np.concatenate rather 
than np.stack; the latter creates a new dimension, whereas the former 
concatenates across an existing one). Ultimately we should be able to leverage 
`combine_batches` for this:
   
https://github.com/apache/beam/blob/0d937d4cd725965572d4720811fa2d6efaa8edf8/sdks/python/apache_beam/typehints/batch.py#L212-L213
   
   but some work still needs to be done there (e.g. we need a way for users to 
declare how big they'd like their batches to be).
   
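   For concreteness, the shape difference between the two numpy operations:

```python
import numpy as np

elements = [np.zeros((4, 3)), np.zeros((4, 3))]

np.stack(elements).shape        # (2, 4, 3): adds a new leading batch axis
np.concatenate(elements).shape  # (8, 3): merges along the existing axis 0
```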

