andreas-solti opened a new issue #20220: URL: https://github.com/apache/incubator-mxnet/issues/20220
First, thanks for creating this great, high-performance framework! I've looked through the open and closed issues and couldn't find this one.

## Description

It would be really useful to be able to enable automatic batching of inference requests in the engine. The feature would dynamically wrap and unwrap similar-sized inputs in the engine based on a configured max wait time and preferred batch size (see the sketch at the end of this issue).

- Instead of adding each item to the queue for the engine to process individually, the engine would wrap data items into a batch and unwrap the results after computation.
- The API is unchanged, but configuration settings are exposed to control batch size and max wait time per batching instance.

## References

- Dynamic batching example implementation: https://github.com/triton-inference-server/server/blob/master/docs/model_configuration.md#dynamic-batcher

## Expected Value

A large speedup is expected for practical use in high-load inference settings where many users need to be served. Batching implemented directly in the engine would be much faster than the currently available (best?) solution using the multi-model-server, which adds the overhead of a Java server, HTTP calls, and Python-based batching.
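To make the requested semantics concrete, here is a minimal user-side sketch of dynamic batching on top of Gluon. The `DynamicBatcher` class, its parameters, and the queue-based design are purely illustrative assumptions and not part of any existing MXNet API; the request is for the engine to do this internally, without the Python-level locking and copying this sketch incurs.

```python
import threading
import time
from queue import Queue, Empty

import mxnet as mx


class DynamicBatcher:
    """Collects single requests and runs them through `net` as one batch.

    `preferred_batch_size` and `max_wait_ms` mirror the settings this issue
    asks to expose; the class itself is only an illustration.
    """

    def __init__(self, net, preferred_batch_size=8, max_wait_ms=5):
        self.net = net
        self.preferred_batch_size = preferred_batch_size
        self.max_wait = max_wait_ms / 1000.0
        self.queue = Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def infer(self, x):
        """Submit one input of shape (1, ...) and block until its result is ready."""
        done = threading.Event()
        slot = {}
        self.queue.put((x, slot, done))
        done.wait()
        return slot["out"]

    def _worker(self):
        while True:
            # Wait for the first request, then keep collecting until the
            # preferred batch size is reached or the max wait time elapses.
            pending = [self.queue.get()]
            deadline = time.time() + self.max_wait
            while len(pending) < self.preferred_batch_size:
                timeout = deadline - time.time()
                if timeout <= 0:
                    break
                try:
                    pending.append(self.queue.get(timeout=timeout))
                except Empty:
                    break

            # Wrap: stack the individual inputs into one batch.
            batch = mx.nd.concat(*[x for x, _, _ in pending], dim=0)
            out = self.net(batch)
            out.wait_to_read()

            # Unwrap: hand each caller its own slice of the batched output.
            for i, (_, slot, done) in enumerate(pending):
                slot["out"] = out[i:i + 1]
                done.set()
```

With such a wrapper, many serving threads calling `batcher.infer(mx.nd.random.uniform(shape=(1, 3, 224, 224)))` would transparently share forward passes; doing the same inside the engine, as proposed, would avoid the extra copies and Python overhead.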
