Good suggestion, Kellen! I like the idea; it would address an existing deficiency in MXNet that has so far only been worked around. As an example, the recently added Scala inference API (part of 1.2RC) implements a dispatcher in Scala to work around that limitation, roughly along the lines of the sketch below.
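For context, here is a minimal sketch of that kind of dispatcher workaround: all inference calls are funneled through a single thread so the non-thread-safe engine is never entered concurrently. The `Predictor` type and its `predict` method are placeholders I made up for illustration, not the actual Scala inference API.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// Placeholder for an MXNet-backed model, which we assume is NOT safe
// to call from multiple threads at once.
class Predictor {
  def predict(input: Array[Float]): Array[Float] = input.map(_ * 2.0f) // dummy work
}

object Dispatcher {
  // Single-threaded executor: every inference request is serialized onto
  // one thread, so callers get concurrency at the API level while the
  // engine only ever sees sequential calls.
  private val inferenceEc =
    ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor())

  private val predictor = new Predictor()

  // Callers on any thread submit work and get the result back as a Future.
  def infer(input: Array[Float]): Future[Array[Float]] =
    Future(predictor.predict(input))(inferenceEc)
}
```

The obvious downside, and presumably what the proposal aims to fix, is that requests are serialized behind one thread even when the hardware could service them in parallel.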
It would be great to better understand the changes you are planning in finer detail, though.

Hagay

On Thu, May 10, 2018 at 7:42 AM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:

> Hello MXNet developers,
>
> I’ve recently been speaking with users who’d like to run parallel inference
> requests with MXNet on their service. They’ll do this on GPUs, and due to
> resource constraints, they’d like to do this without duplicating their
> model’s weights in memory. They’d also like to run inference with a low
> degree of buffering/batching, as latency is important. I’ve created a wiki
> page with a small proposal that I hope will make running parallel inference
> a little easier. I’d like to discuss the proposal in this thread and would
> particularly appreciate it if core devs could correct me if I’ve made any
> incorrect assumptions in the doc.
>
> Proposal here:
> https://cwiki.apache.org/confluence/display/MXNET/Parallel+Inference+in+MXNet
>
> If people are OK with the proposal I can open a Jira ticket, PR, etc. If
> people are curious about perf implications I can also do some benchmarking.
>
> Thanks in advance for the feedback,
>
> -Kellen