Hi,

Though there is no inherent limit on the number of requests a gRPC server 
can handle, the synchronous processing model has not been a target for 
performance optimization. The asynchronous model is better suited to 
high-throughput workloads, and switching to it may increase your 
throughput.
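
If your server happens to be in Python, a minimal sketch of that switch, 
using grpc.aio, is below. The inference_pb2 / inference_pb2_grpc modules, 
the Inference service, and the Infer method are hypothetical placeholders 
for whatever your Triton-style .proto actually generates; only the 
grpc.aio calls themselves are real API.

    import asyncio

    import grpc
    import inference_pb2        # hypothetical generated messages
    import inference_pb2_grpc   # hypothetical generated service stubs


    class InferenceServicer(inference_pb2_grpc.InferenceServicer):
        async def Infer(self, request, context):
            # Run the model here; returning an empty response mirrors
            # the "just return OK" experiment from your post.
            return inference_pb2.InferResponse()


    async def serve() -> None:
        server = grpc.aio.server()
        inference_pb2_grpc.add_InferenceServicer_to_server(
            InferenceServicer(), server)
        server.add_insecure_port("[::]:50051")
        await server.start()
        await server.wait_for_termination()


    if __name__ == "__main__":
        asyncio.run(serve())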
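
Also worth checking when you benchmark: with a blocking unary client, the 
client itself caps the measured rate at roughly 1 / per-call latency. A 
hedged sketch of a concurrent client, assuming the same hypothetical stubs 
as the server sketch above, which keeps many RPCs in flight at once:

    import asyncio

    import grpc
    import inference_pb2
    import inference_pb2_grpc


    async def run(num_inflight: int = 64, total: int = 10000) -> None:
        async with grpc.aio.insecure_channel("localhost:50051") as channel:
            stub = inference_pb2_grpc.InferenceStub(channel)
            # Cap the number of concurrent in-flight RPCs.
            sem = asyncio.Semaphore(num_inflight)

            async def one_call() -> None:
                async with sem:
                    await stub.Infer(inference_pb2.InferRequest())

            await asyncio.gather(*(one_call() for _ in range(total)))


    if __name__ == "__main__":
        asyncio.run(run())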

On Tuesday, May 19, 2020 at 8:14:27 AM UTC-7 dyla...@gmail.com wrote:

> I am designing a neural network inference server, and I have built my 
> server and client using the synchronous gRPC model with a unary RPC 
> design. For reference, the protobuf formats are based on the Nvidia 
> Triton Inference Server formats: 
> https://github.com/NVIDIA/triton-inference-server. 
> My design expects a large batch of inputs (16384, for a total size of 
> 1 MB) to be received by the server, the inference to be run, and the 
> result returned to the client. I send these inputs in a repeated bytes 
> field in my protobuf. However, even if I make my server-side function 
> simply return an OK status (no actual processing), I find that the server 
> can only process ~1500-2000 batches of inputs per second (this is run 
> with both server and client on the same machine, so network limitations 
> should not be relevant). Meanwhile, I know that my inference processing 
> can handle throughputs closer to 10000 batches/second.
>
> Is there an inherent limitation to the number of requests that a gRPC 
> server can handle per second? Is there a server setting or design change I 
> can make to increase this maximum throughput?
>
> I am happy to provide more information if it can help in understanding my 
> issue.
>
> Thanks for your help,
>
> -Dylan
>
