Hello,
We are running LR[1], GBDT[2], and similar algorithms in MP2 handlers.
For each request, about 1000 features are passed to the handler as
arguments via HTTP POST.
Each request waits about 100 ms for a response, because the
calculation is not cheap.
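To make the numbers concrete, here is a rough back-of-envelope sketch in
Python (not our production code). It assumes each handler process serves
one request at a time and spends the full 100 ms on it; the worker
counts are hypothetical examples, not our real configuration.

```python
def max_throughput(workers: int, latency_s: float) -> float:
    """Upper bound on requests per second when every worker handles
    exactly one request at a time and each request takes latency_s."""
    if workers <= 0 or latency_s <= 0:
        raise ValueError("workers and latency_s must be positive")
    return workers / latency_s


if __name__ == "__main__":
    latency_s = 0.100  # about 100 ms per request, as described above
    for workers in (1, 8, 32):  # hypothetical worker counts
        rps = max_throughput(workers, latency_s)
        print(f"{workers:>3} workers: at most {rps:.0f} requests/second")
```

Under these assumptions, throughput only grows by adding workers or by
cutting the time spent per request, which is why we are asking about the
architecture.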
My question is: how can we improve throughput through architectural
optimization?
We know there are prediction-serving frameworks such as TensorFlow
Serving (TFS)[3] and TensorRT (RT)[4], but we don't use TensorFlow yet.
[1] https://en.wikipedia.org/wiki/Logistic_regression
[2] https://en.wikipedia.org/wiki/Gradient_boosting
[3] https://www.tensorflow.org/tfx/guide/serving
[4] https://developer.nvidia.com/tensorrt
Thanks.