apeforest opened a new issue #14569: performance degradation in model inference from 1.3.1 to 1.4.0
URL: https://github.com/apache/incubator-mxnet/issues/14569

There seems to be a regression in resnet-18 model inference time (when running on GPU) after this PR; it was caught in the MMS nightly runs. The changes in this PR appear to be causing the issue.

**Setup**

We use MMS docker images to run load tests. A local container can be started with:

```bash
nvidia-docker run --name mms_benchmark_gpu -p 8080:8080 -p 8081:8081 -itd awsdeeplearningteam/mxnet-model-server:nightly-mxnet-gpu
```

MXNet was built with OpenCV 3.2 and CUDA 9.2.

Load testing was done using locust. To install locust:

```bash
pip install locust
```

Download the test image:

```bash
curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
```

The locust script for load testing:

```python
# test_resnet_18.py
import os

from locust import HttpLocust, TaskSet, task

# Read the test image once, at module load time.
with open(os.path.join(os.getcwd(), 'kitten.jpg'), 'rb') as f:
    data = f.read()


class PredictionTasks(TaskSet):
    @task
    def inference(self):
        self.client.post("/predictions/resnet-18", data=data,
                         headers={'Content-Type': 'image/jpeg'})


class Prediction(HttpLocust):
    task_set = PredictionTasks
    min_wait = 100
    max_wait = 100
```

**Running the load test**

Register and load the model:

```bash
# Register and load the resnet-18 model archive
curl -X POST 127.0.0.1:8081/models?url=https://s3.amazonaws.com/model-server/model_archive_1.0/resnet-18.mar
```

Start a single worker and run the latency test:

```bash
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
$ locust -f test_resnet_18.py Prediction --host=http://127.0.0.1:8080 --no-web -c 1 -r 1 -t 20s --only-summary
```

**To change the mxnet version/build in the docker image**

**NOTE**: by default the most recent pip version is pulled.
```bash
# Go into the docker container
nvidia-docker exec -u root -it mms_benchmark_gpu bash
$ pip uninstall mxnet-cu92mkl
$ pip install <new-build>.whl
# ctrl + p + q to detach from the container

# Destroy the existing worker and create a new one; this loads the newly installed mxnet
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=0&synchronous=true'
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
```

**Results**

On mxnet-cu92==1.3.0post0:

```bash
# locust result
 Name                            # reqs   # fails     Avg     Min     Max  |  Median   req/s
----------------------------------------------------------------------------------------------
 POST /predictions/resnet-18        152  0(0.00%)      31      30      39  |      31    7.60
----------------------------------------------------------------------------------------------
 Total                              152  0(0.00%)                                       7.60

Percentage of the requests completed within given times
 Name                            # reqs    50%    66%    75%    80%    90%    95%    98%    99%   100%
-------------------------------------------------------------------------------------------------------
 POST /predictions/resnet-18        152     31     31     31     31     32     33     33     34    280
-------------------------------------------------------------------------------------------------------
 Total                              152     31     31     31     31     32     33     33     34    280
```

On mxnet-cu92 built from commit https://github.com/apache/incubator-mxnet/commit/f9f74169bb05f85d85dec5991aa5fc9050dec9f6:

```bash
 Name                            # reqs   # fails     Avg     Min     Max  |  Median   req/s
----------------------------------------------------------------------------------------------
 POST /predictions/resnet-18        141  0(0.00%)      41      37     337  |      38    7.20
----------------------------------------------------------------------------------------------
 Total                              141  0(0.00%)                                       7.20

Percentage of the requests completed within given times
 Name                            # reqs    50%    66%    75%    80%    90%    95%    98%    99%   100%
-------------------------------------------------------------------------------------------------------
 POST /predictions/resnet-18        141     38     39     39     40     40     42     49     49    340
-------------------------------------------------------------------------------------------------------
 Total                              141     38     39     39     40     40     42     49     49    340
```

This regression thus carries over to 1.3.1. Based on the above results, there is a ~30% increase in latency/inference time for resnet-18.
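For reference, the ~30% figure follows from the average latencies in the two locust summaries above (a minimal sketch; the 31 ms and 41 ms averages and the req/s values are taken directly from the reported results):

```python
# Quantify the regression from the two locust runs reported above.
baseline_avg_ms = 31   # mxnet-cu92==1.3.0post0
regressed_avg_ms = 41  # mxnet-cu92 built from commit f9f7416

latency_increase_pct = (regressed_avg_ms - baseline_avg_ms) / baseline_avg_ms * 100
print(f"average latency increase: {latency_increase_pct:.1f}%")

# Throughput drops correspondingly (7.60 -> 7.20 req/s).
baseline_rps, regressed_rps = 7.60, 7.20
throughput_drop_pct = (baseline_rps - regressed_rps) / baseline_rps * 100
print(f"throughput drop: {throughput_drop_pct:.1f}%")
```

The median (31 ms vs 38 ms, a ~23% increase) shows the same trend, so the regression is not driven by a few outliers.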