apeforest opened a new issue #14569: performance degradation in model inference from 1.3.1 to 1.4.0
URL: https://github.com/apache/incubator-mxnet/issues/14569

There seems to be a regression in resnet-18 model inference time (when running on GPU) after this PR; it was caught in the MMS nightly runs. The changes in this PR appear to be causing the issue.

**Setup**

We use MMS docker images to run load tests. A local container can be started with:

```bash
nvidia-docker run --name mms_benchmark_gpu -p 8080:8080 -p 8081:8081 -itd awsdeeplearningteam/mxnet-model-server:nightly-mxnet-gpu
```

MXNet was built with OpenCV 3.2 and CUDA 9.2.

Load testing was done using locust. To install locust:

```bash
pip install locust
```

Download the test image:

```bash
curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
```

The locust script for load testing:

```python
# test_resnet_18.py
import os

from locust import HttpLocust, TaskSet, task

# Read the test image once, at module load time.
with open(os.path.join(os.getcwd(), 'kitten.jpg'), 'rb') as f:
    data = f.read()


class PredictionTasks(TaskSet):
    @task
    def inference(self):
        self.client.post("/predictions/resnet-18", data=data,
                         headers={'Content-Type': 'image/jpeg'})


class Prediction(HttpLocust):
    task_set = PredictionTasks
    min_wait = 100
    max_wait = 100
```

**Running the load test**

Register and load the model:

```bash
# Register and load the resnet-18 model archive
curl -X POST 127.0.0.1:8081/models?url=https://s3.amazonaws.com/model-server/model_archive_1.0/resnet-18.mar
```

Start a single worker and run the latency test:

```bash
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
$ locust -f test_resnet_18.py Prediction --host=http://127.0.0.1:8080 --no-web -c 1 -r 1 -t 20s --only-summary
```

**To change the mxnet version/build in the docker image**

**NOTE**: by default the most recent pip version is pulled.
```bash
# Go into the docker container
nvidia-docker exec -u root -it mms_benchmark_gpu bash
$ pip uninstall mxnet-cu92mkl
$ pip install <new-build>.whl
# ctrl + p + q to detach from the container

# Destroy the existing worker and create a new one; this loads the newly installed mxnet
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=0&synchronous=true'
$ curl -X PUT 'http://127.0.0.1:8081/models/resnet-18?min_worker=1&synchronous=true'
```

**Results**

On mxnet-cu92==1.3.0post0:

```bash
# locust result
 Name                            # reqs   # fails     Avg     Min     Max  |  Median   req/s
----------------------------------------------------------------------------------------------
 POST /predictions/resnet-18        152  0(0.00%)      31      30      39  |      31    7.60
----------------------------------------------------------------------------------------------
 Total                              152  0(0.00%)                                       7.60

Percentage of the requests completed within given times
 Name                            # reqs    50%    66%    75%    80%    90%    95%    98%    99%   100%
-------------------------------------------------------------------------------------------------------
 POST /predictions/resnet-18        152     31     31     31     31     32     33     33     34    280
-------------------------------------------------------------------------------------------------------
 Total                              152     31     31     31     31     32     33     33     34    280
```

On mxnet-cu92 built from commit https://github.com/apache/incubator-mxnet/commit/f9f74169bb05f85d85dec5991aa5fc9050dec9f6:

```bash
 Name                            # reqs   # fails     Avg     Min     Max  |  Median   req/s
----------------------------------------------------------------------------------------------
 POST /predictions/resnet-18        141  0(0.00%)      41      37     337  |      38    7.20
----------------------------------------------------------------------------------------------
 Total                              141  0(0.00%)                                       7.20

Percentage of the requests completed within given times
 Name                            # reqs    50%    66%    75%    80%    90%    95%    98%    99%   100%
-------------------------------------------------------------------------------------------------------
 POST /predictions/resnet-18        141     38     39     39     40     40     42     49     49    340
-------------------------------------------------------------------------------------------------------
 Total                              141     38     39     39     40     40     42     49     49    340
```

This regression thus carries over to 1.3.1. Based on the above results, there is a ~30% increase in latency/inference time for resnet-18.
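For reference, the ~30% figure follows from the average latencies in the two locust summaries above (a minimal sketch; the 31 ms and 41 ms averages and the req/s values are taken directly from the reported results):

```python
# Quantify the regression from the two locust runs reported above.
baseline_avg_ms = 31   # mxnet-cu92==1.3.0post0
regressed_avg_ms = 41  # mxnet-cu92 built from commit f9f7416

latency_increase_pct = (regressed_avg_ms - baseline_avg_ms) / baseline_avg_ms * 100
print(f"average latency increase: {latency_increase_pct:.1f}%")

# Throughput drops correspondingly (7.60 -> 7.20 req/s).
baseline_rps, regressed_rps = 7.60, 7.20
throughput_drop_pct = (baseline_rps - regressed_rps) / baseline_rps * 100
print(f"throughput drop: {throughput_drop_pct:.1f}%")
```

The median (31 ms vs 38 ms, a ~23% increase) shows the same trend, so the regression is not driven by a few outliers.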