moseswmwong opened a new issue #19242:
URL: https://github.com/apache/incubator-mxnet/issues/19242


   ## Description
   
   I installed MXNet following the instructions at this link:
   - https://mxnet.apache.org/versions/1.7/get_started/windows_setup 
   
   MXNet runs very slowly on my Windows 10 machine. It is a 64-bit system:
   - CPU - Intel i7, gaming-grade PC
   - RAM - 16 GB
   - GPU - NVIDIA GTX 1080, 8 GB RAM
   - Hard disk - 4 TB
   
   Installations:
   I followed all the instructions strictly and checked carefully at every
   milestone.
   - Anaconda 3, added channel "conda-forge", created a Python 3.6 environment
   - Intel MKL 2020 Update 3 installed
   - Microsoft Visual Studio 2017
   - NVIDIA CUDA 10.2 installed; the compiled sample code passed the NVIDIA
   deviceQuery and fluid simulator tests. (I know the instructions recommend
   9.2, but I have 10.2 installed; 9.2 is a bit old and CUDA is downward
   compatible.)
   - NVIDIA cuDNN for CUDA 10.2 installed, with all bin, include and lib files
   copied into the CUDA 10.2 directory tree
   
   Installation inside the Anaconda environment.
   After creating a new conda environment, these are the modules I installed:
   - python 3.6
   - opencv 4.4.0 from conda-forge
   - `pip install mxnet-cu102mkl` failed. From the issue list I found that the
   package is too large, so it has to be installed from your repo instead:
   `pip install mxnet-cu102mkl==1.7.0 -f https://dist.mxnet.io/python` (Issue
   [link](https://github.com/apache/incubator-mxnet/issues/17963))
   - pip install gluoncv
        
   Checks all passed:
   
   check1
   Python 3.6.11 (default, Aug  5 2020, 19:41:03) [MSC v.1916 64 bit (AMD64)] 
on win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import cv2
   >>> print (cv2.__version__)
   4.4.0
   
   check2
   Python 3.6.11 (default, Aug  5 2020, 19:41:03) [MSC v.1916 64 bit (AMD64)] 
on win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import mxnet
   >>> print (mxnet.__version__)
   1.7.0 
   
   
   check3
   Python 3.6.11 (default, Aug  5 2020, 19:41:03) [MSC v.1916 64 bit (AMD64)] 
on win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import mxnet
   >>> import gluoncv
   >>> print (gluoncv.__version__)
   0.8.0
   
   
   check4
   Python 3.6.11 (default, Aug  5 2020, 19:41:03) [MSC v.1916 64 bit (AMD64)] 
on win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import mxnet as mx
   >>> a = mx.nd.ones((2,3), mx.gpu())
   >>> b = a * 2 + 1
   >>> b.asnumpy()
   array([[3., 3., 3.],
          [3., 3., 3.]], dtype=float32)
   
   
   check5
   Python 3.6.11 (default, Aug  5 2020, 19:41:03) [MSC v.1916 64 bit (AMD64)] 
on win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import mxnet
   >>> from mxnet.runtime import feature_list
   >>> feature_list()
   [✔ CUDA, ✔ CUDNN, ✖ NCCL, ✔ CUDA_RTC, ✖ TENSORRT, ✖ CPU_SSE, ✖ CPU_SSE2, ✖ 
CPU_SSE3, ✖ CPU_SSE4_1, ✖ CPU_SSE4_2, ✖ CPU_SSE4A, ✖ CPU_AVX, ✖ CPU_AVX2, ✔ 
OPENMP, ✖ SSE, ✖ F16C, ✖ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ 
BLAS_APPLE, ✔ LAPACK, ✔ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, 
✖ CXX14, ✖ INT64_TENSOR_SIZE, ✔ SIGNAL_HANDLER, ✖ DEBUG, ✖ TVM_OP]
   
   
   ### Error Message
   
   No error message
   
   The problem is that the following code runs at about 0.5 frames per second
   on a 480p YouTube MP4 video.
   
   `import time
   import gluoncv as gcv
   from gluoncv.utils import try_import_cv2
   cv2 = try_import_cv2()
   import mxnet as mx
   
   def gpu_device(gpu_number=0):
       try:
           _ = mx.nd.array([1, 2, 3], ctx=mx.gpu(gpu_number))
       except mx.MXNetError:
           return False
       _ = mx.gpu(gpu_number)
       return True
   if not gpu_device():
       print('No GPU device found!')
       exit() 
   
   net = gcv.model_zoo.get_model('ssd_512_mobilenet1.0_voc', pretrained=True)
   # Compile the model for faster speed
   net.hybridize()
   
   cap = cv2.VideoCapture('video.mp4')
   time.sleep(1)
   
   axes = None
   #NUM_FRAMES = 1000  
   #for i in range(NUM_FRAMES):
   while True:
       # Load frame from the camera
       ret, frame = cap.read()
       if not ret: 
           break
   
       # Image pre-processing
       frame = mx.nd.array(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).astype('uint8')
       rgb_nd, frame = gcv.data.transforms.presets.ssd.transform_test(
           frame, short=512, max_size=700)
   
       # Run frame through network
       class_IDs, scores, bounding_boxes = net(rgb_nd)
   
       # Display the result
       img = gcv.utils.viz.cv_plot_bbox(
           frame, bounding_boxes[0], scores[0], class_IDs[0],
           class_names=net.classes)
       gcv.utils.viz.cv_plot_image(img)
   
       if cv2.waitKey(1) == ord('q'):
           break
       #cv2.waitKey(1)
   
   cap.release()
   cv2.destroyAllWindows()
   
   `
   
   YOLOv3 should be able to do more than 30 fps.
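
To put a number on the frame rate rather than estimating it by eye, a small counter can be ticked once per loop iteration. This is a minimal sketch in plain Python; `FpsMeter` and its `tick` method are names made up for illustration and are not part of MXNet or GluonCV. Because MXNet runs operators asynchronously, the tick should come after a blocking call (one that copies results to the host, such as `asnumpy()`, or an explicit `mx.nd.waitall()`), otherwise the measured time may not include the inference itself.

```python
import time


class FpsMeter:
    """Running average of frames per second since the first tick."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock      # injectable so the meter can be tested
        self._start = None       # timestamp of the first tick
        self._frames = 0         # frame intervals completed since then

    def tick(self):
        """Call once per processed frame; returns the average FPS so far."""
        now = self._clock()
        if self._start is None:
            # The first call only starts the clock; no interval has elapsed.
            self._start = now
            return 0.0
        self._frames += 1
        elapsed = now - self._start
        return self._frames / elapsed if elapsed > 0 else 0.0
```

In the loop above, `fps = meter.tick()` would go right after `gcv.utils.viz.cv_plot_image(img)`, with the value printed every few frames.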
   
   Looking at the Windows Task Manager report:
   
   `
                22%    45%            1%
   Name         CPU    Memory   ...   GPU   GPU engine
   Python       16%    278 MB         9%
   `
   Obviously the GPU is not doing the work for MXNet in this script, even
   though check5 shows that the CUDA and cuDNN features are available to
   MXNet, and check4 (`a = mx.nd.ones((2,3), mx.gpu())`) shows that MXNet can
   use the GPU.
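
One thing the script above does not do is pass a context to the model or to the input frame. As far as I know, both `get_model` and `mx.nd.array` default to the CPU context, so it may be worth ruling that out. The sketch below shows the two steps involved, assuming the standard Gluon calls `collect_params().reset_ctx(ctx)` and `NDArray.as_in_context(ctx)`; the helper names `move_net` and `infer_on` are hypothetical, and this has not been run on the machine described here.

```python
def move_net(net, ctx):
    """Move all parameters of a Gluon block to ctx (for example mx.gpu(0))."""
    net.collect_params().reset_ctx(ctx)
    return net


def infer_on(net, batch, ctx):
    """Copy one input batch to ctx, then run the forward pass there."""
    return net(batch.as_in_context(ctx))


# Intended use inside the script above (needs mxnet and a working GPU):
#   ctx = mx.gpu(0)
#   net = move_net(net, ctx)      # once, before net.hybridize()
#   class_IDs, scores, bounding_boxes = infer_on(net, rgb_nd, ctx)  # per frame
```

If the frame rate and the GPU column in Task Manager both rise after this change, the slowdown was a context problem rather than an installation problem.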
   
   
   
   
   ## To Reproduce
   please see above
   
   ### Steps to reproduce
   please see above
   
   ## What have you tried to solve it?
   please see check 1-5
   
   ## Environment
   please see above
   
   # paste outputs here
   A nice YOLO video with bounding boxes on each person, car, bicycle, etc.,
   but it runs at about 0.5 fps
   




