[GitHub] [incubator-mxnet] matteosal commented on issue #20675: MXNet 2.0 up to 10x slower than 1.x on Windows

GitBox Thu, 16 Dec 2021 08:19:15 -0800


matteosal commented on issue #20675:
URL: 
https://github.com/apache/incubator-mxnet/issues/20675#issuecomment-995968726



   @RafLit while trying to build with Visual Studio OpenMP, we discovered that 
building with VC2019 instead of VC2017 makes our build as fast as yours. This 
seems to be expected, because `/openmp:experimental` is only supported on VC2019 
(https://docs.microsoft.com/en-us/cpp/build/reference/openmp-enable-openmp-2-0-support?view=msvc-170).
 However, passing `/openmp:experimental` to VC2017 does not produce a failure, 
so we never noticed. Which VC version did you use to build? Should the CMake 
script fail explicitly for VC2017 and below?
   
   Also, while VC2019 makes our build as fast as yours, it is still not as fast 
as MXNet 1.6 for some models. The worst slowdown relative to 1.6 went from about 
10x with VC2017 to about 3x with VC2019. It occurs in this case (symbol attached):
   
   [net.zip](https://github.com/apache/incubator-mxnet/files/7728816/net.zip)
   ```
   import time
   
   import mxnet as mx
   
   print(mx.__version__)
   
   batch_size = 64
   n_iter = 20
   
   # Load the attached symbol and infer all argument shapes from the input shape.
   sym = mx.symbol.load('net.json')
   arg_shapes = sym.infer_shape(**{'.Inputs.Input': (batch_size, 1, 64, 64)})[0]
   
   def gen_arrays(shapes):
       # One array of ones per argument shape.
       return [mx.nd.ones(shape) for shape in shapes]
   
   # This script calls `_bind` on MXNet 2.0 and `bind` on earlier versions.
   bind = sym._bind if mx.__version__ == '2.0.0' else sym.bind
   ex = bind(mx.cpu(), gen_arrays(arg_shapes), args_grad=gen_arrays(arg_shapes))
   
   # Generate all inputs up front so data creation is not timed.
   input_data = [
       mx.ndarray.random.uniform(0, 1, [batch_size, 1, 64, 64])
       for _ in range(n_iter)
   ]
   outgrad = [mx.nd.ones([batch_size, 1])]
   
   start = time.time()
   for i in range(n_iter):
       ex.arg_dict['.Inputs.Input'] = input_data[i]
       ex.forward(is_train=True)
       ex.backward(outgrad)
       # Block until all asynchronous work has finished before the next step.
       mx.ndarray.waitall()
   end = time.time()
   
   inputs_per_sec = batch_size * n_iter / (end - start)
   
   print(inputs_per_sec)
   ```
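
The timing logic in the script can be pulled out into a small reusable helper, which makes it easier to run the same measurement against different builds. The following is only a sketch: `measure_throughput`, its `n_warmup` parameter and the injectable `clock` are hypothetical names introduced here and are not part of MXNet or of the script above. The optional untimed warm-up calls are an assumption on my side, since the first iterations can include one-time setup costs.

```python
import time


def measure_throughput(step, batch_size, n_iter, n_warmup=0,
                       clock=time.perf_counter):
    """Return inputs per second over n_iter timed calls of step(i).

    Hypothetical helper: `step` is any callable that takes the iteration
    index and runs one full iteration of the workload.
    """
    # Untimed warm-up calls (an assumption; the original script has none).
    for i in range(n_warmup):
        step(i)

    start = clock()
    for i in range(n_iter):
        step(i)
    elapsed = clock() - start

    # Same formula as the original script.
    return batch_size * n_iter / elapsed


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without MXNet installed.
    def dummy_step(i):
        sum(range(100_000))

    print(measure_throughput(dummy_step, batch_size=64, n_iter=20, n_warmup=2))
```

With MXNet, `step` would set the input, call `ex.forward(is_train=True)` and `ex.backward(outgrad)`, and then call `mx.ndarray.waitall()` so that the asynchronous work is finished before the clock is read.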
   
   My results are:
   MXNet 2.0 + VC2017: 12 inputs/s
   MXNet 2.0 + VC2019 (and your build): 55 inputs/s
   MXNet 1.6: 140 inputs/s
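
As a quick check of the arithmetic, the throughputs listed above can be converted into slowdown factors relative to MXNet 1.6. This is a minimal sketch; the dictionary keys are just labels for the three measurements.

```python
# Throughputs reported above, in inputs per second.
results = {
    "MXNet 2.0 + VC2017": 12,
    "MXNet 2.0 + VC2019": 55,
    "MXNet 1.6": 140,
}

baseline = results["MXNet 1.6"]

# Slowdown factor of each 2.0 build relative to the 1.6 baseline.
slowdown = {
    name: baseline / value
    for name, value in results.items()
    if name != "MXNet 1.6"
}

for name, factor in slowdown.items():
    print(f"{name}: {factor:.1f}x slower than MXNet 1.6")
# MXNet 2.0 + VC2017: 11.7x slower than MXNet 1.6
# MXNet 2.0 + VC2019: 2.5x slower than MXNet 1.6
```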
   
   Can you reproduce a similar performance loss (55 vs. 140 inputs/s)?

