access2rohit opened a new issue #18258: URL: https://github.com/apache/incubator-mxnet/issues/18258
## Description
I ran the following commands to obtain individual operator performance w/ and w/o Large Tensor Support on both CPU and GPU:
```
#CPU
python incubator-mxnet/benchmark/opperf/opperf.py --output-format json --output-file mxnet_operator_benchmark_results.json
#GPU
python incubator-mxnet/benchmark/opperf/opperf.py --ctx gpu --output-format json --output-file mxnet_operator_benchmark_results.json
```

### Error Message
In both contexts (CPU and GPU) I get the following error log:
```
INFO:root:Begin Benchmark - BatchNorm
INFO:root:Complete Benchmark - BatchNorm
INFO:root:Begin Benchmark - Correlation
INFO:root:Complete Benchmark - Correlation
INFO:root:Begin Benchmark - Dropout
INFO:root:Complete Benchmark - Dropout
INFO:root:Begin Benchmark - Embedding
INFO:root:Complete Benchmark - Embedding
INFO:root:Begin Benchmark - FullyConnected
INFO:root:Complete Benchmark - FullyConnected
Traceback (most recent call last):
  File "benchmark/opperf/opperf.py", line 227, in <module>
    sys.exit(main())
  File "benchmark/opperf/opperf.py", line 207, in main
    benchmark_results = run_all_mxnet_operator_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs)
  File "benchmark/opperf/opperf.py", line 113, in run_all_mxnet_operator_benchmarks
    mxnet_operator_benchmark_results.append(run_nn_basic_operators_benchmarks(ctx=ctx, dtype=dtype, profiler=profiler, int64_tensor=int64_tensor, warmup=warmup, runs=runs))
  File "/home/ubuntu/workspace/incubator-mxnet/benchmark/opperf/nd_operations/nn_basic_operators.py", line 143, in run_nn_basic_operators_benchmarks
    mx_nn_basic_op_results = run_op_benchmarks(mx_nn_basic_ops, dtype, ctx, profiler, int64_tensor, warmup, runs)
  File "/home/ubuntu/workspace/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 210, in run_op_benchmarks
    warmup=warmup, runs=runs)
  File "/home/ubuntu/workspace/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 177, in run_performance_test
    benchmark_result = _run_nd_operator_performance_test(op, inputs, run_backward, warmup, runs, kwargs_list, profiler)
  File "/home/ubuntu/workspace/incubator-mxnet/benchmark/opperf/utils/benchmark_utils.py", line 114, in _run_nd_operator_performance_test
    _, _ = benchmark_helper_func(op, warmup, **kwargs_list[0])
  File "/home/ubuntu/workspace/incubator-mxnet/benchmark/opperf/utils/profiler_utils.py", line 200, in cpp_profile_it
    res = func(*args, **kwargs)
  File "/home/ubuntu/workspace/incubator-mxnet/benchmark/opperf/utils/ndarray_utils.py", line 60, in nd_forward_backward_and_profile
    nd.waitall()
  File "/home/ubuntu/workspace/incubator-mxnet/python/mxnet/ndarray/ndarray.py", line 211, in waitall
    check_call(_LIB.MXNDArrayWaitAll())
  File "/home/ubuntu/workspace/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "../include/mxnet/././tensor_blob.h", line 198
```
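Since the failure happens inside `run_nn_basic_operators_benchmarks` right after FullyConnected completes, one way to narrow down the offending operator is to call opperf's `run_performance_test` directly on a single op, following the pattern in the opperf README. A minimal sketch (the operator choice, input keys, and shapes below are illustrative placeholders, not the exact defaults opperf uses on the `--int64_tensor` path):
```python
import mxnet as mx
from mxnet import nd
from benchmark.opperf.utils.benchmark_utils import run_performance_test

# Run a single nn_basic operator in isolation, mirroring what
# run_nn_basic_operators_benchmarks does per op. Tuple values in the
# inputs dict are treated by opperf as input shapes; the shapes and
# kwargs here are placeholders, not the large-tensor defaults.
result = run_performance_test(
    nd.FullyConnected,
    run_backward=True,
    dtype='float32',
    ctx=mx.cpu(),
    inputs=[{"data": (32, 256),
             "weight": (64, 256),
             "bias": (64,),
             "num_hidden": 64}],
    warmup=10,
    runs=25)
print(result)
```
Swapping in each nn_basic operator one at a time (with the shapes the benchmark actually uses) should reveal which op hits the check in tensor_blob.h.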
## Environment
We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
```
curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
# paste outputs here
```

```
ubuntu@ip-172-31-0-156 ~/workspace/incubator-mxnet (master) $ curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
----------Python Info----------
Version      : 3.6.6
Compiler     : GCC 7.3.0
Build        : ('default', 'Oct 9 2018 12:34:16')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.3.1
Directory    : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 2.0.0
Directory    : /home/ubuntu/workspace/incubator-mxnet/python/mxnet
Num GPUs     : 8
Hashtag not found. Not installed from pre-built package.
----------System Info----------
Platform     : Linux-5.3.0-1017-aws-x86_64-with-debian-buster-sid
system       : Linux
node         : ip-172-31-0-156
release      : 5.3.0-1017-aws
version      : #18~18.04.1-Ubuntu SMP Wed Apr 8 15:12:16 UTC 2020
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:            1
CPU MHz:             2714.244
CPU max MHz:         3000.0000
CPU min MHz:         1200.0000
BogoMIPS:            4600.17
Hypervisor vendor:   Xen
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            46080K
NUMA node0 CPU(s):   0-15,32-47
NUMA node1 CPU(s):   16-31,48-63
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt ida
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0024 sec, LOAD: 0.5204 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0009 sec, LOAD: 0.4424 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0858 sec, LOAD: 0.0844 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0102 sec, LOAD: 0.1282 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0307 sec, LOAD: 0.1754 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0756 sec, LOAD: 0.3423 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0031 sec, LOAD: 0.1128 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.0049092769622802734 sec.
```
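Since the comparison is w/ and w/o Large Tensor Support, it may also help to record which kind of build each run used. A quick check via MXNet's runtime feature detection (a sketch; assumes the large-tensor flag is reported as `INT64_TENSOR_SIZE` by the runtime module):
```python
import mxnet as mx
from mxnet.runtime import Features

# Report whether this build was compiled with large (int64) tensor support.
features = Features()
print("MXNet version:", mx.__version__)
print("INT64_TENSOR_SIZE enabled:", features.is_enabled("INT64_TENSOR_SIZE"))
```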