khaotik opened a new issue #20702:
URL: https://github.com/apache/incubator-mxnet/issues/20702


   ## Description
   
   I think this is a variant of #8029 and #16736, somehow related to `_FusedOp` on GPU.
   
   ### Error Message
   ```text
   [23:52:58] /home/khaotik/WKSP/dev/incubator-mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for GPU
   [23:52:58] /home/khaotik/WKSP/dev/incubator-mxnet/src/storage/storage.cc:202: Using Pooled (Naive) StorageManager for CPU
   Traceback (most recent call last):
     File "test.py", line 91, in <module>
       v_y = fn.forward(v_x)
     File "/media/LNXDATA/WKSP/dev/incubator-mxnet/python/mxnet/gluon/block.py", line 1821, in forward
       return self._call_cached_op(x, *args)
     File "/media/LNXDATA/WKSP/dev/incubator-mxnet/python/mxnet/gluon/block.py", line 1267, in _call_cached_op
       out = self._cached_op(*cargs)
     File "/media/LNXDATA/WKSP/dev/incubator-mxnet/python/mxnet/_ctypes/cached_op.py", line 126, in __call__
       check_call(_LIB.MXInvokeCachedOp(
     File "/media/LNXDATA/WKSP/dev/incubator-mxnet/python/mxnet/base.py", line 246, in check_call
       raise get_last_ffi_error()
   mxnet.base.MXNetError: Traceback (most recent call last):
     [bt] (6) /media/LNXDATA/WKSP/dev/incubator-mxnet/python/mxnet/../../build/libmxnet.so(MXInvokeCachedOp+0x21b) [0x7fa4638b05bb]
     [bt] (5) /media/LNXDATA/WKSP/dev/incubator-mxnet/python/mxnet/../../build/libmxnet.so(mxnet::CachedOp::Forward(std::shared_ptr<mxnet::CachedOp> const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::Context const&)+0x356) [0x7fa463a05f86]
     [bt] (4) /media/LNXDATA/WKSP/dev/incubator-mxnet/python/mxnet/../../build/libmxnet.so(mxnet::CachedOp::GetCachedOpState(mxnet::Context const&)+0x142) [0x7fa4639fbca2]
     [bt] (3) /media/LNXDATA/WKSP/dev/incubator-mxnet/python/mxnet/../../build/libmxnet.so(mxnet::CachedOp::CachedOpState::CachedOpState(mxnet::Context const&, nnvm::Graph const&, nnvm::Graph const&, bool)+0x1220) [0x7fa463a1f590]
     [bt] (2) /media/LNXDATA/WKSP/dev/incubator-mxnet/python/mxnet/../../build/libmxnet.so(nnvm::Graph::indexed_graph() const+0x3b) [0x7fa46fb33c8b]
     [bt] (1) /media/LNXDATA/WKSP/dev/incubator-mxnet/python/mxnet/../../build/libmxnet.so(nnvm::IndexedGraph::IndexedGraph(nnvm::Graph const&)+0x502) [0x7fa46fb31b72]
     [bt] (0) /media/LNXDATA/WKSP/dev/incubator-mxnet/python/mxnet/../../build/libmxnet.so(+0xe485022) [0x7fa46fb31022]
     File "/home/khaotik/WKSP/dev/incubator-mxnet/3rdparty/tvm/nnvm/src/core/graph.cc", line 107
   MXNetError: Check failed: it != node2index_.end(): control dep not found in graph
   ```
   
   ## To Reproduce
   Here's a simplified version of my original model that raised this error.
   ```python
   import mxnet as mx
   import mxnet.symbol as ms
   
   def _tensorAt(s_inp, i:int):
       '''symbolic s[i]'''
       return ms.squeeze(ms.slice(s_inp, i, i+1), axis=0)
   
   def _splineInterpolation(t, cv_list, degree=3):
       '''
       Simplified NURBS interpolation.

       Returns:
           a weighted linear combination of a slice of `cv_list`, up to `degree+1` terms;
           the weights are positive and sum to 1.0

       Args:
           t: float
               t <= -1 -> gives the first element of cv_list
               t >= len(cv_list) -> gives the last element of cv_list
           cv_list: list
               list of items that can be added and multiplied by a float;
               both symbolic and numeric values work
           degree: int
               must be positive

       Example:
       >>> _splineInterpolation(t=-1., cv_list=[a,b,c,d,e,f])
       a
       >>> _splineInterpolation(t=-0.4, cv_list=[a,b,c,d,e,f])
       0.964*a + 0.036*b
       >>> _splineInterpolation(t=2.6, cv_list=[a,b,c,d,e,f])
       0.010666*b + 0.41666*c + 0.538666*d + 0.036*e
       '''
       from math import floor
       assert(isinstance(degree,int) and degree > 0)
       num_cv = len(cv_list)
       assert(num_cv > 0)
       mt = t % 1.
       ct = 1. - mt
       coeff_list = [mt, ct]
       for i in range(1,degree):
           wgt = [(mt+j) / (i+1) for j in range(i+1)]
           new_coeff_list = [0.] * (i+2)
           new_coeff_list[0] = coeff_list[0] * wgt[0]
           new_coeff_list[-1] = coeff_list[-1] * (1.-wgt[-1])
           for j in range(1,i+1):
               new_coeff_list[j] = coeff_list[j]*wgt[j] + coeff_list[j-1]*(1.-wgt[j-1])
           coeff_list = new_coeff_list
       coeff_di = dict()
       for i in range(degree+1):
           knot_idx = min(num_cv-1, max(0, floor(t-(degree>>1)+i)))
           if knot_idx in coeff_di:
               coeff_di[knot_idx] += coeff_list[degree-i]
           else:
               coeff = coeff_list[degree-i]
               if coeff > 0.:
                   coeff_di[knot_idx] = coeff
       # make sure weight sum is 1, normalize numeric error
       scale = 1./sum(coeff_di.values())
       for k in coeff_di:
           coeff_di[k] *= scale
       if len(coeff_di) > 1:
           # weighted sum over (knot index, coefficient) pairs
           return sum((cv_list[i]*c if c != 1.0 else cv_list[i]) for i, c in coeff_di.items())
       else:
           return cv_list[list(coeff_di.keys())[0]]
   
   # constants
   BATCH_SIZE=2
   NDIM = 16
   CTX = mx.gpu()
   PARAM_DEPTH=8
   RESNET_DEPTH=16
   
   # build/bind symbolic model
   s_x = ms.var('x', shape=(BATCH_SIZE,NDIM,), dtype='float32')
   s_w0 = ms.var('w0', shape=(PARAM_DEPTH,NDIM,NDIM))
   s_w0_li = [_tensorAt(s_w0,i) for i in range(PARAM_DEPTH)]
   s_mid = s_x
   for i in range(RESNET_DEPTH):
       spline_t = (i*(PARAM_DEPTH+1))/(RESNET_DEPTH-1) - 1.
       s_res = s_mid
       s_res = ms.dot(s_res, _splineInterpolation(spline_t, s_w0_li))
       s_res = ms.relu(s_res)
       s_mid = s_mid + s_res
   s_y = s_mid
   fn = mx.gluon.SymbolBlock((s_y,),(s_x,))
   fn.initialize(ctx=CTX)
   
   # run
   scale = NDIM**(-0.5)
   v_x = mx.nd.random_uniform(-scale,scale, shape=(BATCH_SIZE,NDIM), ctx=CTX)
   v_y = fn.forward(v_x)
   ```
   
   ### Steps to reproduce
   Run the above script on a GPU with the current master branch (5d247f13fcf5e55b094c5deb90ede1d3a03cc9ac).
   
   ## What have you tried to solve it?
   
   - The code works with the environment variable `MXNET_USE_FUSION=0`
   - The code works when using `CTX = mx.cpu()` instead
   - The code works with a smaller constant in the Python code, e.g. `RESNET_DEPTH=6`
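The first workaround can be applied per process without touching the shell environment permanently; a minimal sketch (the actual `import mxnet` line is commented out so the snippet stands alone, and the claim that the variable must be set before the library reads it is an assumption based on how the flag is consumed at graph-build time):

```python
import os

# MXNET_USE_FUSION controls the pointwise fusion pass that produces
# `_FusedOp` nodes; "0" disables it for the whole process. Set it
# before MXNet has a chance to read it.
os.environ["MXNET_USE_FUSION"] = "0"

# import mxnet as mx   # import only after the variable is set
```

With fusion disabled this way, the reproduction script above runs to completion instead of hitting the `control dep not found in graph` check.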
   
   ## Environment
   
   - Single GPU: GTX 1080 Ti, on Ubuntu 20.04
   - Anaconda Python 3.8
   - MXNet built from source from the current master branch (5d247f13fcf5e55b094c5deb90ede1d3a03cc9ac)
   
   
   ***We recommend using our script for collecting the diagnostic information with the following command***
   `curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3`
   
   <details>
   <summary>Environment Information</summary>
   
   ```
   ----------Python Info----------
   Version      : 3.8.3
   Compiler     : GCC 7.3.0
   Build        : ('default', 'Jul  2 2020 16:21:59')
   Arch         : ('64bit', 'ELF')
   ------------Pip Info-----------
   Version      : 21.0.1
   Directory    : /home/khaotik/anaconda3/lib/python3.8/site-packages/pip
   ----------MXNet Info-----------
   Version      : 2.0.0
   Directory    : /media/LNXDATA/WKSP/dev/incubator-mxnet/python/mxnet
   Num GPUs     : 1
   Hashtag not found. Not installed from pre-built package.
   ----------System Info----------
   Platform     : Linux-5.11.0-40-generic-x86_64-with-glibc2.10
   system       : Linux
   node         : KKST2
   release      : 5.11.0-40-generic
   version      : #44~20.04.1-Ubuntu SMP Wed Oct 20 19:04:34 UTC 2021
   ----------Hardware Info----------
   machine      : x86_64
   processor    : x86_64
   Architecture:                    x86_64
   CPU op-mode(s):                  32-bit, 64-bit
   Byte Order:                      Little Endian
   Address sizes:                   39 bits physical, 48 bits virtual
   CPU(s):                          6
   On-line CPU(s) list:             0-5
   Thread(s) per core:              1
   Core(s) per socket:              6
   Socket(s):                       1
   NUMA node(s):                    1
   Vendor ID:                       GenuineIntel
   CPU family:                      6
   Model:                           158
   Model name:                      Intel(R) Core(TM) i5-8600K CPU @ 3.60GHz
   Stepping:                        10
   CPU MHz:                         4116.771
   CPU max MHz:                     4300.0000
   CPU min MHz:                     800.0000
   BogoMIPS:                        7200.00
   Virtualization:                  VT-x
   L1d cache:                       192 KiB
   L1i cache:                       192 KiB
   L2 cache:                        1.5 MiB
   L3 cache:                        9 MiB
   NUMA node0 CPU(s):               0-5
   Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
   Vulnerability L1tf:              Mitigation; PTE Inversion; VMX vulnerable, SMT disabled
   Vulnerability Mds:               Vulnerable; SMT disabled
   Vulnerability Meltdown:          Vulnerable
   Vulnerability Spec store bypass: Vulnerable
   Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
   Vulnerability Spectre v2:        Vulnerable, IBPB: disabled, STIBP: disabled
   Vulnerability Srbds:             Vulnerable
   Vulnerability Tsx async abort:   Vulnerable
   Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0001 sec, LOAD: 1.0473 sec.
   Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0009 sec, LOAD: 0.8119 sec.
   Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.0003 sec, LOAD: 3.1800 sec.
   Timing for D2L: http://d2l.ai, DNS: 0.0010 sec, LOAD: 0.6069 sec.
   Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0002 sec, LOAD: 0.6927 sec.
   Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0003 sec, LOAD: 1.9676 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0009 sec, LOAD: 1.5478 sec.
   Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.0009582042694091797 sec.
   ```
   </details>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


