edsn60 opened a new issue #19635:
URL: https://github.com/apache/incubator-mxnet/issues/19635


   ## Description
   I trained Mask R-CNN with MXNet and GluonCV on **CPU**, using the script 
"train_mask_rcnn.py" provided by GluonCV (see 
https://cv.gluon.ai/build/examples_instance/train_mask_rcnn_coco.html). 
Training does not raise any error. However, when I load my saved model and 
run it on a test image, I get the error below, and I cannot tell what is 
causing it.
   
   ### Error Message
   Traceback (most recent call last):
     File "/home/shelvin_yuan/Desktop/gluoncv_test/pre_mask_rcnn.py", line 88, in <module>
       ids, scores, bboxes, masks = [xx[0].asnumpy() for xx in net(x)]
     File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 683, in __call__
       out = self.forward(*args)
     File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 1430, in forward
       return self._call_cached_op(x, *args)
     File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 1022, in _call_cached_op
       **raise ValueError("The argument structure of HybridBlock does not match"
   ValueError: The argument structure of HybridBlock does not match the cached 
version. Stored format = [0], input format = [0, 0, 0]**
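
The two lists in this message have lengths 1 and 3. That is consistent with a block that was imported with three input names ('data0', 'data1', 'data2') but called with a single array, as in `net(x)`. The sketch below only illustrates that comparison; `arity_format` is a hypothetical helper written for this note and is a simplified stand-in, not MXNet's implementation.

```python
def arity_format(args):
    """One 0 per positional tensor argument (simplified stand-in, not MXNet code)."""
    return [0] * len(args)

# Input names passed to gluon.SymbolBlock.imports() in pre_mask_rcnn.py.
declared = arity_format(['data0', 'data1', 'data2'])

# The failing call is net(x), i.e. a single positional array.
called = arity_format(['x'])

formats_match = declared == called
print(declared, called, formats_match)
```

Whether the exported graph really should take three inputs is a separate question that this sketch does not answer.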
   
   ## To Reproduce
   This is "pre_mask_rcnn.py": 
   from matplotlib import pyplot as plt
   from gluoncv import model_zoo, data, utils
   from mxnet import gluon
   import mxnet as mx
   
   net = gluon.SymbolBlock.imports(
       './mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json',
       ['data0', 'data1', 'data2'],
       './mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params',
       ctx=mx.cpu())  # this is where the error happens
   
   In "train_mask_rcnn.py", I used the following two lines to save the model 
and its parameters:
   net.save_parameters('{:s}_{:04d}_{:.4f}.params'.format(prefix, epoch, current_map))
   net.export('{:s}_{:04d}_{:.4f}'.format(prefix, epoch, current_map), epoch=0)
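
These two calls write three files, and the two `.params` files are easy to confuse. `export(path, epoch)` writes `path-symbol.json` and `path-NNNN.params`, while `save_parameters` writes only the file name it is given. The sketch below just rebuilds the names; the values of `prefix`, `epoch` and `current_map` are assumptions inferred from the file names shown in this report.

```python
# Values assumed from the file names that appear in this report.
prefix, epoch, current_map = 'mask_rcnn_resnet50_v1b_coco', 0, 0.0

stem = '{:s}_{:04d}_{:.4f}'.format(prefix, epoch, current_map)

# net.save_parameters(...) writes exactly the name it is given.
save_parameters_file = stem + '.params'

# net.export(stem, epoch=0) writes '<stem>-symbol.json' and '<stem>-0000.params'.
export_symbol_file = '{}-symbol.json'.format(stem)
export_params_file = '{}-{:04d}.params'.format(stem, 0)

print(save_parameters_file)
print(export_symbol_file)
print(export_params_file)
```

Every loading attempt in this report uses the `-0000.params` file, that is, the one written by `export`.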
   
   ### Steps to reproduce
   
   Run "pre_mask_rcnn.py" in PyCharm.
   
   ## What have you tried to solve it?
   
   ### Load from model_zoo
   I also tried building the model from model_zoo and loading my saved 
parameters into it:
   
   param = "./mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params"
   net = model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained=False)
   net.initialize(ctx=mx.cpu())
   net.reset_class(['insulator'])
   net.load_parameters(param.strip())
   
   **But this raises a different error at "net.load_parameters(param.strip())":**
   Traceback (most recent call last):
     File "/home/shelvin_yuan/Desktop/gluoncv_test/pre_mask_rcnn.py", line 35, in <module>
       net.load_parameters(param.strip())
     File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 530, in load_parameters
       self.collect_params().load(
     File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/parameter.py", line 1022, in load
       self.load_dict(ndarray_load, ctx, allow_missing,
     File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/parameter.py", line 1055, in load_dict
       assert name in arg_dict, \
   AssertionError: **Parameter 'conv0_weight' is missing in file:** 
./mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params, which contains 
parameters: 'maskrcnn0_resnetv1b_conv0_weight', 
'maskrcnn0_resnetv1b_batchnorm0_gamma', 'maskrcnn0_resnetv1b_batchnorm0_beta', 
..., 'maskrcnn0_maskrcnn0_mask0_conv0_weight', 
'maskrcnn0_maskrcnn0_mask0_conv0_bias', 
'maskrcnn0_maskrcnn0_mask0_conv1_weight', 
'maskrcnn0_maskrcnn0_mask0_conv1_bias'. Please make sure source and target 
networks have the same prefix.For more info on naming, please see 
https://mxnet.io/api/python/docs/tutorials/packages/gluon/blocks/naming.html
   
   It seems that the parameter names in the loaded model do not have the 
prefix "maskrcnn0_resnetv1b_", which appears in my saved parameters. I went 
back to "train_mask_rcnn.py" and found an argument "save_prefix", but it only 
affects the file names of the ".params" and ".json" files, not the names of 
the parameters themselves.
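
To make the mismatch concrete, the sketch below compares the name the loader asked for ('conv0_weight') with the names the file holds. `names_with_suffix` is a hypothetical helper written for this note, and the stored names are copied from the AssertionError above, where the full list is elided.

```python
def names_with_suffix(expected_name, stored_names):
    """Return stored names equal to expected_name or ending in '_' + expected_name."""
    return [s for s in stored_names
            if s == expected_name or s.endswith('_' + expected_name)]

# Names copied from the error message; the real file contains more entries.
stored = [
    'maskrcnn0_resnetv1b_conv0_weight',
    'maskrcnn0_resnetv1b_batchnorm0_gamma',
    'maskrcnn0_resnetv1b_batchnorm0_beta',
    'maskrcnn0_maskrcnn0_mask0_conv0_weight',
]

expected = 'conv0_weight'
candidates = names_with_suffix(expected, stored)

# The part of each stored name that precedes the expected name.
extra_prefixes = [c[:-len(expected)] for c in candidates]
print(extra_prefixes)
```

Here the expected name matches two stored names with different prefixes, so the comparison cannot pick a single prefix on its own; it only shows what the loader is looking for versus what the file contains.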
   
   By the way, in "pre_mask_rcnn.py", if I change
   net = gluon.SymbolBlock.imports(
       './mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json',
       **['data0', 'data1', 'data2'],**
       './mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params',
       ctx=mx.cpu())
   
   into
   net = gluon.SymbolBlock.imports(
       './mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json',
       **['data0'],**
       './mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params',
       ctx=mx.cpu())
   
   then I get another error:
   Traceback (most recent call last):
     File "/home/shelvin_yuan/Desktop/gluoncv_test/pre_mask_rcnn.py", line 37, in <module>
       net = gluon.SymbolBlock.imports('./mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json', ['data0'], './mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params', ctx=mx.cpu())
     File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 1366, in imports
       ret.collect_params().load(param_file, ctx=ctx, cast_dtype=True, dtype_source='saved')
     File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/parameter.py", line 1022, in load
       self.load_dict(ndarray_load, ctx, allow_missing,
     File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/parameter.py", line 1055, in load_dict
       assert name in arg_dict, \
   AssertionError: **Parameter 'data1' is missing in file**: 
./mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params, which contains 
parameters: 'resnetv1b_conv0_weight', 'resnetv1b_batchnorm0_gamma', 
'resnetv1b_batchnorm0_beta', ..., 'maskrcnn0_mask0_conv0_weight', 
'maskrcnn0_mask0_conv0_bias', 'maskrcnn0_mask0_conv1_weight', 
'maskrcnn0_mask0_conv1_bias'. Please make sure source and target networks have 
the same prefix.For more info on naming, please see 
https://mxnet.io/api/python/docs/tutorials/packages/gluon/blocks/naming.html
   
   ## Environment
   
   ----------Python Info----------
   Version      : 3.8.5
   Compiler     : GCC 7.3.0
   Build        : ('default', 'Sep  4 2020 07:30:14')
   Arch         : ('64bit', 'ELF')
   ------------Pip Info-----------
   Version      : 20.2.4
   Directory    : /home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/pip
   ----------MXNet Info-----------
   Version      : 1.7.0
   Directory    : /home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet
   Commit Hash   : 64f737cdd59fe88d2c5b479f25d011c5156b6a8a
   Library      : ['/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/libmxnet.so']
   Build features:
   ✖ CUDA
   ✖ CUDNN
   ✖ NCCL
   ✖ CUDA_RTC
   ✖ TENSORRT
   ✔ CPU_SSE
   ✔ CPU_SSE2
   ✔ CPU_SSE3
   ✔ CPU_SSE4_1
   ✔ CPU_SSE4_2
   ✖ CPU_SSE4A
   ✔ CPU_AVX
   ✖ CPU_AVX2
   ✔ OPENMP
   ✖ SSE
   ✔ F16C
   ✖ JEMALLOC
   ✔ BLAS_OPEN
   ✖ BLAS_ATLAS
   ✖ BLAS_MKL
   ✖ BLAS_APPLE
   ✔ LAPACK
   ✔ MKLDNN
   ✔ OPENCV
   ✖ CAFFE
   ✖ PROFILER
   ✔ DIST_KVSTORE
   ✖ CXX14
   ✖ INT64_TENSOR_SIZE
   ✔ SIGNAL_HANDLER
   ✖ DEBUG
   ✖ TVM_OP
   ----------System Info----------
   Platform     : Linux-5.4.0-56-generic-x86_64-with-glibc2.10
   system       : Linux
   node         : s-ubuntu
   release      : 5.4.0-56-generic
   version      : #62~18.04.1-Ubuntu SMP Tue Nov 24 10:07:50 UTC 2020
   ----------Hardware Info----------
   machine      : x86_64
   processor    : x86_64
   Architecture:        x86_64
   CPU op-mode(s):      32-bit, 64-bit
   Byte Order:          Little Endian
   CPU(s):              8
   On-line CPU(s) list: 0-7
   Thread(s) per core:  1
   Core(s) per socket:  8
   Socket(s):           1
   NUMA node(s):        1
   Vendor ID:           GenuineIntel
   CPU family:          6
   Model:               158
   Model name:          Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
   Stepping:            13
   CPU MHz:             906.093
   CPU max MHz:         4700.0000
   CPU min MHz:         800.0000
   BogoMIPS:            6000.00
   Virtualization:      VT-x
   L1d cache:           32K
   L1i cache:           32K
   L2 cache:            256K
   L3 cache:            12288K
   NUMA node0 CPU(s):   0-7
   Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 
pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl 
xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx 
smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe 
popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch 
cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow 
vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms 
invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves 
dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear 
flush_l1d arch_capabilities
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0181 
sec, LOAD: 16.4462 sec.
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.8030 sec, LOAD: 
3.1562 sec.
   Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [SSL: 
CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired 
(_ssl.c:1123)>, DNS finished in 0.39185047149658203 sec.
   Timing for FashionMNIST: 
https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz,
 DNS: 0.8720 sec, LOAD: 2.1558 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0062 sec, LOAD: 
6.1062 sec.
   Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: 
Forbidden, DNS finished in 0.5848102569580078 sec.
   ----------Environment----------


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


