acphile opened a new issue #18412: URL: https://github.com/apache/incubator-mxnet/issues/18412
## Motivations

Currently the implementation of mxnet.gluon.block is not very pythonic, and it contains many redundancies.

### 1. Overlaps between Block._params and Block._reg_params

When we want to define our own model, we currently need code like the following:

```
class Net(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        with self.name_scope():
            self.hidden1 = nn.Dense(256, activation='relu')
            self.a = self.params.get('a', shape=(1,))
```

This form of registration has several shortcomings:

a. Adding the parameter 'a' records it twice, in both self._params and self._reg_params, which is redundant. There is also a discrepancy inside Block:
   i. the method collect_params uses _params to gather all parameters,
   ii. while the method _collect_params_with_prefix (and accordingly load_parameters) uses _reg_params.

b. If we do not wrap children blocks in `with self.name_scope():`, they end up with wrong name scopes. In the following example, we cannot actually retrieve the parameters of self.hidden1 from the result of collect_params:

```
class HybridNet(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(HybridNet, self).__init__(**kwargs)
        self.hidden1 = nn.Dense(256, activation='relu')
        with self.name_scope():
            self.hidden2 = nn.Dense(10, activation='relu')

    def hybrid_forward(self, F, x):
        x = self.hidden2(self.hidden1(x))
        return x

>>> net = HybridNet()
>>> net.initialize()
>>> print(net.collect_params())
hybridnet0_ (
  Parameter dense0_weight (shape=(256, -1), dtype=float32)
  Parameter dense0_bias (shape=(256,), dtype=float32)
  Parameter hybridnet0_dense0_weight (shape=(10, -1), dtype=float32)
  Parameter hybridnet0_dense0_bias (shape=(10,), dtype=float32)
)
```

The example above also shows that the parameter names are unrelated to the attribute names, which is not straightforward. In short, using name_scope and ParameterDict is not user-friendly. We therefore plan to remove these redundancies and simplify the definition of children blocks and parameters, like:

```
class Net(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.hidden1 = nn.Dense(256, activation='relu')
        self.a = gluon.parameter.Parameter(name='a', shape=(1,))
```

Operations like collect_params then return correct results. For the example above, calling collect_params should produce:

```
{
  "a": xxx,
  "hidden1_weight": xxx,
  "hidden1_bias": xxx
}
```

which matches the original form used in _collect_params_with_prefix().

To preserve name scoping for hybridization, we add a new method set_prefix() that recursively adds a prefix to all parameters inside a block, derived from the attribute names (the same form as the keys in _collect_params_with_prefix()). For example:

```
>>> net = Net()
>>> print(net.collect_params())
{
  "a": Parameter a (...),
  "hidden1_weight": Parameter weight (...),
  "hidden1_bias": Parameter bias (...)
}
>>> net.set_prefix()
>>> print(net.collect_params())
{
  "a": Parameter a (...),
  "hidden1_weight": Parameter hidden1_weight (...),
  "hidden1_bias": Parameter hidden1_bias (...)
}
```

A shared parameter receives the prefix of the place where it first occurs.

### 2. Parameter sharing

Currently, parameter sharing is done through the `params` argument of Block's constructor, which means shared parameters are already recorded in self._params.shared before Block.\_\_init\_\_ runs. In addition, Block currently forbids overriding parameter attributes. We think this is not convenient.
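For context, here is a minimal sketch of the existing Gluon 1.x sharing idiom this section refers to (the layer sizes and the `shared` name are illustrative):

```
from mxnet.gluon import nn

shared = nn.Dense(8, activation='relu')
net = nn.Sequential()
# Pass the other layer's ParameterDict at construction time; this is the
# current mechanism that records the parameters in _params.shared before
# __init__ finishes, and the attribute cannot be overridden afterwards.
net.add(nn.Dense(8, activation='relu'),
        shared,
        nn.Dense(8, activation='relu', params=shared.params),
        nn.Dense(2))
net.initialize()
```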
The most common way to share a parameter is what PyTorch does:

```
self.hidden1.weight = self.hidden2.weight
```

Note, however, that when a HybridBlock has already been hybridized, we should not allow overriding the parameter, but instead ask the user to un-hybridize the block first.

To further allow sharing parameters recursively, we plan to add an API:

```
def share_parameters(self, params : Dict):
```

We plan to use the structure-based form (as used in _collect_params_with_prefix()) to represent each parameter recursively. For example, we denote self.hidden1.weight as "hidden1_weight".

In all, we plan to make the following improvements:
1. remove the parameters "prefix" and "params" from the \_\_init\_\_ function.
2. remove the use of self._params (ParameterDict) in Block.
3. allow parameter attribute overriding in the non-hybridized case.
4. add the method share_parameters to recursively share parameters in children blocks.

## Detailed improvements

### For Class Block

1. remove the attributes _empty_prefix, _prefix, _params, _profiler_scope_name, _name and their corresponding properties.

```
@property
def params(self):
    """Returns this :py:class:`Block`'s parameter dictionary
    (does not include its children's parameters)."""
    return self._reg_params
```

2. use the structured format for self._scope
   a. self._scope is only intended to be used internally by hybridize()
3. implement collect_params on top of _collect_params_with_prefix
   a. def _collect_params_with_prefix(self, prefix='', select=None):
4. change the implementation of save_params to save_parameters
5. move the implementation of ParameterDict.initialize into Block.initialize
6. move ParameterDict.zero_grad() to Block.zero_grad()
   a. call model.zero_grad() instead of model.collect_params().zero_grad()
7. move ParameterDict.reset_ctx(ctx) to Block.reset_ctx(ctx)
8. move ParameterDict.setattr to Block.setattr
9. add the method share_parameters(self, params : OrderedDict)
   a. returns self
10. add the method set_prefix()
    a. returns self
    b. when Block.initialize() is called, set_prefix() is called first internally

### For Class HybridBlock

1. before hybridize() runs, set_prefix() is called first.

```
net = Net()
net.initialize()
net.hybridize()
```

or

```
net = Net().set_prefix()
net.hybridize()
```

### For Parameter

1. add the attribute self._prefix (set automatically by Block.set_prefix())
2. use self._prefix + self._name for the attribute self.name

### For other children classes

1. remove params and prefix
2. use Parameter instead of self.params.get, like:

```
# self.i2h_weight = self.params.get('i2h_weight', shape=(hidden_size, input_size),
#                                   init=i2h_weight_initializer,
#                                   allow_deferred_init=True)
self.i2h_weight = Parameter(shape=(hidden_size, input_size),
                            init=i2h_weight_initializer,
                            allow_deferred_init=True)
```

### For _RNNLayer (gluon.rnn.rnn_layer._RNNLayer)

1. remove the implementation of _unfuse()
2. remove the implementation of _collect_params_with_prefix() (seemingly it exists only for backward compatibility)
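To make the intended user experience concrete, here is a hypothetical end-to-end sketch under this proposal. None of this is existing API: the `Net` class and the underscore-joined keys follow the examples above, and `share_parameters` and the prefix behaviour are as proposed here.

```
from mxnet import gluon
from mxnet.gluon import nn

class Net(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        # no name_scope() and no self.params.get() needed
        self.hidden1 = nn.Dense(256, activation='relu')
        self.a = gluon.parameter.Parameter(name='a', shape=(1,))
    # hybrid_forward omitted for brevity

net1 = Net()
net2 = Net()
# recursively share net1's parameters into net2, addressed by the
# structure-based keys ("a", "hidden1_weight", "hidden1_bias")
net2.share_parameters(net1.collect_params())
net1.initialize()  # would internally call set_prefix() first
assert net2.hidden1.weight is net1.hidden1.weight
```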