I have a gluon.HybridBlock model trained on multiple GPUs, and my goal is to
copy its parameters to a second model on multiple GPUs. The two models have
identical architectures, both use the same GPUs as their contexts, and each
contains a batch normalization layer.
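For concreteness, the setup is roughly like the sketch below (the real model is larger, but both models are built the same way; model_1 is then trained with each GPU seeing its own slice of every batch, e.g. via gluon.utils.split_and_load):

import mxnet as mx
from mxnet import gluon
from mxnet.gluon import nn

ctx = [mx.gpu(i) for i in range(4)]

def build_model():
    # Same architecture for both models; includes a BatchNorm layer.
    net = nn.HybridSequential()
    net.add(nn.Dense(16), nn.BatchNorm(), nn.Dense(2))
    return net

model_1 = build_model()
model_2 = build_model()
model_1.initialize(ctx=ctx)
model_2.initialize(ctx=ctx)
# ... model_1 is trained here on batches split across the four GPUs ...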
I tried two ways to initialize the second model with the parameters of the
first one. The first was to use the save_parameters and load_parameters
methods of the gluon.Block class:
# model_1 has been initialized (and trained) on [mx.gpu(i) for i in range(4)]
model_1.save_parameters("params.net")
model_2.load_parameters("params.net", ctx=[mx.gpu(i) for i in range(4)])
Unfortunately, this worked for all parameters except the running mean and
running variance of the batch normalization layer. Concretely, after executing
the code above, model_1 and model_2 (i) have identical parameters in all layers
other than batch normalization; and (ii) in the batch normalization layer, they
have identical beta and gamma parameters but different running means and
variances.
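I checked this by comparing each parameter of the two models on each GPU, roughly like this (the only entries that come out different are the BatchNorm running mean/variance):

# Compare every parameter of the two models on every GPU; after
# load_parameters, only the BatchNorm running mean/variance entries differ.
params1 = model_1.collect_params()
params2 = model_2.collect_params()
for p1, p2 in zip(params1.values(), params2.values()):
    for c in [mx.gpu(i) for i in range(4)]:
        same = bool((p1.data(c) == p2.data(c)).asnumpy().all())
        print(p1.name, c, same)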
The second way was to use the set_data method of the gluon.Parameter class:
params1 = model_1.collect_params()
params2 = model_2.collect_params()
for p1, p2 in zip(params1.values(), params2.values()):
    p1_data = p1.data()
    p2.set_data(p1_data)
Unfortunately, the problem with this code is that p1.data() returns the data
from a single context (one specific GPU), and p2.set_data(p1_data) then
broadcasts that same value to parameter p2 on all GPUs. However, when an MXNet
model with a batch normalization layer is trained on multiple GPUs, each GPU
(context) keeps its own running mean and variance, while the gamma and beta
parameters of batch normalization are shared across all GPUs. (For layers other
than batch normalization, all GPUs share the same parameters.) As a result, the
second approach does not work either: it sets the running mean and variance of
model_2's batch normalization layer to the same value on every GPU, even though
they have different values on each GPU in model_1. Interestingly, the running
mean and variance of the batch normalization layer are also exactly where the
first approach failed.
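In other words, what I actually need is a per-context copy rather than a broadcast, conceptually something like the sketch below. I am not sure this is supported, though: it assumes the NDArrays returned by Parameter.list_data() are the parameter's actual per-context storage, so that copying into them updates the parameter in place.

# Sketch of the per-context copy I am after (assumes list_data() exposes the
# parameter's actual per-context arrays, which I have not verified).
params1 = model_1.collect_params()
params2 = model_2.collect_params()
for p1, p2 in zip(params1.values(), params2.values()):
    for src, dst in zip(p1.list_data(), p2.list_data()):
        src.copyto(dst)  # keep each GPU's own running mean/variance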
What is the reason for this? Is there some other way to address this problem?