[ 
https://issues.apache.org/jira/browse/SINGA-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wangwei updated SINGA-107:
--------------------------
    Description: 
When Params are loaded from checkpoint files, their version numbers are 
reset to 0 for fine-tuning, as explained in the comments of SINGA-42.
However, if these parameters are not fine-tuned (for example, in 
https://github.com/apache/incubator-singa/tree/master/examples/rbm, RBM2 
does not update the parameters from RBM1), their versions are still 0 when 
they are dumped into the checkpoint files. When these parameters are later 
loaded for training other models, their versions are 0, so they are 
re-initialized according to SINGA-42. In other words, the pre-training is 
lost.
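The failure mode above can be modeled with a minimal sketch (hypothetical names, not SINGA's actual API): a Param restored with version 0 is treated as never pre-trained and gets fresh random values, discarding the checkpointed weights.

```python
# Hypothetical model of the version check described in SINGA-42:
# version 0 means "not pre-trained", so the Param is re-initialized.

def init_param(name, checkpoint_version):
    """Return a Param dict; re-initialize when the version is 0."""
    if checkpoint_version == 0:
        # Treated as uninitialized: pre-trained values are discarded.
        return {"name": name, "values": "random-init", "version": 0}
    return {"name": name, "values": "from-checkpoint",
            "version": checkpoint_version}

# RBM1's w1 is never updated while training RBM2, so RBM2's checkpoint
# dumps it with version 0; loading that checkpoint re-initializes w1.
p = init_param("w1", 0)
assert p["values"] == "random-init"
```

This is why dumping an un-updated Param with version 0 silently throws away the pre-training when that checkpoint is reloaded.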

The current workaround is to also load the checkpoint file into which each 
Param was first dumped, so that the later (correct) copy of the Param 
overrides the incorrect one. Consequently, its version number will not be 0.

For example, in 
https://github.com/apache/incubator-singa/tree/master/examples/rbm/rbm3.conf, 
we configure the checkpoint files as:

checkpoint_path: "examples/rbm/rbm2/checkpoint/step6000-worker0"
checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"

in order to load w1 and b12 correctly.
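The effect of listing two checkpoint_path entries can be sketched as follows (hypothetical names and dict-based Params, not SINGA's loader): checkpoints are loaded in the configured order, and a Param found in a later file overrides the version-0 copy loaded from an earlier file.

```python
# Hypothetical model of the workaround: later checkpoint files
# override Params already loaded from earlier ones.

def load_checkpoints(checkpoints):
    """Merge checkpoint dicts in order; later entries win."""
    params = {}
    for ckpt in checkpoints:
        params.update(ckpt)
    return params

# RBM2's checkpoint holds w1 with version 0 (never fine-tuned there);
# RBM1's checkpoint holds the correct w1 with a non-zero version.
rbm2_ckpt = {"w1": {"version": 0}, "w2": {"version": 6000}}
rbm1_ckpt = {"w1": {"version": 6000}, "b12": {"version": 6000}}

params = load_checkpoints([rbm2_ckpt, rbm1_ckpt])
assert params["w1"]["version"] == 6000  # correct copy survives
```

This mirrors the rbm3.conf configuration above: rbm2's checkpoint is listed first and rbm1's second, so rbm1's correct w1 and b12 override the version-0 copies.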

  was:
When params are loaded from checkpoint files, their version numbers will be 
reset to 0 for fine-tuning as explained in the comments of SINGA-42.
If this param is then used again in another model, its version number will 
be 0, so it is not regarded as a pre-trained param and will be 
re-initialized, which causes problems.

The present solution is to load this param more than once, so that the 
later loading overrides the first; the version number is then not 0 and 
this param is still regarded as a pre-trained param.

For example, in rbm3.conf, we write:
checkpoint_path: "examples/rbm/rbm2/checkpoint/step6000-worker0"
checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
in order to load w1 and b12 twice.


> Error from loading pre-trained params for training stacked RBMs
> ---------------------------------------------------------------
>
>                 Key: SINGA-107
>                 URL: https://issues.apache.org/jira/browse/SINGA-107
>             Project: Singa
>          Issue Type: Bug
>            Reporter: ZHAOJING
>
> When Params are loaded from checkpoint files, their version numbers are 
> reset to 0 for fine-tuning, as explained in the comments of SINGA-42.
> However, if these parameters are not fine-tuned (for example, in 
> https://github.com/apache/incubator-singa/tree/master/examples/rbm, RBM2 
> does not update the parameters from RBM1), their versions are still 0 when 
> they are dumped into the checkpoint files. When these parameters are later 
> loaded for training other models, their versions are 0, so they are 
> re-initialized according to SINGA-42. In other words, the pre-training is 
> lost.
> The current workaround is to also load the checkpoint file into which each 
> Param was first dumped, so that the later (correct) copy of the Param 
> overrides the incorrect one. Consequently, its version number will not be 0.
> For example, in 
> https://github.com/apache/incubator-singa/tree/master/examples/rbm/rbm3.conf, 
> we configure the checkpoint files as:
> checkpoint_path: "examples/rbm/rbm2/checkpoint/step6000-worker0"
> checkpoint_path: "examples/rbm/rbm1/checkpoint/step6000-worker0"
> in order to load w1 and b12 correctly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
