momo1986 commented on issue #14672: Is it possible for mxnet to save the best model and early stopping
URL: https://github.com/apache/incubator-mxnet/issues/14672#issuecomment-482406117
 
 
   > @momo1986 Glad that you are experimenting with the framework. I assume you are using a for-loop setup for the epochs? Could you share with us a snippet of the code showing what you are trying to achieve? I guess that will help us answer this question better.
   > 
   > @mxnet-label-bot add [Question]
   
   Hello, @vrakesh .
   
   Here is my code.
   
   ```python
    import time

    import mxnet as mx
    from mxnet import autograd, gluon
    import gluoncv as gcv

    # `net` (the SSD model) and `train_data` (the DataLoader) are defined earlier,
    # as in the official GluonCV finetuning tutorial.

    #############################################################################################
    # Try to use a GPU for training; fall back to CPU if none is available
    try:
        a = mx.nd.zeros((1,), ctx=mx.gpu(0))
        ctx = [mx.gpu(0)]
    except mx.base.MXNetError:
        ctx = [mx.cpu()]

    #############################################################################################
    # Start training (finetuning)
    net.collect_params().reset_ctx(ctx)
    trainer = gluon.Trainer(
        net.collect_params(), 'sgd',
        {'learning_rate': 0.001, 'wd': 0.0005, 'momentum': 0.9})

    mbox_loss = gcv.loss.SSDMultiBoxLoss()
    ce_metric = mx.metric.Loss('CrossEntropy')
    smoothl1_metric = mx.metric.Loss('SmoothL1')

    for epoch in range(0, 100):
        ce_metric.reset()
        smoothl1_metric.reset()
        tic = time.time()
        btic = time.time()
        net.hybridize(static_alloc=True, static_shape=True)
        for i, batch in enumerate(train_data):
            batch_size = batch[0].shape[0]
            data = gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0)
            cls_targets = gluon.utils.split_and_load(batch[1], ctx_list=ctx, batch_axis=0)
            box_targets = gluon.utils.split_and_load(batch[2], ctx_list=ctx, batch_axis=0)
            with autograd.record():
                cls_preds = []
                box_preds = []
                for x in data:
                    cls_pred, box_pred, _ = net(x)
                    cls_preds.append(cls_pred)
                    box_preds.append(box_pred)
                sum_loss, cls_loss, box_loss = mbox_loss(
                    cls_preds, box_preds, cls_targets, box_targets)
                autograd.backward(sum_loss)
            # since we have already normalized the loss, we don't want to normalize
            # by batch size anymore
            trainer.step(1)
            ce_metric.update(0, [l * batch_size for l in cls_loss])
            smoothl1_metric.update(0, [l * batch_size for l in box_loss])
            name1, loss1 = ce_metric.get()
            name2, loss2 = smoothl1_metric.get()
            if i % 20 == 0:
                print('[Epoch {}][Batch {}], Speed: {:.3f} samples/sec, {}={:.3f}, {}={:.3f}'.format(
                    epoch, i, batch_size / (time.time() - btic), name1, loss1, name2, loss2))
            btic = time.time()

    #############################################################################################
    # Save the finetuned weights to disk
    net.save_parameters('ssd_512_mobilenet1.0_right_hand.params')
   ```
   
   It does not differ much from the official GluonCV finetuning code. As written, the parameters are only saved once, after the final epoch, regardless of whether an earlier epoch actually performed better.
   
   In the Chinese MXNet/Gluon forum there is a discussion with some demos for saving the best checkpoint.
   
   However, I am very curious whether there is a better solution for saving the best parameters (and for early stopping). What I currently have in mind is sketched below.
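   
   A minimal sketch of the idea, assuming some validation metric is computed at the end of each epoch (the `validate()` helper, the `val_data` loader, and the `patience` threshold here are placeholders of mine, not GluonCV APIs):
   
   ```python
    best_map = 0.0
    best_epoch = -1
    patience = 10  # stop if there is no improvement for this many epochs

    for epoch in range(100):
        # ... the training loop shown above ...

        # hypothetical helper: evaluates `net` on `val_data` and returns e.g. mAP
        current_map = validate(net, val_data, ctx)

        if current_map > best_map:
            best_map = current_map
            best_epoch = epoch
            # keep only the best weights seen so far
            net.save_parameters('ssd_512_mobilenet1.0_right_hand_best.params')

        # early stopping: give up once the score has stalled for `patience` epochs
        if epoch - best_epoch >= patience:
            print('No improvement since epoch {}, stopping at epoch {}.'.format(best_epoch, epoch))
            break
   ```
   
   Is there a recommended, built-in way to do this in MXNet/Gluon, or is a manual loop like the one above the expected approach?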
   
   Thanks for your answer.
   
   Momo
