huyangc opened a new issue #10788: Start a process for training. The training 
gets stuck
URL: https://github.com/apache/incubator-mxnet/issues/10788
 
 
   I run my training procedure (loading data, preparing a module, and calling 
module.fit) in a separate process. When I call the training function directly, 
training starts successfully. When I launch the same function in a new process, 
it gets stuck: GPU memory is allocated, but GPU-Util stays at 0%.
   
   Some pseudocode:
   ```
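   import multiprocessing as mp
   import mxnet as mx

   def train_net(args):
       sym = build_net()  # user-defined helper that returns the network symbol
       # Module takes a list of contexts; this uses GPUs 0 through 7
       mod = mx.mod.Module(sym, context=[mx.gpu(i) for i in range(8)])
       initializer = mx.init.Xavier()
       dataiter = get_dataiter()  # user-defined helper that returns the data iterator
       mod.fit(...)  # arguments elided in this report

   def train():
       # run the training function in a child process and wait for it
       p = mp.Process(target=train_net, args=(args,))
       p.start()
       p.join()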
   
   ```
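
To help narrow this down, here is a minimal sketch (not a fix) that joins the child process with a timeout, so a hang is reported instead of blocking forever. The names `train_net_stub` and `run_in_process` are hypothetical and do not come from the code above; the stub only sleeps and stands in for the real `train_net`. The `start_method` parameter is there only to compare start methods. Whether the start method affects the hang is an assumption worth testing, not something this report confirms.

```python
import multiprocessing as mp
import time


def train_net_stub(seconds):
    """Hypothetical stand-in for the real train_net; it only sleeps."""
    time.sleep(seconds)


def run_in_process(target, args=(), timeout=None, start_method=None):
    """Run target(*args) in a child process.

    Returns the child's exit code, or None if the child was still
    running after `timeout` seconds (it is then terminated).
    """
    # start_method=None selects the platform default start method.
    ctx = mp.get_context(start_method)
    p = ctx.Process(target=target, args=args)
    p.start()
    p.join(timeout)
    if p.is_alive():
        # The child did not finish in time: treat it as stuck.
        p.terminate()
        p.join()
        return None
    return p.exitcode
```

With the real training function this might be called as `run_in_process(train_net, (args,), timeout=600)`, where the timeout value is arbitrary. A return value of `None` would mean the child did not finish within the timeout.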
           
