huyangc opened a new issue #10788: Start a process for training. The training get stuck
URL: https://github.com/apache/incubator-mxnet/issues/10788

I run my training procedure (loading data, preparing a module, and calling module.fit) in a separate process. When I call the training procedure directly, it starts successfully. When I launch the same procedure through a new process, it gets stuck: GPU memory is allocated, but GPU-Util stays at 0%.

Some pseudocode here:

```
import multiprocessing as mp
import mxnet as mx

def train_net(args):
    sym = build_net()
    # Train on GPUs 0-7.
    mod = mx.mod.Module(sym, context=[mx.gpu(i) for i in range(8)])
    initializer = mx.init.Xavier()
    dataiter = get_dataiter()
    mod.fit(...)

def train():
    # Run the whole training procedure in a child process.
    p = mp.Process(target=train_net, args=(args,))
    p.start()
    p.join()
```
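To narrow this down, a small helper like the one below might help. It is only a sketch, not something from the report: `run_in_subprocess` and `train_net_stub` are hypothetical names, and the idea that the multiprocessing start method ("fork" vs. "spawn") affects the hang is an assumption to test, not a confirmed cause. The helper joins with a timeout so a stuck child is reported instead of blocking forever.

```python
import multiprocessing as mp


def train_net_stub(args):
    # Hypothetical stand-in for train_net(args). The real function would
    # build the symbol, create the Module, and call mod.fit(...).
    return None


def run_in_subprocess(target, args=(), start_method="spawn", timeout=None):
    """Run target(*args) in a child process using the given start method.

    Returns the child's exit code, or None if the child was still running
    after `timeout` seconds (in that case the child is terminated).
    """
    # get_context lets us choose the start method per call instead of
    # changing the global default.
    ctx = mp.get_context(start_method)
    p = ctx.Process(target=target, args=args)
    p.start()
    p.join(timeout)
    if p.is_alive():
        # The child did not finish in time: treat it as a hang.
        p.terminate()
        p.join()
        return None
    return p.exitcode
```

For example, `run_in_subprocess(train_net, (args,), start_method="spawn", timeout=600)` could be compared against the same call with `start_method="fork"`. With "spawn", the target must be defined at module top level and the call must sit under an `if __name__ == "__main__":` guard. The "fork" method is not available on Windows.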
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services