Thanks for the details.
As mentioned, the slow GPU initialization and the large CPU memory footprint both seem to be specific to this model, so it would be hard to say more without details about the implementation. I'd defer to someone on the discuss forum with more experience in model building and GPU training to help out.
