rahul003 commented on issue #9152: tutorial for distributed training
URL: https://github.com/apache/incubator-mxnet/pull/9152#issuecomment-371059065
 
 
   @TaoLv Sorry I missed your comment. You can profile a worker process the same way as in the single-machine case:
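   ```python
   import mxnet as mx

   # kv is the distributed KVStore used for training, for example:
   kv = mx.kv.create('dist_sync')

   # Include the worker's rank in the filename so each worker writes its own file
   mx.profiler.profiler_set_config(mode='all', filename=str(kv.rank) + 'profile_output.json')
   mx.profiler.profiler_set_state('run')
   # Code to be profiled goes here...
   mx.profiler.profiler_set_state('stop')
   ```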
   Note the use of `kv.rank` in the filename. It ensures that each worker saves its profile to a different path.
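If you would rather collect every worker's file in one directory with the rank in an easy-to-read position, the following is a minimal sketch of one naming scheme. The helper `profile_path_for_rank` and the `profile_worker{rank}.json` pattern are hypothetical and not part of MXNet. In a real job you would pass `kv.rank` as `rank`.

```python
import os


def profile_path_for_rank(rank, out_dir="."):
    """Return a per-worker profile path (hypothetical naming scheme).

    Embedding the worker rank in the filename keeps workers that share a
    filesystem from overwriting each other's profile output.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return os.path.join(out_dir, "profile_worker{}.json".format(rank))
```

With MXNet you would then pass something like `filename=profile_path_for_rank(kv.rank, 'profiles')` to the profiler config call in place of the string concatenation.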
   
   In each worker's profile, look for the KVStore Push/Pull operators to see the time taken for communication.
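To total communication time per worker without reading the timeline by eye, here is a minimal sketch. It assumes the profile uses the Chrome trace-event JSON format, where each entry in a `traceEvents` list carries `name`, `ph`, `ts`, `pid`, `tid` and, for complete events, `dur`, with times in microseconds. The case-insensitive substring match on `push`/`pull` is only a guess at how the KVStore events are named. Check the actual names in your own trace and adjust `keywords`. The function names here are hypothetical.

```python
import json
from collections import defaultdict


def communication_time_us(trace, keywords=("push", "pull")):
    """Sum durations (microseconds) of events whose name contains any keyword.

    Accepts either {"traceEvents": [...]} or a bare list of events.
    Handles complete events ("ph": "X" with "dur") and begin/end pairs
    ("ph": "B" closed by the next "ph": "E" on the same pid/tid).
    """
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    keys = tuple(k.lower() for k in keywords)

    def matches(name):
        return any(k in name.lower() for k in keys)

    totals = defaultdict(float)
    open_stacks = defaultdict(list)  # (pid, tid) -> stack of (name, begin ts)

    # Sort by timestamp so begin/end pairs are processed in order (stable sort).
    for ev in sorted(events, key=lambda e: e.get("ts", 0)):
        phase = ev.get("ph")
        thread = (ev.get("pid"), ev.get("tid"))
        if phase == "X":
            name = ev.get("name", "")
            if matches(name):
                totals[name] += ev.get("dur", 0)
        elif phase == "B":
            open_stacks[thread].append((ev.get("name", ""), ev["ts"]))
        elif phase == "E" and open_stacks[thread]:
            # An "E" event closes the most recent "B" on the same thread.
            name, start = open_stacks[thread].pop()
            if matches(name):
                totals[name] += ev["ts"] - start
    return dict(totals)


def load_trace(path):
    """Load one worker's profile file from disk."""
    with open(path) as f:
        return json.load(f)
```

For example, `communication_time_us(load_trace('0profile_output.json'))` would summarize rank 0's file as named by the snippet above.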
   
   I'll add a proper section on profiling to the tutorial once https://github.com/apache/incubator-mxnet/pull/9933 is merged, since that PR makes it easy to profile the server processes too.

