[ 
https://issues.apache.org/jira/browse/MXNET-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yuan updated MXNET-1112:
----------------------------
    Description: 
  -Error message: *** An error occurred in MPI_Allreduce: the reduction 
operation MPI_SUM is not defined for non-intrinsic datatypes
  -Suspected reason: Horovod's MPI path on CPU (i.e. non-CUDA-aware MPI) does 
not support MPI_SUM on float16.
  -We are running into an issue with `update_multi_precision` in 
https://github.com/ctcyang/horovod/blob/0a0240113fe5a24ec2c772fd7309840ba179562a/horovod/mxnet/__init__.py#L47
 We don't yet have a way of hooking into SGD's `update_multi_precision` to run 
the `hvd.allreduce` after the gradient is cast to float32 and before the weight 
update. As written, `hvd.allreduce` reduces in `float16`, which does not 
presently support hierarchical allreduce.
  -This is a problem because our scalability experiments on 256 GPUs in 
float32 mode show 68% scaling efficiency with HOROVOD_HIERARCHICAL_ALLREDUCE=0 
and 92.1% with HOROVOD_HIERARCHICAL_ALLREDUCE=1. If the same pattern holds for 
float16, hierarchical allreduce will be a necessity for good scalability.
  -It is a good idea to rebase onto Horovod `master` if you haven't done so 
already, to take advantage of this new performance-improving feature.
  -Three possible fixes:
    1) Add MPI_SUM support for float16 and all-reduce gradients in float16 (the 
model may be difficult to converge).
    2) Hook into `update_multi_precision` so the all-reduce runs after the 
gradient is cast to float32 and before the weight update.
    3) Hardcode `hvd.allreduce` here. Problem: it might not be possible for 
mxnet to import horovod.
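The convergence caveat in fix 1 can be illustrated without any MPI at all. A minimal numpy sketch (the `allreduce_sum` helper and the per-worker gradient value are hypothetical stand-ins, not Horovod or MPI API), contrasting reducing in float16 with casting to float32 first as in fix 2:

```python
import numpy as np

def allreduce_sum(per_worker_grads):
    # Stand-in for an MPI_Allreduce with MPI_SUM across workers.
    return sum(per_worker_grads)

n_workers = 256
grad = 300.0  # hypothetical per-worker gradient value

# Fix 1: reduce directly in float16 -- the running sum overflows
# float16's largest finite value (65504) before all workers are summed.
sum_fp16 = np.float16(0)
for _ in range(n_workers):
    sum_fp16 = np.float16(sum_fp16 + np.float16(grad))

# Fix 2: cast each gradient to float32 first, then reduce -- the order
# that hooking into update_multi_precision would make possible.
sum_fp32 = allreduce_sum([np.float32(np.float16(grad)) for _ in range(n_workers)])

print(sum_fp16)  # inf -- overflowed in float16
print(sum_fp32)  # 76800.0 -- exact in float32
```

The same cast-first ordering also sidesteps the missing float16 MPI_SUM entirely, since the reduction then runs on an intrinsic MPI datatype.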
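For fix 1 itself, mpi4py lets you register a user-defined reduction with `MPI.Op.Create(function, commute=True)`; the callback receives an input buffer and an in/out buffer. A sketch of a float16-sum callback (the name `fp16_sum` is illustrative), exercised here on plain byte buffers so it runs without an MPI launch:

```python
import numpy as np

def fp16_sum(inbuf, inoutbuf, datatype=None):
    # Elementwise inoutbuf += inbuf, interpreting both buffers as float16.
    # This matches the callback shape mpi4py's MPI.Op.Create expects;
    # with MPI one would register it as: op = MPI.Op.Create(fp16_sum, commute=True)
    a = np.frombuffer(inbuf, dtype=np.float16)
    b = np.frombuffer(inoutbuf, dtype=np.float16)  # writable if the buffer is
    b += a

# Exercise the callback directly on raw byte buffers, no MPI needed.
x = np.arange(4, dtype=np.float16)                    # [0, 1, 2, 3]
acc = bytearray(np.ones(4, dtype=np.float16).tobytes())
fp16_sum(x.tobytes(), acc)
print(np.frombuffer(acc, dtype=np.float16))           # [1. 2. 3. 4.]
```

Note this only supplies the missing MPI_SUM; the float16 accumulation-precision concern from fix 1 remains.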

> float16 HIERARCHICAL_ALLREDUCE not working
> ------------------------------------------
>
>                 Key: MXNET-1112
>                 URL: https://issues.apache.org/jira/browse/MXNET-1112
>             Project: Apache MXNet
>          Issue Type: Improvement
>          Components: Apache MXNet Backend
>            Reporter: Lin Yuan
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
