solin319 commented on issue #8373: distribute training in fp16
URL: https://github.com/apache/incubator-mxnet/pull/8373#issuecomment-368188760
@rahul003
For alexnet, try using fp16 with a GPU in kvstore_dist_server.
For resnet, try using dist_sync.
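For reference, a minimal repro sketch of the fp16 kvstore path under discussion, assuming MXNet's Python kvstore API; the key number and shapes here are made up, and the script would need to be launched through tools/launch.py (or equivalent) so that scheduler, server, and worker processes exist:

```python
# Hypothetical repro: push/pull float16 values through a distributed
# kvstore, which is the code path this PR touches. Launch via
# tools/launch.py so the scheduler/server/worker roles are set up.
import mxnet as mx

kv = mx.kvstore.create('dist_sync')       # synchronous distributed kvstore
shape = (4, 4)                            # placeholder shape
val = mx.nd.ones(shape, dtype='float16')  # fp16 payload
kv.init(3, val)                           # register key 3 on the server
out = mx.nd.zeros(shape, dtype='float16')
kv.push(3, val)                           # worker -> server in fp16
kv.pull(3, out=out)                       # server -> worker in fp16
print(out.asnumpy())
```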
--
solin319 commented on issue #8373: distribute training in fp16
URL: https://github.com/apache/incubator-mxnet/pull/8373#issuecomment-368185953
Which data type was used in training?
We use fp16 in the training computation.
@rahul003
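For context, a minimal sketch of the usual MXNet fp16 computation pattern referred to here: keep the network body in float16 and cast back to float32 before the loss. The layer and its size are placeholders, not the network from this PR:

```python
# Sketch of fp16 training computation with MXNet's symbolic API:
# cast inputs to float16 for the network body, cast back to float32
# before the loss for numerical stability.
import mxnet as mx

data = mx.sym.Variable('data')
body = mx.sym.cast(data, dtype='float16')          # compute in fp16
body = mx.sym.FullyConnected(body, num_hidden=10)  # placeholder layer
body = mx.sym.cast(body, dtype='float32')          # loss stays in fp32
net = mx.sym.SoftmaxOutput(body, name='softmax')
```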
--
solin319 commented on issue #8373: distribute training in fp16
URL: https://github.com/apache/incubator-mxnet/pull/8373#issuecomment-340633802
1. In the current approach, I think the class kvstore_dist is just like the class
DistServerWrapper mentioned above. We pass the data type to kvstore_dist first,
--
solin319 commented on issue #8373: distribute training in fp16
URL: https://github.com/apache/incubator-mxnet/pull/8373#issuecomment-340627932
Yes, all keys are used in fp16. Because the ps_worker_ used in the program is
defined with a template argument, I think it's hard to define two different
ps