[incubator-mxnet] branch master updated: Add LANS optimizer (#18620)

2020-06-27 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new d6c3578  Add LANS optimizer (#18620)
d6c3578 is described below

commit d6c35785a870ac6e0b42903d7e27de2c9a6efdbe
Author: Shuai Zheng 
AuthorDate: Sat Jun 27 13:25:03 2020 -0700

Add LANS optimizer (#18620)

* add lans optimizer

* fix

* fix

Co-authored-by: Zheng 
---
 python/mxnet/ndarray/contrib.py |  78 +++
 python/mxnet/optimizer/__init__.py  |   6 +-
 python/mxnet/optimizer/lans.py  | 220 ++
 src/operator/contrib/multi_lans-inl.h   | 385 
 src/operator/contrib/multi_lans.cc  | 267 ++
 src/operator/contrib/multi_lans.cu  | 287 
 src/operator/contrib/multi_sum_sq-inl.h |  13 +-
 src/operator/contrib/multi_sum_sq.cc|  20 +-
 src/operator/contrib/multi_sum_sq.cu|  16 +-
 tests/python/unittest/test_optimizer.py |  30 +++
 10 files changed, 1302 insertions(+), 20 deletions(-)

diff --git a/python/mxnet/ndarray/contrib.py b/python/mxnet/ndarray/contrib.py
index 2ff422f..0975013 100644
--- a/python/mxnet/ndarray/contrib.py
+++ b/python/mxnet/ndarray/contrib.py
@@ -680,3 +680,81 @@ def multi_mp_lamb_update(weights, grads, mean, var, weights32, step_count,
                                                    learning_rates=lrs,
                                                    wds=wds,
                                                    **kwargs)
+
+
+def multi_lans_update(weights, grads, mean, var, step_count,
+                      lrs, wds, out=None, num_tensors=0, **kwargs):
+    """Given a list of gradients, update weights, mean and variance of multiple tensors
+    following the LANS optimizer implementation.
+
+    Parameters
+    ----------
+    weights : List of NDArrays containing the input weights of multiple tensors
+
+    grads : List of NDArrays containing input gradients
+
+    mean : List of NDArrays containing mean of multiple tensors to be updated
+
+    var : List of NDArrays containing variance of multiple tensors to be updated
+
+    step_count : List of scalars with the number of update steps for each tensor
+
+    lrs : List of learning rates (one for each tensor)
+
+    wds : List of weight decays (one for each tensor)
+
+    out : List of NDArrays where the updated weights will be stored
+
+    num_tensors : Number of NDArrays/tensors in the list
+    """
+    if not num_tensors:
+        num_tensors = len(weights)
+    temp_list = _flatten_list(zip(weights, grads, mean, var))
+    return ndarray._internal._multi_lans_update(*temp_list,
+                                                out=out,
+                                                num_tensors=num_tensors,
+                                                step_count=step_count,
+                                                learning_rates=lrs,
+                                                wds=wds,
+                                                **kwargs)
+
+
+def multi_mp_lans_update(weights, grads, mean, var, weights32, step_count,
+                         lrs, wds, out=None, num_tensors=0, **kwargs):
+    """Given a list of gradients, update weights, mean and variance of multiple tensors
+    following the LANS optimizer implementation, using mixed precision.
+
+    Parameters
+    ----------
+    weights : List of NDArrays containing the input weights of multiple tensors
+
+    grads : List of NDArrays containing input gradients
+
+    mean : List of NDArrays containing mean of multiple tensors to be updated
+
+    var : List of NDArrays containing variance of multiple tensors to be updated
+
+    weights32 : Master copy of weights in FP32
+
+    step_count : List of scalars with the number of update steps for each tensor
+
+    lrs : List of learning rates (one for each tensor)
+
+    wds : List of weight decays (one for each tensor)
+
+    out : List of NDArrays where the updated weights will be stored
+
+    num_tensors : Number of NDArrays/tensors in the list
+    """
+    if not num_tensors:
+        num_tensors = len(weights)
+    temp_list = _flatten_list(zip(weights, grads, mean, var, weights32))
+    return ndarray._internal._multi_mp_lans_update(*temp_list,
+                                                   out=out,
+                                                   num_tensors=num_tensors,
+                                                   step_count=step_count,
+                                                   learning_rates=lrs,
+                                                   wds=wds,
+                                                   **kwargs)
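
For context, a minimal usage sketch of the new fused updater from the Python API (a sketch only: shapes and hyperparameter values are hypothetical, and only the arguments named in the docstring above are used; any optimizer-specific coefficients would go through **kwargs):

    import mxnet as mx

    # Two parameter tensors updated in a single fused call (hypothetical shapes).
    weights = [mx.nd.ones((4, 4)), mx.nd.ones((8,))]
    grads = [w * 0.1 for w in weights]
    mean = [mx.nd.zeros(w.shape) for w in weights]   # first moment, per tensor
    var = [mx.nd.zeros(w.shape) for w in weights]    # second moment, per tensor

    mx.nd.contrib.multi_lans_update(weights, grads, mean, var,
                                    step_count=[1, 1],     # update step per tensor
                                    lrs=[0.01, 0.01],      # learning rate per tensor
                                    wds=[1e-4, 1e-4],      # weight decay per tensor
                                    out=weights)           # write updates in place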
diff --git a/python/mxnet/optimizer/__init__.py b/python/mxnet/opti

[incubator-mxnet] branch master updated: add epsilon to adamax (#18532)

2020-06-24 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new e4c93e3  add epsilon to adamax (#18532)
e4c93e3 is described below

commit e4c93e3e3a68559cb38e4ff92c9e0bf9c9cdd0bf
Author: Shuai Zheng 
AuthorDate: Wed Jun 24 22:03:39 2020 -0700

add epsilon to adamax (#18532)

Co-authored-by: Ubuntu 
---
 python/mxnet/optimizer/adamax.py | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/python/mxnet/optimizer/adamax.py b/python/mxnet/optimizer/adamax.py
index a2ffd9c..d7bc2d1 100644
--- a/python/mxnet/optimizer/adamax.py
+++ b/python/mxnet/optimizer/adamax.py
@@ -37,7 +37,7 @@ class Adamax(Optimizer):
     grad = clip(grad * rescale_grad, clip_gradient) + wd * weight
     m = beta1 * m_t + (1 - beta1) * grad
     u = maximum(beta2 * u, abs(grad))
-    weight -= lr / (1 - beta1**t) * m / u
+    weight -= lr / (1 - beta1**t) * m / (u + epsilon)
 
     This optimizer accepts the following parameters in addition to those accepted
     by :class:`.Optimizer`.
@@ -58,13 +58,14 @@
     When use_fused_step=False, step is called,
     otherwise, fused_step is called.
     """
-    def __init__(self, learning_rate=0.002, beta1=0.9, beta2=0.999,
+    def __init__(self, learning_rate=0.002, beta1=0.9, beta2=0.999, epsilon=1e-8,
                  use_fused_step=False, **kwargs):
         super(Adamax, self).__init__(learning_rate=learning_rate,
                                      use_fused_step=use_fused_step,
                                      **kwargs)
         self.beta1 = beta1
         self.beta2 = beta2
+        self.epsilon = epsilon
 
     def create_state(self, index, weight):
         return (zeros(weight.shape, weight.context, dtype=weight.dtype),  # mean
@@ -107,5 +108,5 @@ class Adamax(Optimizer):
         var[:] = maximum(self.beta2 * var, NDabs(grad))
 
         # update weight
-        d = mean / var
+        d = mean / (var + self.epsilon)
         weight[:] -= lr * d
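
A quick numeric illustration of why the epsilon matters, re-implementing the docstring's update rule directly (a sketch, not the optimizer class): for an element with zero gradient, u stays 0, so the old m / u produces NaN while m / (u + epsilon) leaves the weight untouched:

    import mxnet as mx

    beta1, beta2, lr, epsilon, t = 0.9, 0.999, 0.002, 1e-8, 1
    weight = mx.nd.array([1.0])
    grad = mx.nd.array([0.0])                                   # zero gradient
    m = beta1 * mx.nd.zeros(1) + (1 - beta1) * grad             # m == 0
    u = mx.nd.maximum(beta2 * mx.nd.zeros(1), mx.nd.abs(grad))  # u == 0
    # old rule: weight -= lr / (1 - beta1**t) * m / u   -> 0/0 = NaN
    weight -= lr / (1 - beta1**t) * m / (u + epsilon)
    print(weight.asnumpy())                                     # [1.], finite, unchanged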



[incubator-mxnet] branch master updated (74fcb99 -> 4b86c32)

2020-06-23 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 74fcb99  redirect api reference on v-master to v1.6 (#18607)
 add 4b86c32  Allow input reordering during Gluon / CachedOp graph transformations (#17949)

No new revisions were added by this update.

Summary of changes:
 src/imperative/cached_op.cc| 58 ++
 src/imperative/cached_op.h | 34 +---
 src/imperative/cached_op_threadsafe.cc |  4 ++-
 src/imperative/naive_cached_op.cc  |  3 +-
 tests/python/gpu/test_fusion.py| 35 
 5 files changed, 94 insertions(+), 40 deletions(-)



[incubator-mxnet] branch master updated: Allow input reordering during Gluon / CachedOp graph transformations (#17949)

2020-06-23 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 4b86c32  Allow input reordering during Gluon / CachedOp graph transformations (#17949)
4b86c32 is described below

commit 4b86c32832a994e76b97dfc58c8a672db87e721d
Author: mk-61 <56651474+mk...@users.noreply.github.com>
AuthorDate: Tue Jun 23 13:49:06 2020 -0700

Allow input reordering during Gluon / CachedOp graph transformations (#17949)

* Initial commit of input reordering in Gluon

* Add test for Gluon input reorder

* Fix backward in CachedOp for input reordering

* Fix test_input_reorder for backward pass

* Fix merge error in NaiveCachedOp

* Include correct header for std::iota

Co-authored-by: Vladimir Cherepanov 
---
 src/imperative/cached_op.cc| 58 ++
 src/imperative/cached_op.h | 34 +---
 src/imperative/cached_op_threadsafe.cc |  4 ++-
 src/imperative/naive_cached_op.cc  |  3 +-
 tests/python/gpu/test_fusion.py| 35 
 5 files changed, 94 insertions(+), 40 deletions(-)

diff --git a/src/imperative/cached_op.cc b/src/imperative/cached_op.cc
index 83e8d31..7b3a5d3 100644
--- a/src/imperative/cached_op.cc
+++ b/src/imperative/cached_op.cc
@@ -147,10 +147,9 @@ bool CachedOp::CheckDynamicShapeExists(const Context& default_ctx,
   auto& state = state_ptr.get_state<CachedOpState>();
 
   nnvm::Graph& g = state.info.fwd_graph;
-  ShapeVector shape_inputs;
-  shape_inputs.reserve(inputs.size());
-  for (auto input : inputs) {
-    shape_inputs.emplace_back(input->shape());
+  ShapeVector shape_inputs(inputs.size());
+  for (size_t i = 0; i < inputs.size(); ++i) {
+    shape_inputs[i] = inputs[state.info.input_map[i]]->shape();
   }
   // We leverage the shape inference pass to detect whether dynamic shape exists.
   // If so, the pass will fail with `contain_dynamic_shape = true`,
@@ -176,16 +175,13 @@ bool CachedOp::SetForwardGraph(
   CHECK_EQ(inputs.size(), num_inputs());
   nnvm::Graph& g = info->fwd_graph;
 
-  ShapeVector shape_inputs;
-  DTypeVector dtype_inputs;
-  StorageTypeVector storage_type_inputs;
-  shape_inputs.reserve(inputs.size());
-  dtype_inputs.reserve(inputs.size());
-  storage_type_inputs.reserve(inputs.size());
-  for (auto input : inputs) {
-    shape_inputs.emplace_back(input->shape());
-    dtype_inputs.emplace_back(input->dtype());
-    storage_type_inputs.emplace_back(input->storage_type());
+  ShapeVector shape_inputs(inputs.size());
+  DTypeVector dtype_inputs(inputs.size());
+  StorageTypeVector storage_type_inputs(inputs.size());
+  for (size_t i = 0; i < inputs.size(); ++i) {
+    shape_inputs[i] = inputs[info->input_map[i]]->shape();
+    dtype_inputs[i] = inputs[info->input_map[i]]->dtype();
+    storage_type_inputs[i] = inputs[info->input_map[i]]->storage_type();
   }
 
   bool match = true;
@@ -321,9 +317,10 @@ bool CachedOp::SetBackwardGraph(
     if (info->bwd_input_eid[i] == kEidNotExist) {
       continue;
     }
-    shapes[info->bwd_input_eid[i]] = inputs[i]->shape();
-    dtypes[info->bwd_input_eid[i]] = inputs[i]->dtype();
-    stypes[info->bwd_input_eid[i]] = inputs[i]->storage_type();
+    size_t oi = BwdOriginalInput(info->input_map, i);
+    shapes[info->bwd_input_eid[i]] = inputs[oi]->shape();
+    dtypes[info->bwd_input_eid[i]] = inputs[oi]->dtype();
+    stypes[info->bwd_input_eid[i]] = inputs[oi]->storage_type();
   }
 
   std::pair<uint32_t, uint32_t> node_range, entry_range;
@@ -649,22 +646,22 @@ OpStatePtr CachedOp::StaticForward(
   if (config_.static_shape) {
     for (auto i : config_.param_indices) {
       auto nid = idx.input_nodes()[i];
-      if (!arrays[idx.entry_id(nid, 0)]->IsSame(*inputs[i])) {
+      if (!arrays[idx.entry_id(nid, 0)]->IsSame(*inputs[state.info.input_map[i]])) {
         match = false;
         auto ptr = &state.buff[idx.entry_id(nid, 0)];
         CHECK_EQ(arrays[idx.entry_id(nid, 0)], ptr);
-        *arrays[idx.entry_id(nid, 0)] = *inputs[i];
+        *arrays[idx.entry_id(nid, 0)] = *inputs[state.info.input_map[i]];
         state.dynamic_entries[idx.entry_id(nid, 0)] = false;
       }
     }
     for (auto i : config_.data_indices) {
       auto eid = idx.entry_id(idx.input_nodes()[i], 0);
-      arrays[eid] = inputs[i];
+      arrays[eid] = inputs[state.info.input_map[i]];
     }
   } else {
     for (size_t i = 0; i < num_inputs(); ++i) {
       auto nid = idx.input_nodes()[i];
-      arrays[idx.entry_id(nid, 0)] = inputs[i];
+      arrays[idx.entry_id(nid, 0)] = inputs[state.info.input_map[i]];
     }
   }
 
@@ -714,6 +711,7 @@ OpStatePtr CachedOp::DynamicForward(
     std::lock_guard<std::mutex> lock(state.mutex)
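
For readers unfamiliar with this code path: input_map lets CachedOp remap user-supplied inputs after a graph transformation (such as pointwise fusion) reorders the graph's arguments. A hedged sketch of the Gluon entry point that routes through CachedOp (plain CPU example; the remapping itself only becomes non-trivial once a transformation actually rewrites the graph):

    import mxnet as mx
    from mxnet import autograd, gluon

    class TwoInput(gluon.HybridBlock):
        def hybrid_forward(self, F, a, b):
            # consume the inputs in a different order than declared
            return F.elemwise_add(b * 2.0, a)

    net = TwoInput()
    net.hybridize()                      # forward/backward now run through CachedOp

    a = mx.nd.ones((2, 3)); a.attach_grad()
    b = mx.nd.ones((2, 3)); b.attach_grad()
    with autograd.record():
        out = net(a, b)
    out.backward()
    print(a.grad.asnumpy().max(), b.grad.asnumpy().max())   # 1.0 and 2.0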

[incubator-mxnet] branch master updated (a1db5b2 -> 5df0025)

2020-06-05 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from a1db5b2  Update .codecov.yml (#18497)
 add 5df0025  Fix race condition in FusedOp (#18498)

No new revisions were added by this update.

Summary of changes:
 src/operator/fusion/fused_op.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



[incubator-mxnet] branch master updated: Fix race condition in FusedOp (#18498)

2020-06-05 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 5df0025  Fix race condition in FusedOp (#18498)
5df0025 is described below

commit 5df002567dd2e9ebcfeb620a9ba55adbded743da
Author: Przemyslaw Tredak 
AuthorDate: Fri Jun 5 19:55:06 2020 -0700

Fix race condition in FusedOp (#18498)
---
 src/operator/fusion/fused_op.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/operator/fusion/fused_op.cc b/src/operator/fusion/fused_op.cc
index 2ac0b53..ee470cf 100644
--- a/src/operator/fusion/fused_op.cc
+++ b/src/operator/fusion/fused_op.cc
@@ -61,6 +61,7 @@ FusedOp::FusedOp(const nnvm::NodeAttrs* attrs, const FusedOpConfig& config) :
 bool FusedOp::InferShape(const nnvm::NodeAttrs &attrs,
                          std::vector<mxnet::TShape> *in_attrs,
                          std::vector<mxnet::TShape> *out_attrs) {
+  std::lock_guard<std::mutex> lock(my_mutex_);
   subgraph_.attrs.erase("shape");
   subgraph_.attrs.erase("shape_inputs");
   std::vector<mxnet::TShape> input_shapes(*in_attrs);
@@ -95,7 +96,6 @@ bool FusedOp::InferShape(const nnvm::NodeAttrs &attrs,
     inferred = inferred && !op::shape_is_none(attr);
   }
   if (inferred) {
-    std::lock_guard<std::mutex> lock(my_mutex_);
     intermediate_shapes_.push_back({*in_attrs, *out_attrs, shapes});
   }
   return inferred;
@@ -104,6 +104,7 @@ bool FusedOp::InferShape(const nnvm::NodeAttrs &attrs,
 bool FusedOp::InferType(const nnvm::NodeAttrs &attrs,
                         std::vector<int> *in_attrs,
                         std::vector<int> *out_attrs) {
+  std::lock_guard<std::mutex> lock(my_mutex_);
   subgraph_.attrs.erase("dtype");
   subgraph_.attrs.erase("dtype_inputs");
   std::vector<int> input_types(*in_attrs);
@@ -138,7 +139,6 @@ bool FusedOp::InferType(const nnvm::NodeAttrs &attrs,
     inferred = inferred && !op::type_is_none(attr);
   }
   if (inferred) {
-    std::lock_guard<std::mutex> lock(my_mutex_);
     intermediate_dtypes_.push_back({*in_attrs, *out_attrs, types});
   }
   return inferred;
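
The pattern behind this fix, sketched in Python purely for illustration (not MXNet code): both inference passes mutate shared members (subgraph_.attrs and the intermediate caches) from their first statement, so the guard must cover the whole function rather than just the final push_back:

    import threading

    class Inferrer:
        def __init__(self):
            self._lock = threading.Lock()
            self._attrs = {}
            self._intermediate = []

        def infer(self, key, value):
            # Lock before the first mutation, mirroring the fused_op.cc change;
            # locking only around the append would leave the _attrs races in place.
            with self._lock:
                self._attrs.pop(key, None)            # shared-state mutation 1
                self._attrs[key] = value              # shared-state mutation 2
                self._intermediate.append((key, value))
                return True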



[incubator-mxnet] branch v1.x updated: Revert PR 17767 for fixing GPU memory usage regression (#18283) (#18309)

2020-05-29 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch v1.x
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/v1.x by this push:
 new d621e50  Revert PR 17767 for fixing GPU memory usage regression (#18283) (#18309)
d621e50 is described below

commit d621e50862a96d259135fcfac0098f7709ee0f00
Author: Ziyi Mu 
AuthorDate: Fri May 29 14:51:17 2020 -0700

Revert PR 17767 for fixing GPU memory usage regression (#18283) (#18309)

* Revert "Fix and optimize handling of vectorized memory accesses (#17767)"

This reverts commit 5542d03695b4a2589afb88acf128d4ba8ac94d0d.

* add license to reverted file
---
 3rdparty/mshadow/mshadow/base.h|  48 +++
 3rdparty/mshadow/mshadow/half2.h   | 162 +++
 src/common/cuda_vectorization.cuh  | 283 --
 src/operator/mshadow_op.h  |  67 +
 src/operator/tensor/elemwise_binary_op.cuh | 322 -
 src/operator/tensor/elemwise_binary_op.h   | 206 ++---
 src/operator/tensor/elemwise_binary_op_basic.cu|  23 +-
 src/operator/tensor/elemwise_binary_scalar_op.cuh  | 207 -
 src/operator/tensor/elemwise_binary_scalar_op.h|  75 +
 .../tensor/elemwise_binary_scalar_op_basic.cu  |   9 +-
 .../tensor/elemwise_binary_scalar_op_extended.cu   |  15 +-
 src/operator/tensor/elemwise_sum.cu| 112 +--
 src/operator/tensor/elemwise_sum.h |  12 +
 src/operator/tensor/elemwise_unary_op.cuh  | 127 
 src/operator/tensor/elemwise_unary_op.h|  56 ++--
 src/operator/tensor/elemwise_unary_op_basic.cu |   1 -
 src/operator/tensor/elemwise_unary_op_pow.cu   |   1 -
 src/operator/tensor/elemwise_unary_op_trig.cu  |   1 -
 tests/python/unittest/test_operator.py |  81 +-
 19 files changed, 464 insertions(+), 1344 deletions(-)

diff --git a/3rdparty/mshadow/mshadow/base.h b/3rdparty/mshadow/mshadow/base.h
index 6469bbc..9f53857 100755
--- a/3rdparty/mshadow/mshadow/base.h
+++ b/3rdparty/mshadow/mshadow/base.h
@@ -295,6 +295,7 @@ extern "C" {
   }
 
 #include "./half.h"
+#include "./half2.h"
 #include "./bfloat.h"
 #define MSHADOW_HALF_BF_OPERATOR(RTYPE, OP)                                               \
   MSHADOW_XINLINE RTYPE operator OP(mshadow::half::half_t a, mshadow::bfloat::bf16_t b) { \
@@ -409,6 +410,11 @@ struct DataType<half::half_t> {
 #endif
 };
 template<>
+struct DataType<half::half2_t> {
+  static const int kFlag = kFloat16;
+  static const int kLanes = 2;
+};
+template<>
 struct DataType<bfloat::bf16_t> {
   static const int kFlag = kBfloat16;
   static const int kLanes = 1;
@@ -1161,6 +1167,48 @@ struct minimum {
   }
 #endif
 
+#define MSHADOW_TYPE_SWITCH_WITH_HALF2(type, DType, ...)  \
+  switch (type) { \
+  case mshadow::kFloat32: \
+{ \
+  typedef float DType;\
+  {__VA_ARGS__}   \
+} \
+break;\
+  case mshadow::kFloat64: \
+{ \
+  typedef double DType;   \
+  {__VA_ARGS__}   \
+} \
+break;\
+  case mshadow::kFloat16: \
+{ \
+  typedef mshadow::half::half2_t DType;   \
+  {__VA_ARGS__}   \
+} \
+break;\
+  case mshadow::kUint8:   \
+{ \
+  typedef uint8_t DType;  \
+  {__VA_ARGS__}   \
+} \
+break;\
+  case mshadow::kInt32:   \
+{ \
+  typedef int32_t DType;  \
+  {__VA_ARGS__}   \
+} \
+break;\
+  case mshadow::kInt64:   \
+{ 

[incubator-mxnet] branch master updated (5343aef -> 4827de8)

2020-05-21 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 5343aef  [Numpy] Fix gluon activations (#18370)
 add 4827de8  Improve the backward mirroring implementation (#18228)

No new revisions were added by this update.

Summary of changes:
 ci/windows/test_py3_cpu.ps1   |   6 +
 ci/windows/test_py3_gpu.ps1   |   7 +
 docs/static_site/src/pages/api/faq/env_var.md |   6 +-
 example/image-classification/README.md|  11 +-
 python/mxnet/rnn/rnn_cell.py  |   5 +
 src/executor/exec_pass.h  |  37 +-
 src/executor/graph_executor.cc| 128 +++--
 src/executor/graph_executor.h |   8 +-
 src/imperative/cached_op.h|   2 +-
 src/imperative/imperative.cc  |   2 +-
 src/nnvm/gradient.cc  | 709 +-
 src/nnvm/plan_memory.cc   |  15 +-
 src/operator/nn/activation-inl.h  |   9 +-
 src/operator/nn/activation.cc |  50 +-
 src/operator/nn/activation.cu |  46 +-
 src/operator/nn/cudnn/cudnn_batch_norm-inl.h  |  16 +-
 tests/python/unittest/test_memory_opt.py  | 202 
 17 files changed, 1009 insertions(+), 250 deletions(-)
 create mode 100644 tests/python/unittest/test_memory_opt.py



[incubator-mxnet] branch master updated: Fix races in block scope (#17749)

2020-05-20 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new f4d0290  Fix races in block scope (#17749)
f4d0290 is described below

commit f4d0290fc2cd5763aa5a9c890e4d3dcd4ea6ec6b
Author: Haozheng Fan 
AuthorDate: Thu May 21 07:16:22 2020 +0800

Fix races in block scope (#17749)

* Add tests

* Fix block_scope

Co-authored-by: Haibin Lin 
Co-authored-by: Lin 
---
 python/mxnet/gluon/block.py| 25 +++--
 python/mxnet/name.py   |  9 
 tests/python/unittest/test_thread_local.py | 36 ++
 3 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/python/mxnet/gluon/block.py b/python/mxnet/gluon/block.py
index 6d9ea9a..ded66a7 100644
--- a/python/mxnet/gluon/block.py
+++ b/python/mxnet/gluon/block.py
@@ -52,8 +52,9 @@ class _BlockScope(object):
     def __init__(self, block):
         self._block = weakref.ref(block) if block is not None else None
         self._counter = {}
-        self._old_scope = None
-        self._name_scope = None
+        self._local = threading.local()
+        self._local._old_scope = None
+        self._local._name_scope = None
 
     @staticmethod
     def create(prefix, params, hint):
@@ -96,23 +97,23 @@ class _BlockScope(object):
         block = self._block()
         if block is None or block._empty_prefix:
             return self
-        self._old_scope = getattr(_BlockScope._current, "value", None)
+        self._local._old_scope = getattr(_BlockScope._current, "value", None)
         _BlockScope._current.value = self
-        self._name_scope = _name.Prefix(block.prefix)
-        self._name_scope.__enter__()
-        self._profiler_scope = _profiler.Scope(block._profiler_scope_name)
-        self._profiler_scope.__enter__()
+        self._local._name_scope = _name.Prefix(block.prefix)
+        self._local._name_scope.__enter__()
+        self._local._profiler_scope = _profiler.Scope(block._profiler_scope_name)
+        self._local._profiler_scope.__enter__()
         return self
 
     def __exit__(self, ptype, value, trace):
         block = self._block()
         if block is None or block._empty_prefix:
             return
-        self._name_scope.__exit__(ptype, value, trace)
-        self._name_scope = None
-        self._profiler_scope.__exit__(ptype, value, trace)
-        self._profiler_scope = None
-        _BlockScope._current.value = self._old_scope
+        self._local._name_scope.__exit__(ptype, value, trace)
+        self._local._name_scope = None
+        self._local._profiler_scope.__exit__(ptype, value, trace)
+        self._local._profiler_scope = None
+        _BlockScope._current.value = self._local._old_scope
 
 
 def _gather_type_ctx_info(args):
diff --git a/python/mxnet/name.py b/python/mxnet/name.py
index b276c72..e39752e 100644
--- a/python/mxnet/name.py
+++ b/python/mxnet/name.py
@@ -30,7 +30,8 @@ class NameManager(with_metaclass(_MXClassPropertyMetaClass, object)):
 
     def __init__(self):
         self._counter = {}
-        self._old_manager = None
+        self._local = threading.local()
+        self._local._old_manager = None
 
     def get(self, name, hint):
         """Get the canonical name for a symbol.
@@ -66,13 +67,13 @@ class NameManager(with_metaclass(_MXClassPropertyMetaClass, object)):
     def __enter__(self):
         if not hasattr(NameManager._current, "value"):
             NameManager._current.value = NameManager()
-        self._old_manager = NameManager._current.value
+        self._local._old_manager = NameManager._current.value
         NameManager._current.value = self
         return self
 
     def __exit__(self, ptype, value, trace):
-        assert self._old_manager
-        NameManager._current.value = self._old_manager
+        assert self._local._old_manager
+        NameManager._current.value = self._local._old_manager
 
     #pylint: disable=no-self-argument
     @classproperty
diff --git a/tests/python/unittest/test_thread_local.py b/tests/python/unittest/test_thread_local.py
index 5423249..975ad2a 100644
--- a/tests/python/unittest/test_thread_local.py
+++ b/tests/python/unittest/test_thread_local.py
@@ -222,3 +222,39 @@ def test_np_global_shape():
     finally:
         set_np_shape(0)
 
+def test_blockscope_multithread():
+    event = threading.Event()
+    status = [False]
+
+    class dummy_block(object):
+        def __init__(self, prefix):
+            self.prefix = prefix
+            self._profiler_scope_name = prefix
+            self._empty_prefix = False
+
+    def f(scope):
+        try:
+            with scope:
+                event.wait()
+        except:
+            status[0] = True
+
+    def g(scope):
+        wit
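
A hedged illustration of what the fix enables (a separate sketch, distinct from the truncated regression test above): constructing and running blocks from several threads no longer corrupts _old_scope / _name_scope, since each thread now keeps its own copies in threading.local():

    import threading
    import mxnet as mx
    from mxnet import gluon

    def build(results, idx):
        # each thread enters its own name/prefix scopes while building layers
        net = gluon.nn.Dense(4)
        net.initialize()
        results[idx] = net(mx.nd.ones((1, 8))).shape

    results = [None, None]
    threads = [threading.Thread(target=build, args=(results, i)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results)    # [(1, 4), (1, 4)], no cross-thread scope corruption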

[incubator-mxnet] branch master updated (7ab326c -> 7f5df07)

2020-05-18 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 7ab326c  [numpy] add dlpack functions to npx (#18342)
 add 7f5df07  [BUGFIX] Remove Profiler from the runtime feature list, since it's always built (#18308)

No new revisions were added by this update.

Summary of changes:
 include/mxnet/libinfo.h   | 1 -
 perl-package/AI-MXNet/lib/AI/MXNet/RunTime.pm | 3 +--
 python/mxnet/runtime.py   | 2 +-
 src/libinfo.cc| 1 -
 4 files changed, 2 insertions(+), 5 deletions(-)



[incubator-mxnet] branch master updated (37280e4 -> 09224c4)

2020-05-16 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 37280e4  Fix deferred compute mode for operators using new FFI (#18284)
 add 09224c4  Add a timeout to the storage profiler in case mem_counters_ is not yet initialized (#18306)

No new revisions were added by this update.

Summary of changes:
 src/profiler/storage_profiler.h | 14 ++
 1 file changed, 14 insertions(+)



[incubator-mxnet] branch master updated: add gelu doc (#18274)

2020-05-12 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 8a5886a  add gelu doc (#18274)
8a5886a is described below

commit 8a5886a6770808db78ae62f3fbfe887c507c47de
Author: Haibin Lin 
AuthorDate: Tue May 12 14:37:40 2020 -0700

add gelu doc (#18274)

Co-authored-by: Lin 
---
 src/operator/leaky_relu.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/operator/leaky_relu.cc b/src/operator/leaky_relu.cc
index d3ed234..681ca44 100644
--- a/src/operator/leaky_relu.cc
+++ b/src/operator/leaky_relu.cc
@@ -150,6 +150,7 @@ when the input is negative and has a slope of one when input is positive.
 The following modified ReLU Activation functions are supported:
 
 - *elu*: Exponential Linear Unit. `y = x > 0 ? x : slope * (exp(x)-1)`
+- *gelu*: Gaussian Error Linear Unit. `y = 0.5 * x * (1 + erf(x / sqrt(2)))`
 - *selu*: Scaled Exponential Linear Unit. `y = lambda * (x > 0 ? x : alpha * (exp(x) - 1))` where
   *lambda = 1.0507009873554804934193349852946* and *alpha = 1.6732632423543772848170429916717*.
 - *leaky*: Leaky ReLU. `y = x > 0 ? x : slope * x`
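
The new doc line can be sanity-checked against the op itself (a hedged sketch; mx.nd.erf and act_type='gelu' are assumed available in this build):

    import mxnet as mx

    x = mx.nd.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    gelu_op = mx.nd.LeakyReLU(x, act_type='gelu')
    gelu_ref = 0.5 * x * (1 + mx.nd.erf(x / 2 ** 0.5))   # formula from the doc
    print((gelu_op - gelu_ref).abs().max().asnumpy())    # ~0, up to float tolerance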



[incubator-mxnet] branch master updated: Fix interleave matmul doc (#18260)

2020-05-09 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new de51058  Fix interleave matmul doc (#18260)
de51058 is described below

commit de510582438ad5fad576eba1b85c845b0ba9989c
Author: Haibin Lin 
AuthorDate: Sat May 9 23:06:27 2020 -0700

Fix interleave matmul doc (#18260)

* fix doc

* fix doc

* fix axis

Co-authored-by: Lin 
---
 src/operator/contrib/transformer.cc | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/operator/contrib/transformer.cc b/src/operator/contrib/transformer.cc
index 58826a2..1abd2a0 100644
--- a/src/operator/contrib/transformer.cc
+++ b/src/operator/contrib/transformer.cc
@@ -655,14 +655,16 @@ the input must be a single tensor of interleaved projections
 of queries, keys and values following the layout:
 (seq_length, batch_size, num_heads * head_dim * 3)
 
-the equivalent code would be:
-tmp = mx.nd.reshape(queries_keys_values, shape=(0, 0, num_heads, 3, -1))
-q_proj = mx.nd.transpose(tmp[:,:,:,0,:], axes=(1, 2, 0, 3))
-q_proj = mx.nd.reshape(q_proj, shape=(-1, 0, 0), reverse=True)
-q_proj = mx.nd.contrib.div_sqrt_dim(q_proj)
-k_proj = mx.nd.transpose(tmp[:,:,:,1,:], axes=(1, 2, 0, 3))
-k_proj = mx.nd.reshap(k_proj, shape=(-1, 0, 0), reverse=True)
-output = mx.nd.batch_dot(q_proj, k_proj, transpose_b=True)
+the equivalent code would be::
+
+  tmp = mx.nd.reshape(queries_keys_values, shape=(0, 0, num_heads, 3, -1))
+  q_proj = mx.nd.transpose(tmp[:,:,:,0,:], axes=(1, 2, 0, 3))
+  q_proj = mx.nd.reshape(q_proj, shape=(-1, 0, 0), reverse=True)
+  q_proj = mx.nd.contrib.div_sqrt_dim(q_proj)
+  k_proj = mx.nd.transpose(tmp[:,:,:,1,:], axes=(1, 2, 0, 3))
+  k_proj = mx.nd.reshape(k_proj, shape=(-1, 0, 0), reverse=True)
+  output = mx.nd.batch_dot(q_proj, k_proj, transpose_b=True)
+
 )code" ADD_FILELINE)
 .set_num_inputs(1)
 .set_num_outputs(1)
@@ -703,9 +705,9 @@ the equivalent code would be:
 tmp = mx.nd.reshape(queries_keys_values, shape=(0, 0, num_heads, 3, -1))
 v_proj = mx.nd.transpose(tmp[:,:,:,2,:], axes=(1, 2, 0, 3))
 v_proj = mx.nd.reshape(v_proj, shape=(-1, 0, 0), reverse=True)
-output = mx.nd.batch_dot(attention, v_proj, transpose_b=True)
+output = mx.nd.batch_dot(attention, v_proj)
 output = mx.nd.reshape(output, shape=(-1, num_heads, 0, 0), reverse=True)
-output = mx.nd.transpose(output, axes=(0, 2, 1, 3))
+output = mx.nd.transpose(output, axes=(2, 0, 1, 3))
 output = mx.nd.reshape(output, shape=(0, 0, -1))
 )code" ADD_FILELINE)
 .set_num_inputs(2)



[incubator-mxnet] branch master updated (fb73a17 -> e796ae9)

2020-04-14 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from fb73a17  Switch to C++17 and modernize toolchain + CI (#17984)
 add e796ae9  Integrate Horovod training API as part of MXNet native distributed training API (#17531)

No new revisions were added by this update.

Summary of changes:
 ci/docker/runtime_functions.sh |   5 +-
 .../{cifar10_dist.py => cifar10_kvstore_hvd.py}| 243 -
 python/mxnet/gluon/trainer.py  |   1 +
 python/mxnet/kvstore/__init__.py   |   1 +
 python/mxnet/kvstore/horovod.py| 161 ++
 python/mxnet/kvstore/kvstore.py|   3 +
 tests/nightly/dist_device_sync_kvstore_horovod.py  |  80 +++
 tests/nightly/test_distributed_training-gpu.sh |  11 +-
 tools/launch.py|  63 +++---
 9 files changed, 429 insertions(+), 139 deletions(-)
 copy example/distributed_training/{cifar10_dist.py => cifar10_kvstore_hvd.py} (52%)
 create mode 100644 python/mxnet/kvstore/horovod.py
 create mode 100644 tests/nightly/dist_device_sync_kvstore_horovod.py



[incubator-mxnet] branch master updated (da95add -> 6cc990c)

2020-04-08 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


 from da95add  Fix vector access out of bound in MKLDNNConvolutionBackward (#17997)
 add 6cc990c  Revert "[MXNET-#16795] Byteps-KVStore: Integrate Byteps into mxnet as new type of kvstore backend (#17555)" (#17998)

No new revisions were added by this update.

Summary of changes:
 ci/docker/runtime_functions.sh   |  19 --
 ci/jenkins/Jenkins_steps.groovy  |  14 --
 ci/jenkins/Jenkinsfile_edge  |   2 +-
 ci/jenkins/Jenkinsfile_unix_gpu  |   1 -
 python/mxnet/kvstore/__init__.py |   1 -
 python/mxnet/kvstore/base.py |   9 +-
 python/mxnet/kvstore/byteps.py   | 255 ---
 tests/nightly/dist_device_sync_kvstore_byteps.py | 114 --
 tools/byteps_launcher.py | 195 -
 tools/launch.py  |  17 +-
 10 files changed, 3 insertions(+), 624 deletions(-)
 delete mode 100644 python/mxnet/kvstore/byteps.py
 delete mode 100644 tests/nightly/dist_device_sync_kvstore_byteps.py
 delete mode 100644 tools/byteps_launcher.py



[incubator-mxnet] branch master updated (c244f9f -> 5adcbf8)

2020-04-06 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from c244f9f  [MXNET-#16795] Byteps-KVStore: Intergrate Byteps into mxnet 
as new type of kvstore backend (#17555)
 add 5adcbf8  GPU gemms true fp16 (#17466)

No new revisions were added by this update.

Summary of changes:
 docs/static_site/src/pages/api/faq/env_var.md |  4 ++
 src/operator/contrib/transformer.cu   | 30 +--
 src/operator/linalg_impl.h| 53 ++-
 tests/python/gpu/test_gluon_gpu.py| 21 +++
 4 files changed, 95 insertions(+), 13 deletions(-)

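Since docs/static_site/src/pages/api/faq/env_var.md is touched here, a hedged
sketch of opting in. The variable name MXNET_FC_TRUE_FP16 is an assumption
based on the faq update; by default GPU gemms accumulate in fp32 for accuracy.

    import os
    # Assumed opt-in switch; set before MXNet executes the first gemm.
    os.environ['MXNET_FC_TRUE_FP16'] = '1'
    import mxnet as mx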


[incubator-mxnet] branch master updated (ff234db -> c244f9f)

2020-04-06 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from ff234db  Skip test_gluon_data.py on OSX (#17969)
 add c244f9f  [MXNET-#16795] Byteps-KVStore: Intergrate Byteps into mxnet 
as new type of kvstore backend (#17555)

No new revisions were added by this update.

Summary of changes:
 ci/docker/runtime_functions.sh |  19 ++
 ci/jenkins/Jenkins_steps.groovy|  14 ++
 ci/jenkins/Jenkinsfile_edge|   2 +-
 ci/jenkins/Jenkinsfile_unix_gpu|   1 +
 python/mxnet/kvstore/__init__.py   |   1 +
 python/mxnet/kvstore/base.py   |   9 +-
 python/mxnet/kvstore/byteps.py | 255 +
 ...ustom.py => dist_device_sync_kvstore_byteps.py} |  52 +++--
 tools/byteps_launcher.py   | 195 
 tools/launch.py|  17 +-
 10 files changed, 545 insertions(+), 20 deletions(-)
 create mode 100644 python/mxnet/kvstore/byteps.py
 copy tests/nightly/{dist_device_sync_kvstore_custom.py => 
dist_device_sync_kvstore_byteps.py} (58%)
 create mode 100644 tools/byteps_launcher.py



[incubator-mxnet] branch master updated (84b0ddd -> 03b8146)

2020-04-06 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 84b0ddd  Add USE_DIST_KVSTORE=ON to GPU build (#17911)
 add 03b8146  Skip test_kvstore_gpu.test_rsp_push_pull (#17983)

No new revisions were added by this update.

Summary of changes:
 tests/python/gpu/test_kvstore_gpu.py | 1 +
 1 file changed, 1 insertion(+)



[incubator-mxnet] branch master updated (1b107a0 -> 84b0ddd)

2020-04-06 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 1b107a0  Remove redundant condition in np_matrix_op.cc (#17933)
 add 84b0ddd  Add USE_DIST_KVSTORE=ON to GPU build (#17911)

No new revisions were added by this update.

Summary of changes:
 ci/docker/runtime_functions.sh | 12 ++--
 .../nightly/test_distributed_training-gpu.sh   | 34 ++
 2 files changed, 25 insertions(+), 21 deletions(-)
 copy scala-package/examples/scripts/neuralstyle_end2end/run_test_end2end.sh => 
tests/nightly/test_distributed_training-gpu.sh (55%)
 mode change 100644 => 100755



[incubator-mxnet] branch master updated (66ee118 -> 792011e)

2020-04-03 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 66ee118  Fix Windows GPU CI (#17962)
 add 792011e  Omit kNullOp req when comparing changed NDArrays in 
static_shape=True (#17966)

No new revisions were added by this update.

Summary of changes:
 src/imperative/cached_op.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)



[incubator-mxnet] branch master updated: Use FP32 copy of weights for norm (multitensor LAMB optimizer) (#17700)

2020-03-23 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 8e39518  Use FP32 copy of weights for norm (multitensor LAMB 
optimizer) (#17700)
8e39518 is described below

commit 8e3951876b3598c8b52606a467add5f239d88b38
Author: MoisesHer <50716238+moises...@users.noreply.github.com>
AuthorDate: Mon Mar 23 09:55:24 2020 -0700

Use FP32 copy of weights for norm (multitensor LAMB optimizer) (#17700)

* Use fp32 copy of weights for computing norm in LAMB optimizer

* Fix cpplint
---
 src/operator/contrib/multi_lamb-inl.h | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/operator/contrib/multi_lamb-inl.h 
b/src/operator/contrib/multi_lamb-inl.h
index 7fb186f..256445a 100644
--- a/src/operator/contrib/multi_lamb-inl.h
+++ b/src/operator/contrib/multi_lamb-inl.h
@@ -282,10 +282,14 @@ inline void MultiLAMB(const nnvm::NodeAttrs& attrs,
   FillMultiLAMBKernelParam<DType, MPDType>
     (attrs, ctx, inputs, outputs, &kernel_params);
 
-  // create vector of TBlob with all the weights contiguous
-  std::vector<TBlob> weights;
+  // create vector of TBlob with all the weights contiguous to compute the norm
+  // if mixed precision, use fp32 copy
+  std::vector<TBlob> weights_for_norm;
+  int position_weights = 0;
+  if (!std::is_same<DType, MPDType>::value)
+    position_weights = input_stride - 1;
   for (size_t index = 0; index < kernel_params.ntensors; ++index) {
-    weights.emplace_back(inputs[index*input_stride]);
+    weights_for_norm.emplace_back(inputs[index * input_stride + position_weights]);
   }
 
   // Calculate amount of temporary storage (temp_g, r1, r2, block_to_tensor, block_to_chunk)
@@ -327,7 +331,7 @@ inline void MultiLAMB(const nnvm::NodeAttrs& attrs,
   Tensor<xpu, 1, int> block_to_chunk(reinterpret_cast<int*>(&workspace[pos_wspace]),
                                      Shape1(kernel_params.nchunks), s);
 
-  MultiSumSqRun<xpu>(weights, kernel_params.ntensors, r1.dptr_, ctx);
+  MultiSumSqRun<xpu>(weights_for_norm, kernel_params.ntensors, r1.dptr_, ctx);
   CallKernel1<xpu>(s, kernel_params, param, temp_g.dptr_,
                    block_to_tensor.dptr_,
                    block_to_chunk.dptr_);

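The idea behind the diff, as a short sketch with illustrative names only: in
mixed precision, the trust-ratio norm is taken from the fp32 master weights
rather than the fp16 working copy, which avoids overflow/underflow in the sum
of squares.

    import mxnet as mx

    w16 = mx.nd.ones((1024,), dtype='float16')   # fp16 working weights
    w32 = w16.astype('float32')                  # optimizer's master copy
    r1 = mx.nd.sqrt((w32 * w32).sum())           # norm computed on the fp32 copy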


[incubator-mxnet] branch master updated (2f358fd -> b133899)

2020-03-23 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 2f358fd  [Numpy] Add op fmax, fmin, fmod (#17567)
 add b133899  Use multi-tensor sumSQ in clip_global_norm (#17652)

No new revisions were added by this update.

Summary of changes:
 python/mxnet/gluon/utils.py| 24 
 tests/python/gpu/test_gluon_gpu.py | 14 +-
 2 files changed, 25 insertions(+), 13 deletions(-)

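The public API is unchanged by this commit; only the per-array sum-of-squares
is now batched through the fused multi_sum_sq kernel. A usage sketch:

    import mxnet as mx
    from mxnet.gluon.utils import clip_global_norm

    grads = [mx.nd.uniform(shape=(256,)) for _ in range(4)]
    # Rescales the arrays in place when the global norm exceeds max_norm.
    total_norm = clip_global_norm(grads, max_norm=1.0)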


[incubator-mxnet] branch master updated (91c4516 -> 9993738)

2020-02-06 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 91c4516  [numpy] Add np.random.pareto and np.random.power (#17517)
 add 9993738  Partitioning Gluon HybridBlocks (#15969)

No new revisions were added by this update.

Summary of changes:
 python/mxnet/gluon/block.py   |  80 +--
 tests/python/unittest/test_subgraph_op.py | 789 +-
 2 files changed, 508 insertions(+), 361 deletions(-)



[incubator-mxnet] branch master updated (b1e4911 -> 65aab9e)

2020-02-01 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from b1e4911  Multithreaded Inference Support (#16654)
 add 65aab9e  Add p3 KVStore (#15124)

No new revisions were added by this update.

Summary of changes:
 ci/docker/runtime_functions.sh   |   1 +
 docs/static_site/src/pages/api/faq/env_var.md|   5 +
 include/mxnet/c_api.h|  42 
 include/mxnet/kvstore.h  |  28 +++
 python/mxnet/error.py|   1 +
 python/mxnet/gluon/trainer.py|   2 +-
 python/mxnet/kvstore/base.py |   4 +-
 python/mxnet/kvstore/kvstore.py  |  33 ++-
 python/mxnet/model.py|  21 +-
 src/c_api/c_api.cc   |  52 +
 src/kvstore/kvstore.cc   |   9 +-
 src/kvstore/kvstore_dist.h   | 170 ---
 src/kvstore/kvstore_local.h  |  40 
 src/kvstore/p3store_dist.h   | 256 +++
 tests/nightly/dist_device_sync_kvstore.py|  12 +-
 tests/nightly/dist_device_sync_kvstore_custom.py |   2 +-
 tools/launch.py  |   4 +
 17 files changed, 571 insertions(+), 111 deletions(-)
 create mode 100644 src/kvstore/p3store_dist.h



[incubator-mxnet] branch master updated (a1b0ff2 -> 3ef8935)

2020-01-27 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from a1b0ff2  update mkl to 2020.0 (#17355)
 add 3ef8935  [LICENSE] fix cpp predcit license (#17377)

No new revisions were added by this update.

Summary of changes:
 .../predict-cpp/image-classification-predict.cc| 23 +-
 .../nightly/apache_rat_license_check/rat-excludes  |  3 ++-
 tools/license_header.py|  3 +++
 3 files changed, 14 insertions(+), 15 deletions(-)



[incubator-mxnet] 01/02: [BUGFIX] fix model zoo parallel download (#17372)

2020-01-24 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch v1.6.x
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git

commit 69f4f3161364f290be053ecbd48931a40bd7ab68
Author: Haibin Lin 
AuthorDate: Thu Jan 23 19:32:21 2020 -0800

[BUGFIX] fix model zoo parallel download (#17372)

* use temp file

* fix dependency

* Update model_store.py

* Update test_gluon_model_zoo.py

* remove NamedTempFile
---
 python/mxnet/gluon/model_zoo/model_store.py   | 22 +---
 python/mxnet/gluon/utils.py   | 30 +++
 tests/python/unittest/test_gluon_model_zoo.py | 16 ++
 3 files changed, 52 insertions(+), 16 deletions(-)

diff --git a/python/mxnet/gluon/model_zoo/model_store.py 
b/python/mxnet/gluon/model_zoo/model_store.py
index 11ac47b..6da7dd1 100644
--- a/python/mxnet/gluon/model_zoo/model_store.py
+++ b/python/mxnet/gluon/model_zoo/model_store.py
@@ -22,8 +22,11 @@ __all__ = ['get_model_file', 'purge']
 import os
 import zipfile
 import logging
+import tempfile
+import uuid
+import shutil
 
-from ..utils import download, check_sha1
+from ..utils import download, check_sha1, replace_file
 from ... import base, util
 
 _model_sha1 = {name: checksum for checksum, name in [
@@ -103,16 +106,21 @@ def get_model_file(name, 
root=os.path.join(base.data_dir(), 'models')):
 
 util.makedirs(root)
 
-zip_file_path = os.path.join(root, file_name+'.zip')
 repo_url = os.environ.get('MXNET_GLUON_REPO', apache_repo_url)
 if repo_url[-1] != '/':
 repo_url = repo_url + '/'
+
+random_uuid = str(uuid.uuid4())
+temp_zip_file_path = os.path.join(root, file_name+'.zip'+random_uuid)
 download(_url_format.format(repo_url=repo_url, file_name=file_name),
- path=zip_file_path,
- overwrite=True)
-with zipfile.ZipFile(zip_file_path) as zf:
-zf.extractall(root)
-os.remove(zip_file_path)
+ path=temp_zip_file_path, overwrite=True)
+with zipfile.ZipFile(temp_zip_file_path) as zf:
+temp_dir = tempfile.mkdtemp(dir=root)
+zf.extractall(temp_dir)
+temp_file_path = os.path.join(temp_dir, file_name+'.params')
+replace_file(temp_file_path, file_path)
+shutil.rmtree(temp_dir)
+os.remove(temp_zip_file_path)
 
 if check_sha1(file_path, sha1_hash):
 return file_path
diff --git a/python/mxnet/gluon/utils.py b/python/mxnet/gluon/utils.py
index 81a8dba..63e11ea 100644
--- a/python/mxnet/gluon/utils.py
+++ b/python/mxnet/gluon/utils.py
@@ -21,7 +21,7 @@
 from __future__ import absolute_import
 
 __all__ = ['split_data', 'split_and_load', 'clip_global_norm',
-   'check_sha1', 'download']
+   'check_sha1', 'download', 'replace_file']
 
 import os
 import sys
@@ -35,7 +35,7 @@ import requests
 import numpy as np
 
 from .. import ndarray
-from ..util import is_np_shape, is_np_array
+from ..util import is_np_shape, is_np_array, makedirs
 from .. import numpy as _mx_np  # pylint: disable=reimported
 
 
@@ -209,8 +209,14 @@ def check_sha1(filename, sha1_hash):
 
 if not sys.platform.startswith('win32'):
 # refer to https://github.com/untitaker/python-atomicwrites
-def _replace_atomic(src, dst):
-"""Implement atomic os.replace with linux and OSX. Internal use only"""
+def replace_file(src, dst):
+"""Implement atomic os.replace with linux and OSX.
+
+Parameters
+--
+src : source file path
+dst : destination file path
+"""
 try:
 os.rename(src, dst)
 except OSError:
@@ -252,11 +258,17 @@ else:
 finally:
 raise OSError(msg)
 
-def _replace_atomic(src, dst):
+def replace_file(src, dst):
 """Implement atomic os.replace with windows.
+
 refer to 
https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-movefileexw
 The function fails when one of the process(copy, flush, delete) fails.
-Internal use only"""
+
+Parameters
+--
+src : source file path
+dst : destination file path
+"""
 _handle_errors(ctypes.windll.kernel32.MoveFileExW(
 _str_to_unicode(src), _str_to_unicode(dst),
 _windows_default_flags | _MOVEFILE_REPLACE_EXISTING
@@ -264,7 +276,7 @@ else:
 
 
 def download(url, path=None, overwrite=False, sha1_hash=None, retries=5, 
verify_ssl=True):
-"""Download an given URL
+"""Download a given URL
 
 Parameters
 --
@@ -310,7 +322,7 @@ def download(url, pa

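The fix's pattern, as a generic sketch (fetch_atomic and download_fn are
illustrative names, not part of the commit): write to a uniquely named
temporary, then atomically move it into place, so concurrent downloaders never
observe a partial file.

    import os
    import uuid

    def fetch_atomic(download_fn, dest):
        tmp = '{}.{}'.format(dest, uuid.uuid4())
        download_fn(tmp)        # write the payload to a unique temp path
        os.replace(tmp, dest)   # atomic rename; stdlib analogue of replace_file
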
[incubator-mxnet] 02/02: Update symbol.py (#17408)

2020-01-24 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch v1.6.x
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git

commit 2c61787c78a51ea87d46c4820746a33b70fca64c
Author: Haibin Lin 
AuthorDate: Thu Jan 23 15:36:47 2020 -0800

Update symbol.py (#17408)
---
 python/mxnet/contrib/amp/lists/symbol.py | 4 
 1 file changed, 4 insertions(+)

diff --git a/python/mxnet/contrib/amp/lists/symbol.py 
b/python/mxnet/contrib/amp/lists/symbol.py
index 2146853..d501a7d 100644
--- a/python/mxnet/contrib/amp/lists/symbol.py
+++ b/python/mxnet/contrib/amp/lists/symbol.py
@@ -591,6 +591,10 @@ WIDEST_TYPE_CASTS = [
 '_contrib_dgl_graph_compact',
 '_contrib_dgl_subgraph',
 '_contrib_edge_id',
+'_contrib_interleaved_matmul_encdec_qk',
+'_contrib_interleaved_matmul_encdec_valatt',
+'_contrib_interleaved_matmul_selfatt_qk',
+'_contrib_interleaved_matmul_selfatt_valatt',
 'where',
 '_sparse_where',
 '_sparse_broadcast_add',



[incubator-mxnet] branch v1.6.x updated (1cb738a -> 2c61787)

2020-01-24 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch v1.6.x
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 1cb738a  Update ps-lite LICENSE (#17351) (#17370)
 new 69f4f31  [BUGFIX] fix model zoo parallel download (#17372)
 new 2c61787  Update symbol.py (#17408)

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 python/mxnet/contrib/amp/lists/symbol.py  |  4 
 python/mxnet/gluon/model_zoo/model_store.py   | 22 +---
 python/mxnet/gluon/utils.py   | 30 +++
 tests/python/unittest/test_gluon_model_zoo.py | 16 ++
 4 files changed, 56 insertions(+), 16 deletions(-)



[incubator-mxnet] branch master updated (e1435a3 -> e1779f4)

2020-01-24 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from e1435a3  [NumPy] Add NumPy support for norm (#17014)
 add e1779f4  [BUILD] pslite fix link zmq (#17427)

No new revisions were added by this update.

Summary of changes:
 CMakeLists.txt | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)



[incubator-mxnet] branch master updated (5e64e96 -> 6bab3c4)

2020-01-17 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 5e64e96  adding docs for 64bit C APIs of large tensor (#17309)
 add 6bab3c4  Update ps-lite LICENSE (#17351)

No new revisions were added by this update.

Summary of changes:
 LICENSE | 1 +
 1 file changed, 1 insertion(+)



[incubator-mxnet] branch master updated (7b349dd -> 6b9a1da)

2020-01-15 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 7b349dd  grouping large array tests based on type and updating nightly 
CI function (#17305)
 add 6b9a1da  Multi-tensor LAMB (#16893)

No new revisions were added by this update.

Summary of changes:
 python/mxnet/ndarray/contrib.py |  76 +++
 python/mxnet/optimizer/optimizer.py | 121 ---
 python/mxnet/test_utils.py  |  77 ---
 src/operator/contrib/multi_lamb-inl.h   | 359 
 src/operator/contrib/multi_lamb.cc  | 251 ++
 src/operator/contrib/multi_lamb.cu  | 261 +++
 src/operator/contrib/multi_sum_sq-inl.h |   4 +
 src/operator/contrib/multi_sum_sq.cc|   6 +
 src/operator/contrib/multi_sum_sq.cu|  29 ++-
 tests/python/unittest/test_optimizer.py |  46 +++-
 10 files changed, 1157 insertions(+), 73 deletions(-)
 mode change 100644 => 100755 python/mxnet/optimizer/optimizer.py
 mode change 100644 => 100755 python/mxnet/test_utils.py
 create mode 100644 src/operator/contrib/multi_lamb-inl.h
 create mode 100644 src/operator/contrib/multi_lamb.cc
 create mode 100644 src/operator/contrib/multi_lamb.cu
 mode change 100644 => 100755 tests/python/unittest/test_optimizer.py

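A hedged usage sketch: 'lamb' is the optimizer this commit extends, and
aggregate_num is assumed to be the knob that sizes each fused parameter group,
following MXNet's existing multi-tensor SGD convention; the exact parameter
name is not stated in this summary.

    import mxnet as mx
    from mxnet import gluon

    net = gluon.nn.Dense(10)
    net.initialize()
    # aggregate_num > 1 is assumed to route updates through multi_lamb_update.
    trainer = gluon.Trainer(net.collect_params(), 'lamb',
                            {'learning_rate': 1e-3, 'aggregate_num': 4})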


[incubator-mxnet] branch master updated (058de55 -> 3971938)

2020-01-14 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 058de55  Fix image display in python autograd tutorial (#17243)
 add 3971938  Fix #17267, add expected and got datatype for concat error 
msgs (#17271)

No new revisions were added by this update.

Summary of changes:
 src/operator/nn/concat.cc | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)



[incubator-mxnet] branch master updated: fix typo (#17277)

2020-01-11 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 2002d60  fix typo (#17277)
2002d60 is described below

commit 2002d6065d5668f9de4d4050c8cf750dd6cc5ca8
Author: Chaitanya Prakash Bapat 
AuthorDate: Sat Jan 11 21:25:07 2020 -0800

fix typo (#17277)
---
 cpp-package/include/mxnet-cpp/ndarray.h | 8 
 include/mxnet/c_api.h   | 4 ++--
 include/mxnet/ndarray.h | 4 ++--
 perl-package/AI-MXNetCAPI/mxnet.i   | 4 ++--
 src/ndarray/ndarray_function.cu | 2 +-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/cpp-package/include/mxnet-cpp/ndarray.h 
b/cpp-package/include/mxnet-cpp/ndarray.h
index c4d51c5..0a9a412 100644
--- a/cpp-package/include/mxnet-cpp/ndarray.h
+++ b/cpp-package/include/mxnet-cpp/ndarray.h
@@ -251,7 +251,7 @@ class NDArray {
   NDArray &operator%=(const NDArray &src);
   NDArray ArgmaxChannel();
   /*!
-  * \brief Do a synchronize copy from a continugous CPU memory region.
+  * \brief Do a synchronize copy from a contiguous CPU memory region.
   *
   *  This function will call WaitToWrite before the copy is performed.
   *  This is useful to copy data from existing memory region that are
@@ -262,7 +262,7 @@ class NDArray {
   */
   void SyncCopyFromCPU(const mx_float *data, size_t size);
   /*!
-  * \brief Do a synchronize copy from a continugous CPU memory region.
+  * \brief Do a synchronize copy from a contiguous CPU memory region.
   *
   *  This function will call WaitToWrite before the copy is performed.
   *  This is useful to copy data from existing memory region that are
@@ -272,7 +272,7 @@ class NDArray {
   */
  void SyncCopyFromCPU(const std::vector<mx_float> &data);
   /*!
-  * \brief Do a synchronize copy to a continugous CPU memory region.
+  * \brief Do a synchronize copy to a contiguous CPU memory region.
   *
   *  This function will call WaitToRead before the copy is performed.
   *  This is useful to copy data from existing memory region that are
@@ -283,7 +283,7 @@ class NDArray {
   */
   void SyncCopyToCPU(mx_float *data, size_t size = 0);
   /*!
-  * \brief Do a synchronize copy to a continugous CPU memory region.
+  * \brief Do a synchronize copy to a contiguous CPU memory region.
   *
   *  This function will call WaitToRead before the copy is performed.
   *  This is useful to copy data from existing memory region that are
diff --git a/include/mxnet/c_api.h b/include/mxnet/c_api.h
index 27b420f..33d79bd 100644
--- a/include/mxnet/c_api.h
+++ b/include/mxnet/c_api.h
@@ -723,7 +723,7 @@ MXNET_DLL int MXNDArrayLoadFromBuffer(const void 
*ndarray_buffer,
   const char*** out_names);
 
 /*!
- * \brief Perform a synchronize copy from a continugous CPU memory region.
+ * \brief Perform a synchronize copy from a contiguous CPU memory region.
  *
  *  This function will call WaitToWrite before the copy is performed.
  *  This is useful to copy data from existing memory region that are
@@ -737,7 +737,7 @@ MXNET_DLL int MXNDArraySyncCopyFromCPU(NDArrayHandle handle,
const void *data,
size_t size);
 /*!
- * \brief Perform a synchronize copyto a continugous CPU memory region.
+ * \brief Perform a synchronize copyto a contiguous CPU memory region.
  *
  *  This function will call WaitToRead before the copy is performed.
  *  This is useful to copy data from existing memory region that are
diff --git a/include/mxnet/ndarray.h b/include/mxnet/ndarray.h
index 1b0b119..3e780a1 100644
--- a/include/mxnet/ndarray.h
+++ b/include/mxnet/ndarray.h
@@ -483,7 +483,7 @@ class NDArray {
*/
   NDArray Copy(Context ctx) const;
   /*!
-   * \brief Do a synchronize copy from a continugous CPU memory region.
+   * \brief Do a synchronize copy from a contiguous CPU memory region.
*
*  This function will call WaitToWrite before the copy is performed.
*  This is useful to copy data from existing memory region that are
@@ -500,7 +500,7 @@ class NDArray {
   void SyncCopyFromNDArray(const NDArray &src, int i = -1, int j = -1);
 
   /*!
-   * \brief Do a synchronize copy to a continugous CPU memory region.
+   * \brief Do a synchronize copy to a contiguous CPU memory region.
*
*  This function will call WaitToRead before the copy is performed.
*  This is useful to copy data from existing memory region that are
diff --git a/perl-package/AI-MXNetCAPI/mxnet.i 
b/perl-package/AI-MXNetCAPI/mxnet.i
index e38402c..f35f620 100644
--- a/perl-package/AI-MXNetCAPI/mxnet.i
+++ b/perl-package/AI-MXNetCAPI/mxnet.i
@@ -510,7 +510,7 @@ int MXNDArrayLoadFromBuffer(const void *in,
 const char*** out_array);
 
 /*!
- * \brief Perform a synchronize copy from a continugous C

[incubator-mxnet] branch master updated (f88b1ed -> c3b0baa)

2020-01-10 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from f88b1ed  Add contributors (#17268)
 add c3b0baa  fix lstm layer with projection save params (#17266)

No new revisions were added by this update.

Summary of changes:
 python/mxnet/gluon/rnn/rnn_layer.py | 2 +-
 tests/python/gpu/test_gluon_gpu.py  | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)



[incubator-mxnet] branch master updated (6ba9aad -> ac88f1e)

2020-01-09 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 6ba9aad  Enabling large tensor support for binary broadcast operators 
(#16755)
 add ac88f1e  [DOC] Add a few tips for running horovod (#17235)

No new revisions were added by this update.

Summary of changes:
 docs/static_site/src/pages/api/faq/perf.md | 23 +--
 example/distributed_training-horovod/README.md |  8 
 2 files changed, 21 insertions(+), 10 deletions(-)



[incubator-mxnet] branch master updated (8e946c9 -> 55e222b)

2020-01-03 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 8e946c9  Implement atleast_1d/2d/3d (#17099)
 add 55e222b  Interleaved MHA for CPU path (#17138)

No new revisions were added by this update.

Summary of changes:
 src/operator/contrib/transformer.cc| 549 -
 tests/python/gpu/test_operator_gpu.py  | 317 ---
 tests/python/unittest/test_operator.py | 324 +++
 3 files changed, 861 insertions(+), 329 deletions(-)

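These interleaved attention ops previously had only a GPU implementation; a
sketch of calling the self-attention QK op on CPU. The interleaved
[seq, batch, 3*heads*head_dim] input layout is assumed from the GPU version of
the operator.

    import mxnet as mx

    seq, batch, heads, head_dim = 16, 2, 4, 8
    qkv = mx.nd.uniform(shape=(seq, batch, 3 * heads * head_dim))
    att = mx.nd.contrib.interleaved_matmul_selfatt_qk(qkv, heads=heads)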


[incubator-mxnet] branch v1.6.x updated: fix norm sparse fallback (#17149)

2020-01-02 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch v1.6.x
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/v1.6.x by this push:
 new dafbb11  fix norm sparse fallback (#17149)
dafbb11 is described below

commit dafbb1107a4dd34950b5f5a4d513ddf51f7c07a8
Author: Hao Jin 
AuthorDate: Thu Dec 26 07:07:30 2019 +0800

fix norm sparse fallback (#17149)
---
 src/operator/tensor/broadcast_reduce_norm_value.cc | 2 +-
 src/operator/tensor/broadcast_reduce_norm_value.cu | 2 +-
 src/operator/tensor/broadcast_reduce_op.h  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/operator/tensor/broadcast_reduce_norm_value.cc b/src/operator/tensor/broadcast_reduce_norm_value.cc
index 4cd92d4..9acc157 100644
--- a/src/operator/tensor/broadcast_reduce_norm_value.cc
+++ b/src/operator/tensor/broadcast_reduce_norm_value.cc
@@ -40,7 +40,7 @@ void L2NormComputeEx(const nnvm::NodeAttrs& attrs,
   const NormParam& param = nnvm::get<NormParam>(attrs.parsed);
   mshadow::Stream<cpu>* s = ctx.get_stream<cpu>();
   const NDArrayStorageType istype = inputs[0].storage_type();
-  const mxnet::TShape axis = param.axis.has_value() ? param.axis.value() : mxnet::TShape();
+  const mxnet::TShape axis = param.axis.has_value() ? param.axis.value() : mxnet::TShape(0, -1);
   if ((istype == kRowSparseStorage || istype == kCSRStorage) && axis.ndim() == 0 &&
       param.ord == 2) {
     // l2 norm on the entire array
diff --git a/src/operator/tensor/broadcast_reduce_norm_value.cu b/src/operator/tensor/broadcast_reduce_norm_value.cu
index 188c93e..735c3d7 100644
--- a/src/operator/tensor/broadcast_reduce_norm_value.cu
+++ b/src/operator/tensor/broadcast_reduce_norm_value.cu
@@ -39,7 +39,7 @@ void L2NormComputeEx(const nnvm::NodeAttrs& attrs,
   const NormParam& param = nnvm::get<NormParam>(attrs.parsed);
   mshadow::Stream<gpu>* s = ctx.get_stream<gpu>();
   const NDArrayStorageType istype = inputs[0].storage_type();
-  const mxnet::TShape axis = param.axis.has_value() ? param.axis.value() : mxnet::TShape();
+  const mxnet::TShape axis = param.axis.has_value() ? param.axis.value() : mxnet::TShape(0, -1);
   if ((istype == kRowSparseStorage || istype == kCSRStorage) && axis.ndim() == 0 &&
       param.ord == 2) {
     // l2 norm on the entire array
diff --git a/src/operator/tensor/broadcast_reduce_op.h b/src/operator/tensor/broadcast_reduce_op.h
index 27e2249..799f865 100644
--- a/src/operator/tensor/broadcast_reduce_op.h
+++ b/src/operator/tensor/broadcast_reduce_op.h
@@ -1152,7 +1152,7 @@ inline bool LpNormStorageType(const nnvm::NodeAttrs& attrs,
                               DispatchMode::kFCompute);
   }
   if (param.ord == 2) {
-    const mxnet::TShape axis = param.axis.has_value() ? param.axis.value() : mxnet::TShape();
+    const mxnet::TShape axis = param.axis.has_value() ? param.axis.value() : mxnet::TShape(0, -1);
     if (!dispatched && (in_stype == kRowSparseStorage || in_stype == kCSRStorage) &&
         axis.ndim() == 0 && param.ord == 2) {
       // l2 norm: rsp/csr, axis = () -> dns



[incubator-mxnet] branch master updated: fix norm sparse fallback (#17149)

2019-12-25 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 2551a9d  fix norm sparse fallback (#17149)
2551a9d is described below

commit 2551a9d8c8a4f5fd73c98e56ff79ab5410053d0e
Author: Hao Jin 
AuthorDate: Thu Dec 26 07:07:30 2019 +0800

fix norm sparse fallback (#17149)
---
 src/operator/tensor/broadcast_reduce_norm_value.cc | 2 +-
 src/operator/tensor/broadcast_reduce_norm_value.cu | 2 +-
 src/operator/tensor/broadcast_reduce_op.h  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/operator/tensor/broadcast_reduce_norm_value.cc b/src/operator/tensor/broadcast_reduce_norm_value.cc
index 4cd92d4..9acc157 100644
--- a/src/operator/tensor/broadcast_reduce_norm_value.cc
+++ b/src/operator/tensor/broadcast_reduce_norm_value.cc
@@ -40,7 +40,7 @@ void L2NormComputeEx(const nnvm::NodeAttrs& attrs,
   const NormParam& param = nnvm::get<NormParam>(attrs.parsed);
   mshadow::Stream<cpu>* s = ctx.get_stream<cpu>();
   const NDArrayStorageType istype = inputs[0].storage_type();
-  const mxnet::TShape axis = param.axis.has_value() ? param.axis.value() : mxnet::TShape();
+  const mxnet::TShape axis = param.axis.has_value() ? param.axis.value() : mxnet::TShape(0, -1);
   if ((istype == kRowSparseStorage || istype == kCSRStorage) && axis.ndim() == 0 &&
       param.ord == 2) {
     // l2 norm on the entire array
diff --git a/src/operator/tensor/broadcast_reduce_norm_value.cu b/src/operator/tensor/broadcast_reduce_norm_value.cu
index 188c93e..735c3d7 100644
--- a/src/operator/tensor/broadcast_reduce_norm_value.cu
+++ b/src/operator/tensor/broadcast_reduce_norm_value.cu
@@ -39,7 +39,7 @@ void L2NormComputeEx(const nnvm::NodeAttrs& attrs,
   const NormParam& param = nnvm::get<NormParam>(attrs.parsed);
   mshadow::Stream<gpu>* s = ctx.get_stream<gpu>();
   const NDArrayStorageType istype = inputs[0].storage_type();
-  const mxnet::TShape axis = param.axis.has_value() ? param.axis.value() : mxnet::TShape();
+  const mxnet::TShape axis = param.axis.has_value() ? param.axis.value() : mxnet::TShape(0, -1);
   if ((istype == kRowSparseStorage || istype == kCSRStorage) && axis.ndim() == 0 &&
       param.ord == 2) {
     // l2 norm on the entire array
diff --git a/src/operator/tensor/broadcast_reduce_op.h b/src/operator/tensor/broadcast_reduce_op.h
index 27e2249..799f865 100644
--- a/src/operator/tensor/broadcast_reduce_op.h
+++ b/src/operator/tensor/broadcast_reduce_op.h
@@ -1152,7 +1152,7 @@ inline bool LpNormStorageType(const nnvm::NodeAttrs& attrs,
                               DispatchMode::kFCompute);
   }
   if (param.ord == 2) {
-    const mxnet::TShape axis = param.axis.has_value() ? param.axis.value() : mxnet::TShape();
+    const mxnet::TShape axis = param.axis.has_value() ? param.axis.value() : mxnet::TShape(0, -1);
     if (!dispatched && (in_stype == kRowSparseStorage || in_stype == kCSRStorage) &&
         axis.ndim() == 0 && param.ord == 2) {
       // l2 norm: rsp/csr, axis = () -> dns

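What the one-token change does: with no axis argument, the default must be the
"unknown" shape TShape(0, -1) rather than a known zero-dimensional shape, so
the sparse l2-norm path still dispatches instead of falling back. A sketch of
the affected call:

    import mxnet as mx

    csr = mx.nd.uniform(shape=(8, 8)).tostype('csr')
    n = mx.nd.norm(csr, ord=2)   # l2 norm over the whole sparse array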


[incubator-mxnet] branch master updated (814be59 -> f86a8d1)

2019-12-16 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 814be59  Add #include  needed for waitpid (#17078)
 add f86a8d1  [API] unified API for custom kvstores (#17010)

No new revisions were added by this update.

Summary of changes:
 ci/docker/runtime_functions.sh |   1 +
 python/mxnet/__init__.py   |  24 +-
 python/mxnet/gluon/trainer.py  |  50 ++-
 .../{contrib/amp/lists => kvstore}/__init__.py |   9 +-
 python/mxnet/kvstore/base.py   | 455 +
 python/mxnet/{ => kvstore}/kvstore.py  | 183 -
 python/mxnet/{ => kvstore}/kvstore_server.py   |   6 +-
 python/mxnet/model.py  |  35 +-
 src/kvstore/kvstore_local.h|   4 +-
 tests/nightly/dist_device_sync_kvstore_custom.py   |  96 +
 tests/python/unittest/test_gluon_trainer.py|  33 +-
 tests/python/unittest/test_kvstore_custom.py   | 195 +
 12 files changed, 937 insertions(+), 154 deletions(-)
 copy python/mxnet/{contrib/amp/lists => kvstore}/__init__.py (84%)
 create mode 100644 python/mxnet/kvstore/base.py
 rename python/mxnet/{ => kvstore}/kvstore.py (85%)
 rename python/mxnet/{ => kvstore}/kvstore_server.py (97%)
 create mode 100644 tests/nightly/dist_device_sync_kvstore_custom.py
 create mode 100644 tests/python/unittest/test_kvstore_custom.py

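A hedged sketch of the registration pattern this API introduces. EchoStore is
illustrative only, and the method signatures are assumed from the new
python/mxnet/kvstore/base.py; only the single-array case is handled here.

    import mxnet as mx

    @mx.kvstore.KVStoreBase.register
    class EchoStore(mx.kvstore.KVStoreBase):
        """Toy backend: broadcast copies, pushpull echoes the pushed value."""
        def broadcast(self, key, value, out):
            value.copyto(out)          # assumed signature from kvstore/base.py
        def pushpull(self, key, value, out=None, priority=0):
            if out is not None:
                value.copyto(out)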


[incubator-mxnet] branch master updated (696c547 -> bbdc1c3)

2019-12-14 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 696c547  [BUGFIX] Fix trainer param order (#17068)
 add bbdc1c3  [reproducibility] multi_sum_sq review, AtomicAdd removal 
(#17002)

No new revisions were added by this update.

Summary of changes:
 src/operator/contrib/multi_sum_sq-inl.h |  10 ++-
 src/operator/contrib/multi_sum_sq.cc|  20 +++---
 src/operator/contrib/multi_sum_sq.cu| 110 
 tests/python/gpu/test_operator_gpu.py   |  30 +
 4 files changed, 118 insertions(+), 52 deletions(-)

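multi_sum_sq returns the per-array sum of squares in one fused kernel; the
review removes atomicAdd from the GPU reduction, so repeated runs on identical
inputs now give bitwise-identical results. A usage sketch:

    import mxnet as mx

    a = mx.nd.uniform(shape=(1024,))
    b = mx.nd.uniform(shape=(2048,))
    out = mx.nd.multi_sum_sq(a, b, num_arrays=2)  # one value per input array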


[incubator-mxnet] branch master updated (042682e -> 696c547)

2019-12-14 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 042682e  [DOC] Fix tutorial link, and better error msg (#17057)
 add 696c547  [BUGFIX] Fix trainer param order (#17068)

No new revisions were added by this update.

Summary of changes:
 python/mxnet/gluon/trainer.py   |  5 -
 tests/python/unittest/test_gluon_trainer.py | 16 
 2 files changed, 20 insertions(+), 1 deletion(-)



[incubator-mxnet] branch master updated (f045018 -> 042682e)

2019-12-14 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from f045018  [MXNET-978] Higher Order Gradient Support `logp1`, `expm1`, 
`square`. (#15416)
 add 042682e  [DOC] Fix tutorial link, and better error msg (#17057)

No new revisions were added by this update.

Summary of changes:
 python/mxnet/gluon/parameter.py | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)



[incubator-mxnet] branch master updated (f701f3f -> 61013a8)

2019-12-12 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from f701f3f  [MXNET-1431] Multiple channel support in Gluon PReLU (#16262)
 add 61013a8  use env var to control stack trace logging (#17038)

No new revisions were added by this update.

Summary of changes:
 .github/ISSUE_TEMPLATE/bug_report.md  |  2 +-
 3rdparty/dmlc-core|  2 +-
 CMakeLists.txt|  3 ++-
 Makefile  |  2 ++
 ci/docker/runtime_functions.sh| 39 +++
 docs/static_site/src/pages/api/faq/env_var.md |  6 +
 6 files changed, 51 insertions(+), 3 deletions(-)

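A sketch of the control added here; the variable name
DMLC_LOG_STACK_TRACE_DEPTH is assumed from the dmlc-core bump and the
env_var.md addition in this commit.

    import os
    # Depth 0 (default) hides stack traces in error messages; set a positive
    # depth before importing mxnet to log that many frames.
    os.environ['DMLC_LOG_STACK_TRACE_DEPTH'] = '10'
    import mxnet as mx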


[incubator-mxnet] branch v1.6.x updated: [BUGFIX] Fix race condition in kvstore.pushpull (#17007) (#17052)

2019-12-11 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch v1.6.x
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/v1.6.x by this push:
 new c675520  [BUGFIX] Fix race condition in kvstore.pushpull (#17007) 
(#17052)
c675520 is described below

commit c6755208f4f78d9f4ea095ec2ed8e067c8db1ef1
Author: Przemyslaw Tredak 
AuthorDate: Wed Dec 11 21:11:57 2019 -0800

[BUGFIX] Fix race condition in kvstore.pushpull (#17007) (#17052)

* add back gluon test

* fix typo

* change back gpu ctx

* also handle the case there some are pull and some are pushpull

* fix typo
---
 src/kvstore/kvstore_dist_server.h | 35 +--
 tests/nightly/dist_device_sync_kvstore.py | 35 +--
 2 files changed, 43 insertions(+), 27 deletions(-)

diff --git a/src/kvstore/kvstore_dist_server.h 
b/src/kvstore/kvstore_dist_server.h
index 65ded79..1dc222c 100644
--- a/src/kvstore/kvstore_dist_server.h
+++ b/src/kvstore/kvstore_dist_server.h
@@ -364,21 +364,34 @@ class KVStoreDistServer {
   if (log_verbose_)  {
 LOG(INFO) << "sent response to " << update_buf->request.size() << " 
workers";
   }
+  /**
+   * Request can be for either push, pull or pushpull
+   * If pull flag is set, respond immediately with the updated values
+   * Otherwise, only send the notification
+   */
+  bool has_pull = false;
   for (const auto& req : update_buf->request) {
-/**
- * Request can be for either push, pull or pushpull
- * If pull flag is set, respond immediately with the updated values
- * Otherwise, only send the notification
- */
-if (req.pull) {
-  DefaultStorageResponse(type, key, req, req_data, server);
-} else {
+has_pull = has_pull || req.pull;
+  }
+  if (has_pull) {
+// if there is a pull request, perform WaitToRead() once before 
DefaultStorageResponse
+if (has_multi_precision_copy(type)) CopyFromTo(stored, store_[key]);
+stored.WaitToRead();
+for (const auto& req : update_buf->request) {
+  if (req.pull) {
+DefaultStorageResponse(type, key, req, req_data, server);
+  }
+}
+update_buf->request.clear();
+  } else {
+// otherwise, send response directly
+for (const auto& req : update_buf->request) {
   server->Response(req);
 }
+update_buf->request.clear();
+if (has_multi_precision_copy(type)) CopyFromTo(stored, store_[key]);
+stored.WaitToRead();
   }
-  update_buf->request.clear();
-  if (has_multi_precision_copy(type)) CopyFromTo(stored, store_[key]);
-  stored.WaitToRead();
 } else {
   update_buf->merged.WaitToRead();
 }
diff --git a/tests/nightly/dist_device_sync_kvstore.py b/tests/nightly/dist_device_sync_kvstore.py
index dc2c7bc..f3fe737 100644
--- a/tests/nightly/dist_device_sync_kvstore.py
+++ b/tests/nightly/dist_device_sync_kvstore.py
@@ -44,7 +44,10 @@ kv = mx.kv.create('dist_device_sync')
 def init_kv():
 # init kv dns keys
 kv.init(keys, [mx.nd.ones(shape)] * len(keys))
+kv.init('9', mx.nd.ones(shape))
+kv.init('10', mx.nd.ones(shape))
 kv.init('99', mx.nd.ones(big_shape))
+kv.init('100', mx.nd.ones(big_shape))
 # worker info
 my_rank = kv.rank
 nworker = kv.num_workers
@@ -55,33 +58,30 @@ def init_kv():
 def test_sync_push_pull():
 kv, my_rank, nworker = init_kv()
 num_gpus = 2
-def check_default_keys(kv, my_rank, nworker, nrepeat=3, offset=0, use_pushpull=False):
+def check_default_keys(kv, my_rank, nworker, nrepeat=3):
 # checks pull after push in loop, because behavior during
 # consecutive pushes doesn't offer any guarantees
-for i in range(offset, nrepeat):
+for i in range(nrepeat):
 scale = my_rank + 1
 num = (nworker + 1) * nworker * rate * num_gpus / 2 * (i + 1) + 1
 
 arr = [mx.nd.ones(shape, ctx=mx.gpu(j)) * scale for j in range(num_gpus)]
 val = mx.nd.zeros(shape)
-if use_pushpull:
-kv.pushpull('3', arr, out=val)
-else:
-kv.push('3', arr)
-kv.pull('3', out=val)
+kv.push('9', arr)
+kv.pull('9', out=val)
+check_diff_to_scalar(val, num)
+kv.pushpull('10', arr, out=val)
 check_diff_to_scalar(val, num)
 
 big_arr = [mx.nd.ones(big_shape, ctx=mx.gpu(j)) * scale for j in range(num_gpus)]
 big_val = mx.nd.zeros(big_shape)
-if use_pu
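(The archived message is truncated above.) The essence of the fix: instead of answering each pull inside the per-request loop, the server first scans the buffered requests for any pull, performs the multi-precision copy and a single WaitToRead() on the merged value, and only then responds, so no worker can observe a partially merged buffer. A minimal Python sketch of that reordering, using hypothetical requests/stored/server stand-ins rather than the actual C++ kvstore API:

def respond_after_merge(requests, stored, server):
    # Sketch only: 'requests', 'stored' and 'server' are hypothetical
    # stand-ins; the authoritative logic is the C++ diff above.
    has_pull = any(req.pull for req in requests)
    if has_pull:
        # Synchronize once, before any value is sent back, so every
        # pull observes the fully merged result.
        stored.wait_to_read()
        for req in requests:
            if req.pull:
                server.send_value(req, stored)
    else:
        # Pure pushes: acknowledge first, then synchronize.
        for req in requests:
            server.acknowledge(req)
        stored.wait_to_read()
    requests.clear()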

[incubator-mxnet] branch master updated (04ebe45 -> 05af5c4)

2019-12-11 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 04ebe45  Prevent after-fork number of OMP threads being bigger than 1. (#16999)
 add 05af5c4  [BUGFIX] Fix race condition in kvstore.pushpull (#17007)

No new revisions were added by this update.

Summary of changes:
 src/kvstore/kvstore_dist_server.h | 35 +--
 tests/nightly/dist_device_sync_kvstore.py | 35 +--
 2 files changed, 43 insertions(+), 27 deletions(-)



[incubator-mxnet] branch v1.6.x updated (ff27b4b -> a576531)

2019-12-10 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch v1.6.x
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from ff27b4b  [OP] changing data type of 't' to int in lamb_update_phase1 (#16903)
 add a576531  Multi Precision Lamb Update operator (#16885)

No new revisions were added by this update.

Summary of changes:
 python/mxnet/optimizer/optimizer.py |  59 
 src/operator/optimizer_op-inl.h | 159 +++-
 src/operator/optimizer_op.cc|  90 +-
 src/operator/optimizer_op.cu|   5 +
 tests/python/gpu/test_operator_gpu.py   |   1 +
 tests/python/unittest/test_optimizer.py |  14 +--
 6 files changed, 304 insertions(+), 24 deletions(-)



[incubator-mxnet] branch v1.6.x updated (c973f01 -> ff27b4b)

2019-12-10 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch v1.6.x
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from c973f01  Backport #16895, #16922, #16878, #16979 and #16900 to 1.6 (#17029)
 add c7d484e  Lamb optimizer update (#16715)
 add ff27b4b  [OP] changing data type of 't' to int in lamb_update_phase1 (#16903)

No new revisions were added by this update.

Summary of changes:
 python/mxnet/optimizer/optimizer.py |  52 -
 src/operator/optimizer_op-inl.h | 188 
 src/operator/optimizer_op.cc|  81 ++
 src/operator/optimizer_op.cu|   7 ++
 tests/python/unittest/test_optimizer.py |  73 +
 5 files changed, 399 insertions(+), 2 deletions(-)



[incubator-mxnet] branch master updated (986a902 -> 248acfa)

2019-12-09 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 986a902  introduce gradient update handler to the base estimator (#16900)
 add 248acfa  Multi Precision Lamb Update operator (#16885)

No new revisions were added by this update.

Summary of changes:
 python/mxnet/optimizer/optimizer.py |  59 
 src/operator/optimizer_op-inl.h | 159 +++-
 src/operator/optimizer_op.cc|  90 +-
 src/operator/optimizer_op.cu|   5 +
 tests/python/gpu/test_operator_gpu.py   |   1 +
 tests/python/unittest/test_optimizer.py |  14 +--
 6 files changed, 304 insertions(+), 24 deletions(-)



[incubator-mxnet] branch master updated (fcc42de -> ca4939f)

2019-12-06 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from fcc42de  updating MXNet version to 1.6.0 in base.h for C APIs (#16905)
 add ca4939f  [OP] changing data type of 't' to int in lamb_update_phase1 (#16903)

No new revisions were added by this update.

Summary of changes:
 src/operator/optimizer_op-inl.h | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)



[incubator-mxnet] branch benchmark updated (a47b540 -> cc0c356)

2019-11-29 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch benchmark
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from a47b540  multi-precision lamb update operator
 add cc0c356  squad multi-lamb

No new revisions were added by this update.

Summary of changes:
 python/mxnet/ndarray/contrib.py |  24 +++
 python/mxnet/optimizer/optimizer.py |  92 -
 python/mxnet/test_utils.py  |  80 +---
 src/operator/contrib/multi_lamb-inl.h   | 332 
 src/operator/contrib/multi_lamb.cc  | 245 +++
 src/operator/contrib/multi_lamb.cu  | 254 
 tests/python/unittest/test_optimizer.py |  98 ++
 7 files changed, 1096 insertions(+), 29 deletions(-)
 create mode 100644 src/operator/contrib/multi_lamb-inl.h
 create mode 100644 src/operator/contrib/multi_lamb.cc
 create mode 100644 src/operator/contrib/multi_lamb.cu



[incubator-mxnet] branch master updated: Lamb optimizer update (#16715)

2019-11-23 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git


The following commit(s) were added to refs/heads/master by this push:
 new 85d3ef3  Lamb optimizer update (#16715)
85d3ef3 is described below

commit 85d3ef3a40da20a4aac3030950aa0f37f8cb89c5
Author: Rohit Kumar Srivastava 
AuthorDate: Sat Nov 23 22:19:07 2019 -0800

Lamb optimizer update (#16715)

* initial commit lamb optimizer

* fixing base lamb optimizer

* adding API doc for Lamb Phase 1 and 2
---
 python/mxnet/optimizer/optimizer.py |  52 -
 src/operator/optimizer_op-inl.h | 186 
 src/operator/optimizer_op.cc|  81 ++
 src/operator/optimizer_op.cu|   7 ++
 tests/python/unittest/test_optimizer.py |  73 +
 5 files changed, 397 insertions(+), 2 deletions(-)

diff --git a/python/mxnet/optimizer/optimizer.py b/python/mxnet/optimizer/optimizer.py
index b7311b2..00d130b 100644
--- a/python/mxnet/optimizer/optimizer.py
+++ b/python/mxnet/optimizer/optimizer.py
@@ -34,14 +34,14 @@ from ..ndarray import (sgd_update, sgd_mom_update, adam_update, rmsprop_update,
                        multi_sgd_update, multi_sgd_mom_update, multi_mp_sgd_update,
                        multi_mp_sgd_mom_update, preloaded_multi_sgd_update,
                        preloaded_multi_sgd_mom_update, preloaded_multi_mp_sgd_update,
-                       preloaded_multi_mp_sgd_mom_update)
+                       preloaded_multi_mp_sgd_mom_update, lamb_update_phase1, lamb_update_phase2)
 from ..ndarray import sparse
 from ..random import normal
 from ..util import is_np_array
 
 __all__ = [
 'AdaDelta', 'AdaGrad', 'Adam', 'Adamax', 'DCASGD', 'FTML', 'Ftrl', 'LARS', 'LBSGD',
-'NAG', 'NDabs', 'Nadam', 'Optimizer', 'RMSProp', 'SGD', 'SGLD', 'Signum',
+'NAG', 'NDabs', 'Nadam', 'Optimizer', 'RMSProp', 'SGD', 'SGLD', 'Signum', 'LAMB',
 'Test', 'Updater', 'ccSGD', 'create', 'get_updater', 'register'
 ]
 
@@ -1244,6 +1244,54 @@ class LBSGD(Optimizer):
 kwargs = {}
 sgd_update(weight, grad, out=weight, lr=lr, wd=wd, **kwargs)
 
+
+@register
+class LAMB(Optimizer):
+"""LAMB Optimizer.
+"""
+def __init__(self, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-6,
+             lower_bound=None, upper_bound=None, bias_correction=True, **kwargs):
+super(LAMB, self).__init__(learning_rate=learning_rate, **kwargs)
+self.beta1 = beta1
+self.beta2 = beta2
+self.epsilon = epsilon
+self.lower_bound = lower_bound
+self.upper_bound = upper_bound
+self.bias_correction = bias_correction
+
+
+def create_state(self, index, weight):
+stype = weight.stype
+dtype = weight.dtype
+return (zeros(weight.shape, weight.context, dtype=dtype, stype=stype),
+zeros(weight.shape, weight.context, dtype=dtype, stype=stype))
+
+def update(self, index, weight, grad, state):
+assert(isinstance(weight, NDArray))
+assert(isinstance(grad, NDArray))
+self._update_count(index)
+lr = self._get_lr(index)
+wd = self._get_wd(index)
+t = self._index_update_count[index]
+
+kwargs = {'beta1': self.beta1, 'beta2': self.beta2, 'epsilon': self.epsilon,
+  'bias_correction': self.bias_correction, 't': t,
+  'rescale_grad': self.rescale_grad}
+mean, var = state
+if self.clip_gradient:
+kwargs['clip_gradient'] = self.clip_gradient
+g = lamb_update_phase1(weight, grad, mean, var, wd=wd, **kwargs)
+
+kwargs = {}
+if self.lower_bound:
+kwargs['lower_bound'] = self.lower_bound
+if self.upper_bound:
+kwargs['upper_bound'] = self.upper_bound
+r_1 = weight.norm()
+r_2 = g.norm()
+lamb_update_phase2(weight, g, r_1, r_2, lr=lr, out=weight, **kwargs)
+
+
 # pylint: enable=line-too-long
 @register
 class DCASGD(Optimizer):
diff --git a/src/operator/optimizer_op-inl.h b/src/operator/optimizer_op-inl.h
index c211d32..698f797 100644
--- a/src/operator/optimizer_op-inl.h
+++ b/src/operator/optimizer_op-inl.h
@@ -1563,6 +1563,192 @@ inline void AdamUpdateEx(const nnvm::NodeAttrs& attrs,
   }
 }
 
+struct LambUpdatePhaseOneParam : public dmlc::Parameter<LambUpdatePhaseOneParam> {
+float beta1;
+float beta2;
+float epsilon;
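(The archived message is truncated above.) For reference, the two operators factor a LAMB step into an Adam-style update direction (phase 1) and a layer-wise trust-ratio rescaling (phase 2). A rough NumPy sketch of the same math, assuming dense float32 arrays and ignoring gradient clipping and the optional lower/upper bounds on the trust ratio; the operators above implement this in C++:

import numpy as np

def lamb_step(weight, grad, mean, var, t, lr=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-6, wd=0.0):
    # Phase 1: bias-corrected Adam moments plus weight decay give
    # the raw update direction g (cf. lamb_update_phase1).
    mean[:] = beta1 * mean + (1.0 - beta1) * grad
    var[:] = beta2 * var + (1.0 - beta2) * grad * grad
    mean_hat = mean / (1.0 - beta1 ** t)
    var_hat = var / (1.0 - beta2 ** t)
    g = mean_hat / (np.sqrt(var_hat) + epsilon) + wd * weight
    # Phase 2: rescale the step by the layer-wise trust ratio r1/r2
    # (cf. lamb_update_phase2 with r_1 = ||w|| and r_2 = ||g||).
    r1 = np.linalg.norm(weight)
    r2 = np.linalg.norm(g)
    ratio = r1 / r2 if r1 > 0.0 and r2 > 0.0 else 1.0
    weight -= lr * ratio * g

Since the class is added to the registry via @register, it should be selectable by its lowercase name, e.g. mx.optimizer.create('lamb', learning_rate=0.001).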

[incubator-mxnet] branch master updated (47fd3a0 -> 7d4f2f3)

2019-11-13 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 47fd3a0  Link fixes4 (#16764)
 add 7d4f2f3  [MXNET-1421] Added (CuDNN)BatchNorm operator to the list of mirrored operators (#16022)

No new revisions were added by this update.

Summary of changes:
 src/executor/graph_executor.cc   |  2 --
 src/operator/nn/cudnn/cudnn_batch_norm-inl.h | 18 +-
 2 files changed, 17 insertions(+), 3 deletions(-)



[incubator-mxnet] branch master updated (58b824f -> da33da3)

2019-11-06 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from 58b824f  fix R docs (#16733)
 add da33da3  Add MXNet Ops for fast multihead attention (#16408)

No new revisions were added by this update.

Summary of changes:
 src/common/cuda_utils.h|  74 +
 src/operator/contrib/transformer-inl.h |   9 +
 src/operator/contrib/transformer.cc| 270 
 src/operator/contrib/transformer.cu| 560 +
 tests/python/gpu/test_operator_gpu.py  | 316 ++-
 5 files changed, 1228 insertions(+), 1 deletion(-)



[incubator-mxnet] branch master updated (f9baec9 -> b5d07e3)

2019-10-30 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from f9baec9  [Numpy] implement np.column_stack (#16594)
 add b5d07e3  Add check if scipy is imported in sparse.py (#16574)

No new revisions were added by this update.

Summary of changes:
 python/mxnet/ndarray/sparse.py | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)



[incubator-mxnet] branch master updated (fc81c64 -> ffec31f)

2019-10-19 Thread haibin
This is an automated email from the ASF dual-hosted git repository.

haibin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git.


from fc81c64  Correct Google Analytics Tracker (#16490)
 add ffec31f  Aggregated adamw update (#16398)

No new revisions were added by this update.

Summary of changes:
 python/mxnet/ndarray/contrib.py |  56 +++-
 src/operator/contrib/adamw-inl.h| 368 +---
 src/operator/contrib/adamw.cc   | 166 +--
 src/operator/contrib/adamw.cu   |  34 +--
 tests/python/gpu/test_operator_gpu.py   |   1 +
 tests/python/unittest/test_contrib_optimizer.py | 236 ---
 6 files changed, 666 insertions(+), 195 deletions(-)


