[GitHub] zhreshold commented on issue #8325: Fix typo in gluon l1loss docstring
zhreshold commented on issue #8325: Fix typo in gluon l1loss docstring
URL: https://github.com/apache/incubator-mxnet/pull/8325#issuecomment-337814203

Please rebase to master to fix the CI.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] zhreshold opened a new pull request #8346: fix diagnose if hashtag not found.
zhreshold opened a new pull request #8346: fix diagnose if hashtag not found.
URL: https://github.com/apache/incubator-mxnet/pull/8346

## Description ##
Fix diagnose message if not installed from pip

## Checklist ##
### Essentials ###
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
[GitHub] zhreshold commented on issue #8346: fix diagnose if hashtag not found.
zhreshold commented on issue #8346: fix diagnose if hashtag not found.
URL: https://github.com/apache/incubator-mxnet/pull/8346#issuecomment-337813753

@szha
[GitHub] rahul003 commented on a change in pull request #8345: Misc fixes for sparse distributed training
rahul003 commented on a change in pull request #8345: Misc fixes for sparse distributed training
URL: https://github.com/apache/incubator-mxnet/pull/8345#discussion_r145605694

## File path: src/kvstore/kvstore_dist.h ##
@@ -366,15 +362,15 @@ class KVStoreDist : public KVStoreLocal {
   // push row sparse gradient
   void PushRowSparse(int key, const NDArray &send_buf, int priority) {
     using namespace rowsparse;
-    auto push_to_servers = [this, key, &send_buf]
+    auto push_to_servers = [this, key, send_buf]
         (RunContext rctx, Engine::CallbackOnComplete cb) {
 #if MKL_EXPERIMENTAL == 1
       mkl_set_tblob_eager_mode(send_buf.data());
 #endif
       real_t* data = send_buf.data().dptr<real_t>();
       bool init = send_buf.storage_initialized();

Review comment: You can remove `init`; it is no longer used.
[GitHub] rahul003 commented on a change in pull request #8345: Misc fixes for sparse distributed training
rahul003 commented on a change in pull request #8345: Misc fixes for sparse distributed training
URL: https://github.com/apache/incubator-mxnet/pull/8345#discussion_r145605694

## File path: src/kvstore/kvstore_dist.h ##
@@ -366,15 +362,15 @@ class KVStoreDist : public KVStoreLocal {
   // push row sparse gradient
   void PushRowSparse(int key, const NDArray &send_buf, int priority) {
     using namespace rowsparse;
-    auto push_to_servers = [this, key, &send_buf]
+    auto push_to_servers = [this, key, send_buf]
         (RunContext rctx, Engine::CallbackOnComplete cb) {
 #if MKL_EXPERIMENTAL == 1
       mkl_set_tblob_eager_mode(send_buf.data());
 #endif
       real_t* data = send_buf.data().dptr<real_t>();
       bool init = send_buf.storage_initialized();

Review comment: Looks like `init` is no longer used.
[GitHub] benqua commented on issue #8297: [scala] Make accuracy idependant of output size (fix #8226)
benqua commented on issue #8297: [scala] Make accuracy idependant of output size (fix #8226)
URL: https://github.com/apache/incubator-mxnet/pull/8297#issuecomment-337812605

It can be done, but it would break the API for people calling EvalMetric.get (https://github.com/apache/incubator-mxnet/blob/master/scala-package/core/src/main/scala/ml/dmlc/mxnet/EvalMetric.scala#L52).
[GitHub] eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
URL: https://github.com/apache/incubator-mxnet/pull/8259#discussion_r145606227

## File path: tests/python/unittest/test_sparse_ndarray.py ##
@@ -647,7 +647,20 @@ def test_sparse_nd_exception():
                      (2,2), shape=(3,2))
     assert_exception(mx.nd.sparse.zeros, ValueError, "invalid_stype", (2,2))
-
+    stypes = ["csr", "row_sparse"]
+    for stype in stypes:
+        a.tostype(stype).check_format()
+    try:
+        shape = (3, 4)
+        data_list = [7, 8, 9]
+        indices_list = [0, 2, 1]
+        indptr_list = [0, 5, 2, 3]
+        a = mx.nd.sparse.csr_matrix((data_list, indices_list, indptr_list), shape)
+        a.check_format()
+        assert(False)

Review comment: We should either add a test for an illegal rsp or disable it in the frontend.
[GitHub] eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
URL: https://github.com/apache/incubator-mxnet/pull/8259#discussion_r145605750

## File path: src/common/utils.h ##
@@ -43,9 +43,88 @@
 #include
 #include
+#include "../operator/mxnet_op.h"
+#include "../ndarray/ndarray_function.h"
+
 namespace mxnet {
 namespace common {

/*!
 * \brief IndPtr should be in non-decreasing order, start with 0
 * and end with a value greater than or equal to the size of indices.
 */
struct indptr_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out, const DType* in,
                                  const nnvm::dim_t end, const nnvm::dim_t idx_size) {
    if ((in[i+1] < in[i]) || (i == 0 && in[i] != static_cast<DType>(0)) ||
        (i == end && in[i] < static_cast<DType>(idx_size)))
      *out = kCSRIndPtrErr;
  }
};

/*!
 * \brief Indices should be less than the number of columns.
 */
struct idx_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out,
                                  const DType* in, const nnvm::dim_t ncols) {
    if (in[i] >= static_cast<DType>(ncols)) *out = kCSRIdxErr;
  }
};

template<typename xpu>
void CheckFormatWrapper(const RunContext &rctx, const NDArray *input,
                        TBlob *cpu_err, const bool &full_check);

template<typename xpu>
void CheckFormatImpl(const RunContext &rctx, const NDArray *input,
                     TBlob *cpu_err, const bool &full_check) {
  using namespace op::mxnet_op;
  auto stype = input->storage_type();
  auto err = cpu_err->dptr<mshadow::default_real_t>();
  if (stype == kCSRStorage) {
    const TShape shape = input->shape();
    const TShape idx_shape = input->aux_shape(csr::kIdx);
    const TShape indptr_shape = input->aux_shape(csr::kIndPtr);
    const TShape storage_shape = input->storage_shape();
    if ((shape.ndim() != 2) ||
        (idx_shape.ndim() != 1 || indptr_shape.ndim() != 1 || storage_shape.ndim() != 1) ||
        (indptr_shape[0] != shape[0] + 1) ||
        (idx_shape[0] != storage_shape[0])) {
      *err = kCSRShapeErr;
      return;
    }
    if (full_check) {
      NDArray xpu_ret = NDArray(mshadow::Shape1(1), rctx.get_ctx());
      TBlob xpu_tmp = xpu_ret.data();
      ndarray::Eval(kNormalErr, &xpu_tmp, rctx);
      int indptr_type = input->aux_type(csr::kIndPtr);
      MSHADOW_TYPE_SWITCH(indptr_type, IType, {
        Kernel<indptr_check, xpu>::Launch(
            rctx.get_stream<xpu>(), indptr_shape[0]-1, xpu_ret.data().dptr<mshadow::default_real_t>(),
            input->aux_data(csr::kIndPtr).dptr<IType>(),
            indptr_shape[0]-1, idx_shape[0]);
      });
      int idx_type = input->aux_type(csr::kIdx);
      MSHADOW_TYPE_SWITCH(idx_type, IType, {
        Kernel<idx_check, xpu>::Launch(
            rctx.get_stream<xpu>(), idx_shape[0], xpu_ret.data().dptr<mshadow::default_real_t>(),
            input->aux_data(csr::kIdx).dptr<IType>(), shape[1]);
      });
      ndarray::Copy(xpu_ret.data(), cpu_err,
                    xpu_ret.ctx(), Context::CPU(), rctx);
    }
  } else if (stype == kRowSparseStorage) {
    if (input->aux_shape(rowsparse::kIdx)[0] != input->storage_shape()[0]) {
      *err = kRSPShapeErr;
    }

Review comment: Do we check `full_check`?
[GitHub] eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
URL: https://github.com/apache/incubator-mxnet/pull/8259#discussion_r145603905

## File path: src/common/utils.h ##
@@ -43,9 +43,88 @@
 #include
 #include
+#include "../operator/mxnet_op.h"
+#include "../ndarray/ndarray_function.h"
+
 namespace mxnet {
 namespace common {

/*!
 * \brief IndPtr should be in non-decreasing order, start with 0
 * and end with a value greater than or equal to the size of indices.
 */
struct indptr_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out, const DType* in,
                                  const nnvm::dim_t end, const nnvm::dim_t idx_size) {
    if ((in[i+1] < in[i]) || (i == 0 && in[i] != static_cast<DType>(0)) ||
        (i == end && in[i] < static_cast<DType>(idx_size)))
      *out = kCSRIndPtrErr;
  }
};

/*!
 * \brief Indices should be less than the number of columns.
 */
struct idx_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out,
                                  const DType* in, const nnvm::dim_t ncols) {
    if (in[i] >= static_cast<DType>(ncols)) *out = kCSRIdxErr;
  }
};

template<typename xpu>
void CheckFormatWrapper(const RunContext &rctx, const NDArray *input,
                        TBlob *cpu_err, const bool &full_check);

template<typename xpu>
void CheckFormatImpl(const RunContext &rctx, const NDArray *input,

Review comment: Why use a ptr for input? Why add references for primitive types instead of plain values (a ref will be slower)? What about `Stream<xpu>*, const NDArray&, const TBlob&, const bool`?
[GitHub] eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
URL: https://github.com/apache/incubator-mxnet/pull/8259#discussion_r145604804

## File path: python/mxnet/ndarray/sparse.py ##
@@ -235,6 +235,16 @@ def copyto(self, other):
         else:
             raise TypeError('copyto does not support type ' + str(type(other)))

+    def check_format(self, full_check=True):
+        """check whether the matrix format is valid.

Review comment: Users can also call check_format for RowSparseNDArray. It looks like the checks for rsp are not complete (only shape checks are available); we should either throw an exception saying check_fmt for rsp is not implemented, or complete the implementation for rsp in this PR.
[GitHub] eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
URL: https://github.com/apache/incubator-mxnet/pull/8259#discussion_r145604923

## File path: src/ndarray/ndarray.cc ##
@@ -1214,6 +1214,39 @@ void NDArray::SyncCopyToCPU(void *data, size_t size) const {
   }
 }

+void NDArray::CheckFormat(const bool full_check) const {
+  mshadow::default_real_t err = kNormalErr;

Review comment: Hmm, is the type `default_real_t` intentional? kNormalErr is an enum, which is usually an integer.
[GitHub] eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
URL: https://github.com/apache/incubator-mxnet/pull/8259#discussion_r145605904

## File path: src/common/utils.h ##
@@ -43,9 +43,88 @@
 #include
 #include
+#include "../operator/mxnet_op.h"
+#include "../ndarray/ndarray_function.h"
+
 namespace mxnet {
 namespace common {

/*!
 * \brief IndPtr should be in non-decreasing order, start with 0
 * and end with a value greater than or equal to the size of indices.
 */
struct indptr_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out, const DType* in,
                                  const nnvm::dim_t end, const nnvm::dim_t idx_size) {
    if ((in[i+1] < in[i]) || (i == 0 && in[i] != static_cast<DType>(0)) ||
        (i == end && in[i] < static_cast<DType>(idx_size)))
      *out = kCSRIndPtrErr;
  }
};

/*!
 * \brief Indices should be less than the number of columns.
 */
struct idx_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out,
                                  const DType* in, const nnvm::dim_t ncols) {
    if (in[i] >= static_cast<DType>(ncols)) *out = kCSRIdxErr;
  }
};

template<typename xpu>
void CheckFormatWrapper(const RunContext &rctx, const NDArray *input,
                        TBlob *cpu_err, const bool &full_check);

template<typename xpu>
void CheckFormatImpl(const RunContext &rctx, const NDArray *input,
                     TBlob *cpu_err, const bool &full_check) {
  using namespace op::mxnet_op;
  auto stype = input->storage_type();
  auto err = cpu_err->dptr<mshadow::default_real_t>();
  if (stype == kCSRStorage) {
    const TShape shape = input->shape();
    const TShape idx_shape = input->aux_shape(csr::kIdx);
    const TShape indptr_shape = input->aux_shape(csr::kIndPtr);
    const TShape storage_shape = input->storage_shape();
    if ((shape.ndim() != 2) ||
        (idx_shape.ndim() != 1 || indptr_shape.ndim() != 1 || storage_shape.ndim() != 1) ||
        (indptr_shape[0] != shape[0] + 1) ||
        (idx_shape[0] != storage_shape[0])) {
      *err = kCSRShapeErr;
      return;
    }
    if (full_check) {
      NDArray xpu_ret = NDArray(mshadow::Shape1(1), rctx.get_ctx());
      TBlob xpu_tmp = xpu_ret.data();
      ndarray::Eval(kNormalErr, &xpu_tmp, rctx);
      int indptr_type = input->aux_type(csr::kIndPtr);
      MSHADOW_TYPE_SWITCH(indptr_type, IType, {
        Kernel<indptr_check, xpu>::Launch(

Review comment: Since the user of this function may pass a TBlob of dtype = kInt32, kFloat32, etc., we should add a TYPE_SWITCH for the TBlob passed in.
[GitHub] eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
URL: https://github.com/apache/incubator-mxnet/pull/8259#discussion_r145605552

## File path: src/common/utils.h ##
@@ -43,9 +43,88 @@
 #include
 #include
+#include "../operator/mxnet_op.h"
+#include "../ndarray/ndarray_function.h"
+
 namespace mxnet {
 namespace common {

/*!
 * \brief IndPtr should be in non-decreasing order, start with 0
 * and end with a value greater than or equal to the size of indices.
 */
struct indptr_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out, const DType* in,
                                  const nnvm::dim_t end, const nnvm::dim_t idx_size) {
    if ((in[i+1] < in[i]) || (i == 0 && in[i] != static_cast<DType>(0)) ||
        (i == end && in[i] < static_cast<DType>(idx_size)))
      *out = kCSRIndPtrErr;
  }
};

/*!
 * \brief Indices should be less than the number of columns.
 */
struct idx_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out,
                                  const DType* in, const nnvm::dim_t ncols) {
    if (in[i] >= static_cast<DType>(ncols)) *out = kCSRIdxErr;
  }
};

template<typename xpu>

Review comment: Let's add some documentation for this function, since this will be the one others use when they try to check the fmt of an NDArray in other backend cpp code.
[GitHub] eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
URL: https://github.com/apache/incubator-mxnet/pull/8259#discussion_r145603042

## File path: src/common/utils.h ##
@@ -43,9 +43,88 @@
 #include
 #include
+#include "../operator/mxnet_op.h"
+#include "../ndarray/ndarray_function.h"
+
 namespace mxnet {
 namespace common {

/*!
 * \brief IndPtr should be in non-decreasing order, start with 0
 * and end with a value greater than or equal to the size of indices.
 */
struct indptr_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out, const DType* in,
                                  const nnvm::dim_t end, const nnvm::dim_t idx_size) {
    if ((in[i+1] < in[i]) || (i == 0 && in[i] != static_cast<DType>(0)) ||
        (i == end && in[i] < static_cast<DType>(idx_size)))
      *out = kCSRIndPtrErr;
  }
};

/*!
 * \brief Indices should be less than the number of columns.
 */
struct idx_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out,
                                  const DType* in, const nnvm::dim_t ncols) {
    if (in[i] >= static_cast<DType>(ncols)) *out = kCSRIdxErr;
  }
};

template<typename xpu>
void CheckFormatWrapper(const RunContext &rctx, const NDArray *input,
                        TBlob *cpu_err, const bool &full_check);

template<typename xpu>
void CheckFormatImpl(const RunContext &rctx, const NDArray *input,
                     TBlob *cpu_err, const bool &full_check) {
  using namespace op::mxnet_op;
  auto stype = input->storage_type();
  auto err = cpu_err->dptr<mshadow::default_real_t>();
  if (stype == kCSRStorage) {
    const TShape shape = input->shape();
    const TShape idx_shape = input->aux_shape(csr::kIdx);
    const TShape indptr_shape = input->aux_shape(csr::kIndPtr);
    const TShape storage_shape = input->storage_shape();
    if ((shape.ndim() != 2) ||
        (idx_shape.ndim() != 1 || indptr_shape.ndim() != 1 || storage_shape.ndim() != 1) ||
        (indptr_shape[0] != shape[0] + 1) ||
        (idx_shape[0] != storage_shape[0])) {
      *err = kCSRShapeErr;
      return;
    }
    if (full_check) {
      NDArray xpu_ret = NDArray(mshadow::Shape1(1), rctx.get_ctx());
      TBlob xpu_tmp = xpu_ret.data();
      ndarray::Eval(kNormalErr, &xpu_tmp, rctx);

Review comment: `ndarray::Eval` is deprecated. Can we use Kernel instead?
[GitHub] eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
URL: https://github.com/apache/incubator-mxnet/pull/8259#discussion_r145604477

## File path: tests/python/unittest/test_sparse_ndarray.py ##
@@ -647,7 +647,20 @@ def test_sparse_nd_exception():
                      (2,2), shape=(3,2))
     assert_exception(mx.nd.sparse.zeros, ValueError, "invalid_stype", (2,2))
-
+    stypes = ["csr", "row_sparse"]

Review comment: Can we test this on GPU, too?
[GitHub] eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
URL: https://github.com/apache/incubator-mxnet/pull/8259#discussion_r145602552

## File path: python/mxnet/ndarray/sparse.py ##
@@ -235,6 +235,16 @@ def copyto(self, other):
         else:
             raise TypeError('copyto does not support type ' + str(type(other)))

+    def check_format(self, full_check=True):
+        """check whether the matrix format is valid.

Review comment: matrix format -> ndarray format
[GitHub] eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
URL: https://github.com/apache/incubator-mxnet/pull/8259#discussion_r145605688

## File path: src/common/utils.h ##
@@ -43,9 +43,88 @@
 #include
 #include
+#include "../operator/mxnet_op.h"
+#include "../ndarray/ndarray_function.h"
+
 namespace mxnet {
 namespace common {

/*!
 * \brief IndPtr should be in non-decreasing order, start with 0
 * and end with a value greater than or equal to the size of indices.
 */
struct indptr_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out, const DType* in,
                                  const nnvm::dim_t end, const nnvm::dim_t idx_size) {
    if ((in[i+1] < in[i]) || (i == 0 && in[i] != static_cast<DType>(0)) ||
        (i == end && in[i] < static_cast<DType>(idx_size)))
      *out = kCSRIndPtrErr;
  }
};

/*!
 * \brief Indices should be less than the number of columns.
 */
struct idx_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out,
                                  const DType* in, const nnvm::dim_t ncols) {
    if (in[i] >= static_cast<DType>(ncols)) *out = kCSRIdxErr;
  }
};

template<typename xpu>
void CheckFormatWrapper(const RunContext &rctx, const NDArray *input,
                        TBlob *cpu_err, const bool &full_check);

template<typename xpu>
void CheckFormatImpl(const RunContext &rctx, const NDArray *input,
                     TBlob *cpu_err, const bool &full_check) {
  using namespace op::mxnet_op;
  auto stype = input->storage_type();
  auto err = cpu_err->dptr<mshadow::default_real_t>();
  if (stype == kCSRStorage) {
    const TShape shape = input->shape();
    const TShape idx_shape = input->aux_shape(csr::kIdx);
    const TShape indptr_shape = input->aux_shape(csr::kIndPtr);
    const TShape storage_shape = input->storage_shape();
    if ((shape.ndim() != 2) ||
        (idx_shape.ndim() != 1 || indptr_shape.ndim() != 1 || storage_shape.ndim() != 1) ||
        (indptr_shape[0] != shape[0] + 1) ||
        (idx_shape[0] != storage_shape[0])) {
      *err = kCSRShapeErr;
      return;
    }
    if (full_check) {
      NDArray xpu_ret = NDArray(mshadow::Shape1(1), rctx.get_ctx());
      TBlob xpu_tmp = xpu_ret.data();
      ndarray::Eval(kNormalErr, &xpu_tmp, rctx);
      int indptr_type = input->aux_type(csr::kIndPtr);
      MSHADOW_TYPE_SWITCH(indptr_type, IType, {
        Kernel<indptr_check, xpu>::Launch(
            rctx.get_stream<xpu>(), indptr_shape[0]-1, xpu_ret.data().dptr<mshadow::default_real_t>(),
            input->aux_data(csr::kIndPtr).dptr<IType>(),
            indptr_shape[0]-1, idx_shape[0]);
      });
      int idx_type = input->aux_type(csr::kIdx);
      MSHADOW_TYPE_SWITCH(idx_type, IType, {
        Kernel<idx_check, xpu>::Launch(
            rctx.get_stream<xpu>(), idx_shape[0], xpu_ret.data().dptr<mshadow::default_real_t>(),
            input->aux_data(csr::kIdx).dptr<IType>(), shape[1]);
      });
      ndarray::Copy(xpu_ret.data(), cpu_err,

Review comment: We're deprecating `ndarray_function.h`. Can we use `mshadow::Copy` instead?
[GitHub] eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
URL: https://github.com/apache/incubator-mxnet/pull/8259#discussion_r145605970

## File path: src/common/utils.h ##
@@ -43,9 +43,88 @@
 #include
 #include
+#include "../operator/mxnet_op.h"
+#include "../ndarray/ndarray_function.h"
+
 namespace mxnet {
 namespace common {

/*!
 * \brief IndPtr should be in non-decreasing order, start with 0
 * and end with a value greater than or equal to the size of indices.
 */
struct indptr_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out, const DType* in,
                                  const nnvm::dim_t end, const nnvm::dim_t idx_size) {
    if ((in[i+1] < in[i]) || (i == 0 && in[i] != static_cast<DType>(0)) ||
        (i == end && in[i] < static_cast<DType>(idx_size)))
      *out = kCSRIndPtrErr;
  }
};

/*!
 * \brief Indices should be less than the number of columns.
 */
struct idx_check {
  template<typename DType>
  MSHADOW_XINLINE static void Map(int i, mshadow::default_real_t* out,
                                  const DType* in, const nnvm::dim_t ncols) {
    if (in[i] >= static_cast<DType>(ncols)) *out = kCSRIdxErr;
  }
};

template<typename xpu>
void CheckFormatWrapper(const RunContext &rctx, const NDArray *input,
                        TBlob *cpu_err, const bool &full_check);

template<typename xpu>
void CheckFormatImpl(const RunContext &rctx, const NDArray *input,
                     TBlob *cpu_err, const bool &full_check) {
  using namespace op::mxnet_op;
  auto stype = input->storage_type();
  auto err = cpu_err->dptr<mshadow::default_real_t>();
  if (stype == kCSRStorage) {
    const TShape shape = input->shape();
    const TShape idx_shape = input->aux_shape(csr::kIdx);
    const TShape indptr_shape = input->aux_shape(csr::kIndPtr);
    const TShape storage_shape = input->storage_shape();
    if ((shape.ndim() != 2) ||
        (idx_shape.ndim() != 1 || indptr_shape.ndim() != 1 || storage_shape.ndim() != 1) ||
        (indptr_shape[0] != shape[0] + 1) ||
        (idx_shape[0] != storage_shape[0])) {
      *err = kCSRShapeErr;
      return;
    }
    if (full_check) {
      NDArray xpu_ret = NDArray(mshadow::Shape1(1), rctx.get_ctx());

Review comment: Shall we create one based on the dtype of cpu_err?
[GitHub] eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
eric-haibin-lin commented on a change in pull request #8259: check_format of ndrray, mainly for csr
URL: https://github.com/apache/incubator-mxnet/pull/8259#discussion_r145605158

## File path: src/ndarray/ndarray.cc ##
@@ -1214,6 +1214,39 @@ void NDArray::SyncCopyToCPU(void *data, size_t size) const {
   }
 }

+void NDArray::CheckFormat(const bool full_check) const {
+  mshadow::default_real_t err = kNormalErr;
+  void *err_ptr = static_cast<void*>(&err);

Review comment: Is this cast necessary?
[GitHub] wa1618i closed issue #8333: make: *** [build/src/operator/activation_gpu.o] Error 2
wa1618i closed issue #8333: make: *** [build/src/operator/activation_gpu.o] Error 2
URL: https://github.com/apache/incubator-mxnet/issues/8333
[GitHub] wa1618i commented on issue #5218: core dumped when I try to compile mxnet0.9.3 with nnpack support WHY?
wa1618i commented on issue #5218: core dumped when I try to compile mxnet0.9.3 with nnpack support WHY?
URL: https://github.com/apache/incubator-mxnet/issues/5218#issuecomment-337805448

@szha I have the same issue here, please re-open. I can build mxnet from the source without error, but when trying to import mxnet inside python I get the error:

[12:43:09] /home/lemma/mxnet/dmlc-core/include/dmlc/logging.h:308: [12:43:09] src/operator/nnpack/nnpack_util.h:43: nnp_initialize failed status=51

Stack trace returned 10 entries:
[bt] (0) /home/lemma/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet2op16NNPACKInitializeC1Ev+0x2fb) [0x7f25d2ced66b]
[bt] (1) /home/lemma/mxnet/python/mxnet/../../lib/libmxnet.so(+0x66d0a6) [0x7f25d2a750a6]
[bt] (2) /lib64/ld-linux-x86-64.so.2(+0x102da) [0x7f25fc9532da]
[bt] (3) /lib64/ld-linux-x86-64.so.2(+0x103c3) [0x7f25fc9533c3]
[bt] (4) /lib64/ld-linux-x86-64.so.2(+0x14e00) [0x7f25fc957e00]
[bt] (5) /lib64/ld-linux-x86-64.so.2(+0x10194) [0x7f25fc953194]
[bt] (6) /lib64/ld-linux-x86-64.so.2(+0x1454b) [0x7f25fc95754b]
[bt] (7) /lib/x86_64-linux-gnu/libdl.so.2(+0x102b) [0x7f25fc14502b]
[bt] (8) /lib64/ld-linux-x86-64.so.2(+0x10194) [0x7f25fc953194]
[bt] (9) /lib/x86_64-linux-gnu/libdl.so.2(+0x162d) [0x7f25fc14562d]

terminate called after throwing an instance of 'dmlc::Error'
  what(): [12:43:09] src/operator/nnpack/nnpack_util.h:43: nnp_initialize failed status=51

Stack trace returned 10 entries:
[bt] (0) /home/lemma/mxnet/python/mxnet/../../lib/libmxnet.so(_ZN5mxnet2op16NNPACKInitializeC1Ev+0x2fb) [0x7f25d2ced66b]
[bt] (1) /home/lemma/mxnet/python/mxnet/../../lib/libmxnet.so(+0x66d0a6) [0x7f25d2a750a6]
[bt] (2) /lib64/ld-linux-x86-64.so.2(+0x102da) [0x7f25fc9532da]
[bt] (3) /lib64/ld-linux-x86-64.so.2(+0x103c3) [0x7f25fc9533c3]
[bt] (4) /lib64/ld-linux-x86-64.so.2(+0x14e00) [0x7f25fc957e00]
[bt] (5) /lib64/ld-linux-x86-64.so.2(+0x10194) [0x7f25fc953194]
[bt] (6) /lib64/ld-linux-x86-64.so.2(+0x1454b) [0x7f25fc95754b]
[bt] (7) /lib/x86_64-linux-gnu/libdl.so.2(+0x102b) [0x7f25fc14502b]
[bt] (8) /lib64/ld-linux-x86-64.so.2(+0x10194) [0x7f25fc953194]
[bt] (9) /lib/x86_64-linux-gnu/libdl.so.2(+0x162d) [0x7f25fc14562d]
[GitHub] goodtogood commented on issue #7771: Error when loading JSON network definition with c++ api
goodtogood commented on issue #7771: Error when loading JSON network definition with c++ api
URL: https://github.com/apache/incubator-mxnet/issues/7771#issuecomment-337804646

Same problem.
[GitHub] tqchen commented on issue #8343: [CMAKE] Cmake changes, upgrade training test so it converge
tqchen commented on issue #8343: [CMAKE] Cmake changes, upgrade training test so it converge URL: https://github.com/apache/incubator-mxnet/pull/8343#issuecomment-337801113 The test failures appear to be due to a missing CI machine; needs a retrigger once the machines are back up.
[GitHub] chinakook closed issue #8336: Building with libjpeg-turbo error
chinakook closed issue #8336: Building with libjpeg-turbo error URL: https://github.com/apache/incubator-mxnet/issues/8336
[GitHub] reminisce commented on a change in pull request #8340: Fill optimizations
reminisce commented on a change in pull request #8340: Fill optimizations URL: https://github.com/apache/incubator-mxnet/pull/8340#discussion_r14559
## File path: src/operator/tensor/init_op.h ##
```
@@ -164,19 +164,38 @@ inline bool InitStorageType(const nnvm::NodeAttrs& attrs,
   return true;
 }

+/*! \brief Fill output with a scalar integer value */
 template
 void FillCompute(const nnvm::NodeAttrs& attrs,
                  const OpContext& ctx,
                  const std::vector& inputs,
                  const std::vector& req,
                  const std::vector& outputs) {
-  using namespace mshadow;
-  using namespace mshadow::expr;
-  Stream *s = ctx.get_stream();
-  MSHADOW_TYPE_SWITCH(outputs[0].type_flag_, DType, {
-    Tensor out = outputs[0].FlatTo1D(s);
-    ASSIGN_DISPATCH(out, req[0], scalar(value));
-  });
+  if (req[0] != kNullOp) {
+    mshadow::Stream *s = ctx.get_stream();
+    MSHADOW_TYPE_SWITCH(outputs[0].type_flag_, DType, {
+      mxnet_op::Kernel, xpu>::Launch(s,
```
Review comment: What if `req[0]` is `kAddTo`?
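The reviewer's question about `kAddTo` is the crux: an unconditional overwrite is only correct for `kWriteTo`, while `kAddTo` asks the kernel to accumulate into the existing output. A toy pure-Python model of the request-type dispatch (the names mirror MXNet's `OpReqType`, but this is only an illustration, not the real kernel):

```python
# Hypothetical sketch of OpReqType dispatch; not MXNet's actual C++ kernel.
K_NULL, K_WRITE, K_ADD = "kNullOp", "kWriteTo", "kAddTo"

def fill(out, value, req):
    """Fill the list `out` in place with `value`, honoring the request type."""
    if req == K_NULL:
        return out                  # no output requested: leave buffer alone
    for i in range(len(out)):
        if req == K_ADD:
            out[i] += value         # accumulate into the existing contents
        else:                       # kWriteTo: plain overwrite
            out[i] = value
    return out

buf = [10.0, 20.0]
assert fill(list(buf), 3.0, K_WRITE) == [3.0, 3.0]
assert fill(list(buf), 3.0, K_ADD) == [13.0, 23.0]  # overwrite would lose 10/20
assert fill(list(buf), 3.0, K_NULL) == [10.0, 20.0]
```

A fast path that always overwrites silently corrupts gradient accumulation whenever the request is `kAddTo`, which is exactly what the review comment is probing.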
[GitHub] chinakook commented on issue #8336: Building with libjpeg-turbo error
chinakook commented on issue #8336: Building with libjpeg-turbo error URL: https://github.com/apache/incubator-mxnet/issues/8336#issuecomment-337800856 OK
[GitHub] eric-haibin-lin opened a new pull request #8345: Misc fixes for sparse distributed training
eric-haibin-lin opened a new pull request #8345: Misc fixes for sparse distributed training URL: https://github.com/apache/incubator-mxnet/pull/8345
## Description ##
- As #8116 removes [wait_to_write](https://github.com/apache/incubator-mxnet/pull/8116/files#diff-0cd6fcb2cd941d4c4a829bb3d7ea3d63L274) when updating comm_buff, it is no longer safe to pass an NDArray* to the callback for row_sparse_pull. Now changed to pass an NDArray.
- Removed the usage of `mshadow::range` in `FillDnsZerosRspImpl`, since `mshadow::range` uses float to calculate the tensor shape and is inaccurate for large shapes.
- Added a unit test for pulling empty sparse weights
- Removed wrong/misleading comments

@bhavinthaker @madjam @rahul003
## Checklist ##
### Essentials ###
- [ ] Passed code style checking (`make lint`)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage
- [ ] For user-facing API changes, API doc string has been updated.
- [ ] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
### Changes ###
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)
## Comments ##
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here
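The `mshadow::range` concern can be seen in a few lines of stdlib Python: rounding a count through IEEE-754 single precision (what a C `float` holds) silently loses integer accuracy once values exceed 2^24. A sketch of the failure mode, not MXNet code:

```python
import struct

def to_f32(x):
    """Round a Python float through IEEE-754 binary32, like a C float."""
    return struct.unpack('<f', struct.pack('<f', float(x)))[0]

# Integers are exactly representable in float32 only up to 2**24 = 16777216.
assert to_f32(2 ** 24) == 2 ** 24        # still exact
assert to_f32(2 ** 24 + 1) == 2 ** 24    # rounds down: a shape or index
                                         # computed in float is off by one here
```

This is why computing tensor shapes or index ranges in floating point is unsafe for large sparse arrays, and why the PR switches to integer arithmetic.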
[GitHub] dslate opened a new issue #8344: Broken link for "MXNet R Reference Manual"?
dslate opened a new issue #8344: Broken link for "MXNet R Reference Manual"? URL: https://github.com/apache/incubator-mxnet/issues/8344 On the MXNet page for the R API (http://mxnet.incubator.apache.org/api/r/index.html) the link to the "MXNet R Reference Manual" seems to point to a document called "mxnet-test.pdf", whose contents are titled "MXNet Documentation Release 0.0.8" and seem to be for the Julia language, not R. Is this link incorrect, or am I just confused? I am looking for the MXNet R API. Thanks.
[GitHub] reminisce commented on a change in pull request #8340: Fill optimizations
reminisce commented on a change in pull request #8340: Fill optimizations URL: https://github.com/apache/incubator-mxnet/pull/8340#discussion_r145600607
## File path: src/operator/tensor/init_op.h ##
```
@@ -164,19 +164,38 @@ inline bool InitStorageType(const nnvm::NodeAttrs& attrs,
   return true;
 }

+/*! \brief Fill output with a scalar integer value */
 template
 void FillCompute(const nnvm::NodeAttrs& attrs,
                  const OpContext& ctx,
                  const std::vector& inputs,
                  const std::vector& req,
                  const std::vector& outputs) {
-  using namespace mshadow;
-  using namespace mshadow::expr;
-  Stream *s = ctx.get_stream();
-  MSHADOW_TYPE_SWITCH(outputs[0].type_flag_, DType, {
-    Tensor out = outputs[0].FlatTo1D(s);
-    ASSIGN_DISPATCH(out, req[0], scalar(value));
-  });
+  if (req[0] != kNullOp) {
+    mshadow::Stream *s = ctx.get_stream();
+    MSHADOW_TYPE_SWITCH(outputs[0].type_flag_, DType, {
+      mxnet_op::Kernel, xpu>::Launch(s,
+        outputs[0].Size(),
+        outputs[0].dptr());
+    });
+  }
+}
+
+/*! \brief Fast CPU fill-zero version using memset */
+template<>
+inline void FillCompute(const nnvm::NodeAttrs& attrs,
+                        const OpContext& ctx,
+                        const std::vector& inputs,
+                        const std::vector& req,
+                        const std::vector& outputs) {
+  if (req[0] != kNullOp) {
+    const size_t size = outputs[0].Size();
+    if (size) {
+      MSHADOW_TYPE_SWITCH(outputs[0].type_flag_, DType, {
+        memset(outputs[0].dptr(), 0, size * sizeof(DType));
```
Review comment: `outputs[0].dptr_` is more efficient here than `outputs[0].dptr()`. Question: How much faster is this compared to the original implementation of filling a `TBlob`?
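Why a raw `memset` is a valid fast path only for zero: IEEE-754 +0.0 (like integer 0) is the all-zero byte pattern, and `memset` can only replicate a single byte, so it cannot produce arbitrary fill values. A small ctypes illustration of the idea (an analogy, not the PR's C++):

```python
import ctypes

# Zeroing the raw bytes of a float buffer yields exactly 0.0f everywhere,
# because +0.0 is the all-zero bit pattern in IEEE 754.
buf = (ctypes.c_float * 4)(1.5, -2.0, 3.25, 4.0)
ctypes.memset(buf, 0, ctypes.sizeof(buf))
assert list(buf) == [0.0, 0.0, 0.0, 0.0]

# memset replicates one byte, so it cannot write e.g. the float 1.0:
# bytes 01 01 01 01 decode to a tiny float, not 1.0.
one = (ctypes.c_float * 1)(0.0)
ctypes.memset(one, 1, ctypes.sizeof(one))
assert one[0] != 1.0
```

This is also why the specialization is gated on the fill value being zero; any other scalar still needs the per-element kernel.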
[GitHub] tqchen closed issue #5346: Amalgamation using weBLAS (JS + WebGL) ?
tqchen closed issue #5346: Amalgamation using weBLAS (JS + WebGL) ? URL: https://github.com/apache/incubator-mxnet/issues/5346
[GitHub] tqchen closed issue #4003: Building a visualization tool for MXNet
tqchen closed issue #4003: Building a visualization tool for MXNet URL: https://github.com/apache/incubator-mxnet/issues/4003
[GitHub] tqchen closed issue #3987: How to set auxiliary state in Batchnorm manually ?
tqchen closed issue #3987: How to set auxiliary state in Batchnorm manually ? URL: https://github.com/apache/incubator-mxnet/issues/3987
[GitHub] tqchen closed issue #3946: When predicting, does mxnet provide thread-safe interface?
tqchen closed issue #3946: When predicting, does mxnet provide thread-safe interface? URL: https://github.com/apache/incubator-mxnet/issues/3946
[GitHub] tqchen closed issue #3198: Check list for more operators
tqchen closed issue #3198: Check list for more operators URL: https://github.com/apache/incubator-mxnet/issues/3198
[GitHub] tqchen closed issue #3917: how to use my own loss to compute grad?
tqchen closed issue #3917: how to use my own loss to compute grad? URL: https://github.com/apache/incubator-mxnet/issues/3917
[GitHub] tqchen closed issue #3200: [OP] Array manipulation
tqchen closed issue #3200: [OP] Array manipulation URL: https://github.com/apache/incubator-mxnet/issues/3200
[GitHub] tqchen commented on issue #3198: Check list for more operators
tqchen commented on issue #3198: Check list for more operators URL: https://github.com/apache/incubator-mxnet/issues/3198#issuecomment-337798930 Closing as a stale issue.
[GitHub] tqchen closed issue #3487: [PERF] Call for NN Layer Kernel Improvement
tqchen closed issue #3487: [PERF] Call for NN Layer Kernel Improvement URL: https://github.com/apache/incubator-mxnet/issues/3487
[GitHub] tqchen closed issue #3523: Operator documents issues tracking
tqchen closed issue #3523: Operator documents issues tracking URL: https://github.com/apache/incubator-mxnet/issues/3523
[GitHub] tqchen closed issue #3504: [RFC] Documentation of MXNet
tqchen closed issue #3504: [RFC] Documentation of MXNet URL: https://github.com/apache/incubator-mxnet/issues/3504
[GitHub] tqchen closed issue #3201: [OP] Mathematical functions
tqchen closed issue #3201: [OP] Mathematical functions URL: https://github.com/apache/incubator-mxnet/issues/3201
[GitHub] tqchen closed issue #3724: asnumpy() of NDArray @cpu halted
tqchen closed issue #3724: asnumpy() of NDArray @cpu halted URL: https://github.com/apache/incubator-mxnet/issues/3724
[GitHub] tqchen closed issue #2066: Mxnet model visualisation
tqchen closed issue #2066: Mxnet model visualisation URL: https://github.com/apache/incubator-mxnet/issues/2066
[GitHub] tqchen closed issue #2771: CRF as RNN with permutohedral lattice
tqchen closed issue #2771: CRF as RNN with permutohedral lattice URL: https://github.com/apache/incubator-mxnet/issues/2771
[GitHub] tqchen closed issue #2284: Compiling MxNet: Problems with CMake
tqchen closed issue #2284: Compiling MxNet: Problems with CMake URL: https://github.com/apache/incubator-mxnet/issues/2284
[GitHub] tqchen closed issue #2944: v1.0 Stable Release TODO List
tqchen closed issue #2944: v1.0 Stable Release TODO List URL: https://github.com/apache/incubator-mxnet/issues/2944
[GitHub] tqchen closed issue #3084: Scala Package for v1.0 TODO List
tqchen closed issue #3084: Scala Package for v1.0 TODO List URL: https://github.com/apache/incubator-mxnet/issues/3084
[GitHub] tqchen closed issue #3006: [R] R package roadmap
tqchen closed issue #3006: [R] R package roadmap URL: https://github.com/apache/incubator-mxnet/issues/3006
[GitHub] tqchen closed issue #1873: Implement a model serving framework
tqchen closed issue #1873: Implement a model serving framework URL: https://github.com/apache/incubator-mxnet/issues/1873
[GitHub] tqchen closed issue #1642: dist_async slower than dist_sync
tqchen closed issue #1642: dist_async slower than dist_sync URL: https://github.com/apache/incubator-mxnet/issues/1642
[GitHub] tqchen closed issue #1281: how to prepare data for mxnet.io.NDArrayIter
tqchen closed issue #1281: how to prepare data for mxnet.io.NDArrayIter URL: https://github.com/apache/incubator-mxnet/issues/1281
[GitHub] tqchen closed issue #1784: Bug in ccSGD optimizer
tqchen closed issue #1784: Bug in ccSGD optimizer URL: https://github.com/apache/incubator-mxnet/issues/1784
[GitHub] tqchen closed issue #1412: I dont quite understand how Upsampling works.
tqchen closed issue #1412: I dont quite understand how Upsampling works. URL: https://github.com/apache/incubator-mxnet/issues/1412
[GitHub] tqchen closed issue #1237: can mxnet provide the sparse gradient update for word embedding
tqchen closed issue #1237: can mxnet provide the sparse gradient update for word embedding URL: https://github.com/apache/incubator-mxnet/issues/1237
[GitHub] tqchen commented on issue #6996: Label smoothing for SoftmaxOutput / cross-entropy loss
tqchen commented on issue #6996: Label smoothing for SoftmaxOutput / cross-entropy loss URL: https://github.com/apache/incubator-mxnet/issues/6996#issuecomment-337798214 Closing, as this is now enabled.
[GitHub] tqchen closed issue #6996: Label smoothing for SoftmaxOutput / cross-entropy loss
tqchen closed issue #6996: Label smoothing for SoftmaxOutput / cross-entropy loss URL: https://github.com/apache/incubator-mxnet/issues/6996
[GitHub] tqchen closed issue #5807: Model Parallelism
tqchen closed issue #5807: Model Parallelism URL: https://github.com/apache/incubator-mxnet/issues/5807
[GitHub] tqchen closed issue #5804: Scala build failed with "undefined symbol: __cudaRegisterFatBinary"
tqchen closed issue #5804: Scala build failed with "undefined symbol: __cudaRegisterFatBinary" URL: https://github.com/apache/incubator-mxnet/issues/5804
[GitHub] tqchen closed issue #5994: Any feature to access loss value in mxnet like keras package for tensorflow
tqchen closed issue #5994: Any feature to access loss value in mxnet like keras package for tensorflow URL: https://github.com/apache/incubator-mxnet/issues/5994
[GitHub] tqchen closed issue #5592: [Discussion] CreateBackwardOp interface for `Operator`
tqchen closed issue #5592: [Discussion] CreateBackwardOp interface for `Operator` URL: https://github.com/apache/incubator-mxnet/issues/5592
[GitHub] tqchen closed issue #5699: [Discussion] Support Higher-order Gradient
tqchen closed issue #5699: [Discussion] Support Higher-order Gradient URL: https://github.com/apache/incubator-mxnet/issues/5699
[GitHub] tqchen closed issue #5566: MinPy next step prototype
tqchen closed issue #5566: MinPy next step prototype URL: https://github.com/apache/incubator-mxnet/issues/5566
[GitHub] tqchen closed issue #5580: How to create a Custom Operator with extra parameters in Python?
tqchen closed issue #5580: How to create a Custom Operator with extra parameters in Python? URL: https://github.com/apache/incubator-mxnet/issues/5580
[GitHub] tqchen closed issue #3509: [RELEASE] Announcing v0.9 Release Candidate 1
tqchen closed issue #3509: [RELEASE] Announcing v0.9 Release Candidate 1 URL: https://github.com/apache/incubator-mxnet/issues/3509
[GitHub] tqchen closed issue #4851: error: ISO C++ forbids comparison between pointer and integer
tqchen closed issue #4851: error: ISO C++ forbids comparison between pointer and integer URL: https://github.com/apache/incubator-mxnet/issues/4851
[GitHub] tqchen closed issue #4783: [v0.9.3] Amalgamation for Android broken
tqchen closed issue #4783: [v0.9.3] Amalgamation for Android broken URL: https://github.com/apache/incubator-mxnet/issues/4783
[GitHub] tqchen closed issue #4989: How to use mxnet?
tqchen closed issue #4989: How to use mxnet? URL: https://github.com/apache/incubator-mxnet/issues/4989
[GitHub] tqchen closed issue #792: Support a parameter for maximum memory usage
tqchen closed issue #792: Support a parameter for maximum memory usage URL: https://github.com/apache/incubator-mxnet/issues/792
[GitHub] tqchen commented on issue #8343: [CMAKE] Cmake changes, upgrade training test so it converge
tqchen commented on issue #8343: [CMAKE] Cmake changes, upgrade training test so it converge URL: https://github.com/apache/incubator-mxnet/pull/8343#issuecomment-337796868 cc @piiswrong
[GitHub] tqchen opened a new pull request #8343: [CMAKE] Cmake changes, upgrade training test so it converge
tqchen opened a new pull request #8343: [CMAKE] Cmake changes, upgrade training test so it converge URL: https://github.com/apache/incubator-mxnet/pull/8343
## Description ##
(Brief description on what this PR is about)
## Checklist ##
### Essentials ###
- [ ] Passed code style checking (`make lint`)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage
- [ ] For user-facing API changes, API doc string has been updated.
- [ ] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
### Changes ###
- [ ] Feature1, tests, (and when applicable, API doc)
- [ ] Feature2, tests, (and when applicable, API doc)
## Comments ##
- If this change is a backward incompatible change, why must this change be made.
- Interesting edge cases to note here
[GitHub] ZiyueHuang commented on issue #8337: mx.autograd.grad works or fails depending on use of slices
ZiyueHuang commented on issue #8337: mx.autograd.grad works or fails depending on use of slices URL: https://github.com/apache/incubator-mxnet/issues/8337#issuecomment-337796480 Sorry, I just pasted the code you posted in the first comment, without adding the slice.
[GitHub] kpot commented on issue #8337: mx.autograd.grad works or fails depending on use of slices
kpot commented on issue #8337: mx.autograd.grad works or fails depending on use of slices URL: https://github.com/apache/incubator-mxnet/issues/8337#issuecomment-337796013 @ZiyueHuang Yes, I think the empty line is indeed the argument `''`. But when I replace `mx.nd.ones_like(b)` with `mx.nd.ones((1,))` I still get the same error. Are you sure that when it worked for you, you actually did use slicing? Just to be on the same page, here's the full code that fails, even though I believe it shouldn't:
```
import mxnet as mx
from mxnet import nd, autograd

ctx = mx.cpu()
a = mx.nd.array([1, 2, 3, 4], ctx=ctx)
a.attach_grad()
with autograd.record():
    b = nd.sum(2 * (a[0:4] ** 2))  # works without slicing
grads = autograd.grad(b, [a], create_graph=True, retain_graph=True)
da_sym = autograd.get_symbol(grads[0])
executor = da_sym.bind(ctx=ctx, args=[nd.ones_like(b), a])
executor.forward()
print(executor.outputs[0])
```
[GitHub] ZiyueHuang commented on issue #8337: mx.autograd.grad works or fails depending on use of slices
ZiyueHuang commented on issue #8337: mx.autograd.grad works or fails depending on use of slices URL: https://github.com/apache/incubator-mxnet/issues/8337#issuecomment-337794800 I think the empty line in the error message is the argument `''`. Please try replacing `mx.nd.ones_like(b)` with `mx.nd.ones((1,))`; it works fine for me.
[GitHub] kpot commented on issue #8337: mx.autograd.grad works or fails depending on use of slices
kpot commented on issue #8337: mx.autograd.grad works or fails depending on use of slices URL: https://github.com/apache/incubator-mxnet/issues/8337#issuecomment-337794186 @ZiyueHuang `a`'s first dimension is 4, and slicing it like `a[0:4]` is absolutely valid; I didn't care about efficiency here. But after you asked, I tried different expressions. I tried different sizes of `a` (for example, `a = mx.nd.array([ [ 1, 2, 3, 4] ])`, slicing it in the expression as `a[0]`). None of that has worked: I still see the same error every time I use slicing. `da_sym.list_arguments()` returns `['', 'var0']`. One must be the head gradient for the chain rule and the other is a placeholder for the variable `a`; that's why I used those arguments. Which is which I determined experimentally, since the two have different shapes, and I could easily check the result of `executor.forward()` knowing the derivative `db / da = 4 * a`.
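The sanity check kpot describes, `db/da = 4 * a` for `b = sum(2 * a**2)`, is easy to confirm with a finite-difference approximation, independent of MXNet (plain Python, illustrative only):

```python
def b(a):
    """b = sum(2 * a_i**2), the scalar from the repro script."""
    return sum(2.0 * x ** 2 for x in a)

a = [1.0, 2.0, 3.0, 4.0]
eps = 1e-6
for i in range(len(a)):
    bumped = list(a)
    bumped[i] += eps
    numeric = (b(bumped) - b(a)) / eps   # forward difference
    analytic = 4.0 * a[i]                # d(2x^2)/dx = 4x
    assert abs(numeric - analytic) < 1e-3
```

So for `a = [1, 2, 3, 4]` the expected gradient is `[4, 8, 12, 16]`, which is the reference output against which `executor.forward()` can be checked.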
[GitHub] ZiyueHuang commented on issue #8337: mx.autograd.grad works or fails depending on use of slices
ZiyueHuang commented on issue #8337: mx.autograd.grad works or fails depending on use of slices URL: https://github.com/apache/incubator-mxnet/issues/8337#issuecomment-337792011 `a`'s shape is `(4,)`. Why "a's first dimension has length 1 and slicing it with 0:4 doesn't really make sense"? @piiswrong
```
>>> import numpy as np
>>> a = np.ones((3,))
>>> a[0:2]
array([ 1.,  1.])
```
@kpot Could you please use `da_sym.list_arguments()` to see what the arguments are? Why `args=[mx.nd.ones_like(b), a]`?
[GitHub] ZiyueHuang commented on issue #8337: mx.autograd.grad works or fails depending on use of slices
ZiyueHuang commented on issue #8337: mx.autograd.grad works or fails depending on use of slices URL: https://github.com/apache/incubator-mxnet/issues/8337#issuecomment-337792225 Is this a bug or wrong usage of autograd? Do you have any idea? @szha
[GitHub] ZiyueHuang commented on issue #8338: master branch cannot build on centos 7 with cuda-8.0
ZiyueHuang commented on issue #8338: master branch cannot build on centos 7 with cuda-8.0 URL: https://github.com/apache/incubator-mxnet/issues/8338#issuecomment-337790437 Seems that @wa1618i got the same problem in https://github.com/apache/incubator-mxnet/issues/8333. Have you got any solutions so far? @wa1618i
[GitHub] javelinjs commented on issue #8297: [scala] Make accuracy idependant of output size (fix #8226)
javelinjs commented on issue #8297: [scala] Make accuracy idependant of output size (fix #8226) URL: https://github.com/apache/incubator-mxnet/pull/8297#issuecomment-337790078 We should keep it the same as other language bindings, especially python. What if we make it Double in EvalMetric API?
[GitHub] ZiyueHuang commented on issue #8338: master branch cannot build on centos 7 with cuda-8.0
ZiyueHuang commented on issue #8338: master branch cannot build on centos 7 with cuda-8.0 URL: https://github.com/apache/incubator-mxnet/issues/8338#issuecomment-337789562 Thanks for your suggestions. Seems weird. It builds successfully on Ubuntu; why not on CentOS? Yes, the error messages are complete. The warnings are not fixed by reverting `smooth_l1`. I changed the code to
```
struct smooth_l1_loss {
  // a is x, b is sigma2
  template<typename DType>
  MSHADOW_XINLINE static DType Map(DType a, DType b) {
    b *= b;
    if (a > (1.0f) / b) {
      return a - (0.5f) / b;
    } else if (a < (-1.0f) / b) {
      return -a - (0.5f) / b;
    } else {
      return (0.5f) * a * a * b;
    }
  }
};  // struct smooth_l1_loss

struct smooth_l1_gradient {
  // a is x, b is sigma2
  template<typename DType>
  MSHADOW_XINLINE static DType Map(DType a, DType b) {
    b *= b;
    if (a > (1.0f) / b) {
      return (1.0f);
    } else if (a < (-1.0f) / b) {
      return DType(-1.0f);
    } else {
      return b * a;
    }
  }
};  // struct smooth_l1_derivative
```
Below is the whole chunk of warning and error messages:
```
src/operator/tensor/./../mshadow_op.h(1093): warning: floating-point value does not fit in required integral type
detected during:
  instantiation of "DType mxnet::op::mshadow_op::smooth_l1_gradient::Map(DType, DType) [with DType=uint8_t]"
  /home/hanfeng/zyh/build/mshadow/mshadow/././expr_engine-inl.h(131): here
  instantiation of "DType mshadow::expr::Plan, DType>::Eval(mshadow::index_t, mshadow::index_t) const [with OP=mxnet::op::mshadow_op::smooth_l1_gradient, TA=mshadow::Tensor, TB=mshadow::expr::ScalarExp, etype=1, DType=uint8_t]"
  /home/hanfeng/zyh/build/mshadow/mshadow/././expr_engine-inl.h(131): here
  instantiation of "DType mshadow::expr::Plan, DType>::Eval(mshadow::index_t, mshadow::index_t) const [with OP=mshadow::op::mul, TA=mshadow::Tensor, TB=mshadow::expr::BinaryMapExp, mshadow::expr::ScalarExp, uint8_t, 1>, etype=1, DType=uint8_t]"
  /home/hanfeng/zyh/build/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh(75): here
  instantiation of "void mshadow::cuda::MapPlanProc(DstPlan, mshadow::index_t, mshadow::Shape<2>, Plan, int) [with Saver=mshadow::sv::saveto, DstPlan=mshadow::expr::Plan, uint8_t>, Plan=mshadow::expr::Plan, mshadow::expr::BinaryMapExp, mshadow::expr::ScalarExp, uint8_t, 1>, uint8_t, 1>, uint8_t>, block_dim_bits=8]"
  /home/hanfeng/zyh/build/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh(83): here
  instantiation of "void mshadow::cuda::MapPlanKernel(DstPlan, mshadow::index_t, mshadow::Shape<2>, Plan) [with Saver=mshadow::sv::saveto, block_dim_bits=8, DstPlan=mshadow::expr::Plan, uint8_t>, Plan=mshadow::expr::Plan, mshadow::expr::BinaryMapExp, mshadow::expr::ScalarExp, uint8_t, 1>, uint8_t, 1>, uint8_t>]"
  /home/hanfeng/zyh/build/mshadow/mshadow/././././cuda/tensor_gpu-inl.cuh(109): here
  instantiation of "void mshadow::cuda::MapPlan(mshadow::expr::Plan, const mshadow::expr::Plan &, mshadow::Shape<2>, cudaStream_t) [with Saver=mshadow::sv::saveto, DstExp=mshadow::Tensor, E=mshadow::expr::BinaryMapExp, mshadow::expr::BinaryMapExp, mshadow::expr::ScalarExp, uint8_t, 1>, uint8_t, 1>, DType=uint8_t]"
  /home/hanfeng/zyh/build/mshadow/mshadow/./tensor_gpu-inl.h(115): here
  instantiation of "void mshadow::MapExp(mshadow::TRValue *, const mshadow::expr::Exp &) [with Saver=mshadow::sv::saveto, R=mshadow::Tensor, dim=1, DType=uint8_t, E=mshadow::expr::BinaryMapExp, mshadow::expr::BinaryMapExp, mshadow::expr::ScalarExp, uint8_t, 1>, uint8_t, 1>, etype=1]"
  /home/hanfeng/zyh/build/mshadow/mshadow/././expr_engine-inl.h(446): here
  instantiation of "void mshadow::expr::ExpEngine::Eval(RV *, const mshadow::expr::Exp &) [with SV=mshadow::sv::saveto, RV=mshadow::Tensor, DType=uint8_t, E=mshadow::expr::BinaryMapExp, mshadow::expr::BinaryMapExp, mshadow::expr::ScalarExp, uint8_t, 1>, uint8_t, 1>]"
  /home/hanfeng/zyh/build/mshadow/mshadow/./expression.h(167): here
  instantiation of "Container &mshadow::expr::RValueExp::__assign(const mshadow::expr::Exp &) [with Container=mshadow::Tensor, DType=uint8_t, E=mshadow::expr::BinaryMapExp, mshadow::expr::BinaryMapExp, mshadow::expr::ScalarExp, uint8_t, 1>, uint8_t, 1>, etype=1]"
  /home/hanfeng/zyh/build/mshadow/mshadow/tensor.h(609): here
  instantiation of "mshadow::Tensor &mshadow::Tensor::operator=(const mshadow::expr::Exp &) [with Device=mxnet::gpu, DType=uint8_t, E=mshadow::expr::BinaryMapExp, mshadow::expr::BinaryMapExp, mshadow::expr::ScalarExp, uint8_t, 1>, uint8_t, 1>, etype=1]"
  src/operator/tensor/./elemwise_binary_scalar_op.h(288): here
  instantiation of "void mxnet::op::BinaryScalarOp::
```
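The warning itself flags `smooth_l1_gradient::Map` instantiated with `DType=uint8_t`: the gradient returns -1.0f, which cannot be represented in an unsigned integral type. A minimal NumPy sketch of the same loss and gradient (hypothetical helper names, mirroring the structs above, where the incoming second argument is squared on entry):

```python
import numpy as np

def smooth_l1_loss(a, b):
    """Smooth L1 loss; a is x, b is sigma (squared on entry, as in the C++ struct)."""
    b = b * b
    return np.where(a > 1.0 / b, a - 0.5 / b,
           np.where(a < -1.0 / b, -a - 0.5 / b,
                    0.5 * a * a * b))

def smooth_l1_gradient(a, b):
    """Derivative of smooth_l1_loss w.r.t. a; saturates at -1/+1 outside the quadratic zone."""
    b = b * b
    return np.where(a > 1.0 / b, 1.0,
           np.where(a < -1.0 / b, -1.0, b * a))

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
smooth_l1_loss(x, 1.0)      # quadratic inside [-1, 1], linear outside
smooth_l1_gradient(x, 1.0)  # -1 and +1 outside [-1, 1]; note -1 underflows uint8_t
```

With sigma = 1 the loss is 0.5*x^2 for |x| <= 1 and |x| - 0.5 beyond, matching the branch structure of the C++ code.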
[GitHub] zhangqianghd commented on issue #8309: asnumpy is slowly ,how can I speed up it?
zhangqianghd commented on issue #8309: asnumpy is slowly ,how can I speed up it? URL: https://github.com/apache/incubator-mxnet/issues/8309#issuecomment-337780811 When I added wait_to_read, I found the details. Thanks, all.
[GitHub] zhangqianghd closed issue #8309: asnumpy is slowly ,how can I speed up it?
zhangqianghd closed issue #8309: asnumpy is slowly ,how can I speed up it? URL: https://github.com/apache/incubator-mxnet/issues/8309
[GitHub] rahul003 opened a new pull request #8342: [WIP] 2bit gradient compression
rahul003 opened a new pull request #8342: [WIP] 2bit gradient compression URL: https://github.com/apache/incubator-mxnet/pull/8342
## Description ##
Implements 2-bit gradient compression by quantizing each value in the gradient array to 2 bits, using two user-specified thresholds: one for positive and one for negative values. @eric-haibin-lin @piiswrong @reminisce @anirudh2290 @bhavinthaker @madjam @cjolivier01 Please review. This is a work in progress. I'm currently running this with different kinds of models to get performance results.
### Important files to review
Operator
- two_bit_quantize-inl.h
- two_bit_quantize.cc

KVStore local
- comm.h

KVStore dist
- kvstore_dist.h
- kvstore_dist_server.h

Documentation about gradient compression
- kvstore.py
- two_bit_quantize.cc
## Checklist ##
### Essentials ###
- [ ] Passed code style checking (`make lint`)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [ ] All changes have test coverage
- [ ] For user-facing API changes, API doc string has been updated.
- [ ] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
### Changes ###
- [ ] two-bit-quantize and dequantize operators
- [ ] Reduce operation in kvstore_local / comm.h
- [ ] Distributed kvstore changes at worker and server
- [ ] Tests for operator, local kvstore, distributed kvstore with predefined and random data. The results have been compared with expected values by implementing this logic in python.
- [ ] API changes for KVStore, Module and Trainer in python
## Comments ##
### Problem
When training large-scale deep learning models, especially with distributed training, communication becomes a bottleneck for networks whose computation is low relative to their communication.
### Approach
We can try to address this by quantizing the gradients before sending and dequantizing them at the receiver's end. The sender retains the quantization error and adds it to the gradient in the next iteration, effectively delaying small updates to positions in the gradient. Specifically, this PR currently implements 2-bit quantization.
### Two bit quantization
Use two thresholds to quantize the data: one positive and one negative. Any positive value greater than or equal to the positive threshold is set to one value (say 01), any negative value lower than or equal to the negative threshold is set to a second value (say 10), and all others are set to a third value (say 0). We need three values to represent data in this fashion, and hence two bits. We understand this leads to one bit going to waste, but that's an optimization for later, as it complicates the operators. The error in quantization is stored as a residual and carried over to the next iteration; it is added to the gradient before quantizing. An example below with thresholds of -2.0 and 2.0: ![Quantization at work](https://i.imgur.com/AtBVg92.png)
### Format of compressed gradient
The first two elements are the thresholds used for quantization. The third element is the size of the original array. These values are required to dequantize the gradient. Everything from the 4th element onward represents the compressed gradient; each such value represents up to 16 elements of the original array. For the example above, we get ```compr = [ -2.0, 2.0, 8, 6.1215606E-28]```. Note that the binary representation of the last element is ```00 01 00 10 01 00 00 10```.
### Local kvstore
When using the local kvstore, gradient compression only happens when using device communication. When gradients are pushed, quantization and dequantization happen before they are summed up (Reduce). Example: say we have 4 GPUs, and the gradients are being summed up on GPU0. Each device quantizes its gradients, then sends the quantized gradient to GPU0, which dequantizes this data before merging it with values from the other GPUs. Note that there is no need to quantize the gradients from GPU0 itself, but it is still done so that there is no bias toward the samples processed by GPU0. **Please let me know if this is not a good idea.**
### Dist kvstore
When the set_compress method of kvstore is called, each worker sets those compress params, and one worker sends these params to all servers. From then on, before each value is pushed to the server, it is quantized. The server dequantizes the data and stores it as an array of the original size. When values are pulled from the server, it returns an array of the original size. The same happens when each server handles shards of the data.
### Usage
The reason I used a dictionary compress_params for the arguments was to ensure uniformity when we extend this
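The threshold/residual scheme described above can be sketched in NumPy as follows (hypothetical helper name, not the PR's actual operators; packing 16 two-bit codes into each float32 word of the compressed array is omitted):

```python
import numpy as np

def two_bit_quantize(grad, residual, neg_threshold=-2.0, pos_threshold=2.0):
    """Map each value to a 2-bit code: 0b01 if >= pos_threshold,
    0b10 if <= neg_threshold, 0b00 otherwise. The quantization error
    is kept as a residual and added back on the next call."""
    grad = grad + residual
    codes = np.zeros(grad.shape, dtype=np.uint8)
    codes[grad >= pos_threshold] = 0b01
    codes[grad <= neg_threshold] = 0b10
    # value each code dequantizes to on the receiving end
    dequant = np.where(codes == 0b01, pos_threshold,
              np.where(codes == 0b10, neg_threshold, 0.0))
    residual = grad - dequant  # carried over to the next iteration
    return codes, dequant, residual

grad = np.array([0.3, 2.5, -4.0, 1.9])
codes, dequant, residual = two_bit_quantize(grad, np.zeros_like(grad))
# codes    -> 0b00, 0b01, 0b10, 0b00
# dequant  -> 0.0, 2.0, -2.0, 0.0
# residual -> 0.3, 0.5, -2.0, 1.9
```

Note how the 1.9 update is "delayed": it stays in the residual and will push the next iteration's gradient over the threshold if it persists.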
[GitHub] kaisark commented on issue #6846: Check failed: err == cudaSuccess (8 vs. 0) Name: MapPlanKernel ErrStr:invalid device function
kaisark commented on issue #6846: Check failed: err == cudaSuccess (8 vs. 0) Name: MapPlanKernel ErrStr:invalid device function URL: https://github.com/apache/incubator-mxnet/issues/6846#issuecomment-337768151 I was able to build mxnet on my TX1 with gpu today. I did add 53 in the Makefile for thoroughness. https://mxnet.incubator.apache.org/get_started/install.html Devices -> Nvidia Jetson TX2 Linux -> Python -> GPU -> Build from Source
```
nvidia@tegra-ubuntu:~/mxnet/python$ uname -a
Linux tegra-ubuntu 4.4.38-jetsonbot-doc-v0.3 #1 SMP PREEMPT Thu Oct 5 15:58:24 EDT 2017 aarch64 aarch64 aarch64 GNU/Linux
nvidia@tegra-ubuntu:~/mxnet/python$ make USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1
nvidia@tegra-ubuntu:~/mxnet/python$ pip list -e
mxnet (0.11.1, /home/nvidia/mxnet/python)
nvidia@tegra-ubuntu:~/mxnet/python$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet as mx
>>> a = mx.nd.ones((2, 3), mx.gpu())
>>> b = a * 2 + 1
>>> b.asnumpy()
array([[ 3.,  3.,  3.],
       [ 3.,  3.,  3.]], dtype=float32)
>>>
```
[GitHub] liuzhi136 opened a new issue #8341: Training error always fluctuates and doesn't decrease.
liuzhi136 opened a new issue #8341: Training error always fluctuates and doesn't decrease. URL: https://github.com/apache/incubator-mxnet/issues/8341 I implemented the model defined in "Training RNNs as Fast as CNNs". The structure I wrote is below: ![1](https://user-images.githubusercontent.com/13534043/31749188-fd014a78-b4aa-11e7-82e5-10732df9cb2a.png) ![2](https://user-images.githubusercontent.com/13534043/31749189-fd3d7e30-b4aa-11e7-95a2-89fc72078226.png) ![3](https://user-images.githubusercontent.com/13534043/31749190-fd72ba0a-b4aa-11e7-95cc-1145d1da7b5c.png) The training and validation error look like: ![screenshot from 2017-10-19 08-50-15](https://user-images.githubusercontent.com/13534043/31749202-149d4f9c-b4ab-11e7-8f80-7b7ea7a1cd0f.png) The training data I used: https://raw.githubusercontent.com/harvardnlp/sent-conv-torch/master/data/rt-polarity.all I really don't know why this model fluctuates all the time. Does anyone have any idea about this? I really need help to solve this immediately. Any help will be appreciated!
[GitHub] VikingMew commented on issue #7582: Gluon GPU memory efficiency
VikingMew commented on issue #7582: Gluon GPU memory efficiency URL: https://github.com/apache/incubator-mxnet/issues/7582#issuecomment-337760280 @jermainewang How to do (2)?
[GitHub] vinig opened a new issue #8231: [MXNet 0.11.0 + RPi 3 + Python 2.7] ndarray unit test fails
vinig opened a new issue #8231: [MXNet 0.11.0 + RPi 3 + Python 2.7] ndarray unit test fails URL: https://github.com/apache/incubator-mxnet/issues/8231 ## Description I'm trying to build MXNet (0.11.0) for Raspberry Pi 3 with Python 2.7, OpenBLAS, OpenCV and LAPACK (cross-compiled MXNet on RHEL). When I run the unit tests (tests/python/unittest), test_ndarray.test_ndarray_slice fails with an AssertionError (see the Error Message section). I upgraded the numpy and scipy versions, since the Debian package manager was installing older versions that were not compatible with the tests. The current numpy version is 1.13.3 and the scipy version is 1.19.1. The version upgrade resolved the other unit test failures except this one. It is strange because none of the functionality is broken, yet the arrays are different (see the last section). How is that happening? My question is: what is the correct set of versions for the various dependencies to build and use MXNet on RPi 3? My aim is to get all the unit tests working for MXNet version 0.11.0 on RPi 3.
## Environment info
```
--Python Info--
('Version :', '2.7.9')
('Compiler :', 'GCC 4.9.2')
('Build:', ('default', 'Sep 17 2016 20:26:04'))
('Arch :', ('32bit', 'ELF'))
Pip Info---
('Version :', '1.5.6')
('Directory:', '/usr/lib/python2.7/dist-packages/pip')
--MXNet Info---
('Version :', '0.11.0')
('Directory:', '/usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet')
Traceback (most recent call last):
  File "diagnose.py", line 108, in check_mxnet
    with open(commit_hash, 'r') as f:
IOError: [Errno 2] No such file or directory: '/usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/COMMIT_HASH'
--System Info--
('Platform :', 'Linux-4.9.35-v7+-armv7l-with-debian-8.0')
('system :', 'Linux')
('node :', 'raspberrypi')
('release :', '4.9.35-v7+')
('version :', '#1014 SMP Fri Jun 30 14:47:43 BST 2017')
--Hardware Info--
('machine :', 'armv7l')
('processor:', '')
Architecture: armv7l
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Model name: ARMv7 Processor rev 4 (v7l)
CPU max MHz: 1200.
CPU min MHz: 600.
--Network Test--
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0101 sec, LOAD: 0.5146 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0095 sec, LOAD: 0.2694 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0456 sec, LOAD: 0.1679 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0166 sec, LOAD: 0.0695 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0106 sec, LOAD: 0.0516 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0467 sec, LOAD: 0.2191 sec.
```
Package used (Python/R/Scala/Julia): Python
## Build info
Compiler (gcc/clang/mingw/visual studio): gcc version 4.9.2 (Raspbian 4.9.2-10) (target: arm-linux-gnueabihf)
MXNet commit hash: a5edbf94094581ee27157eae4f2113115a3994e7
Build config: OpenBLAS built on Pi, installed and ported to RHEL for cross-compilation:
```
make FC=gfortran -j4 config.mk USE_PROFILER=1 ADD_LDFLAGS=-L/path/to/openblas/ext/lib /path/to/static/libopenblas.a ADD_CFLAGS=-I/path/to/openblas/ext/include USE_BLAS=openblas USE_OPENCV=1 USE_OPENMP=0 USE_LAPACK=1 USE_LAPACK_PATH=/path/to/lapack/static/lib
```
MXNet installation depends on the following libraries: librt.so.1 libopencv_dnn.so.3.3 libopencv_ml.so.3.3 libopencv_objdetect.so.3.3 libopencv_shape.so.3.3 libopencv_stitching.so.3.3 libopencv_superres.so.3.3 libopencv_videostab.so.3.3 libopencv_calib3d.so.3.3 libopencv_features2d.so.3.3 libopencv_highgui.so.3.3 libopencv_videoio.so.3.3 libopencv_imgcodecs.so.3.3 libopencv_video.so.3.3 libopencv_photo.so.3.3 libopencv_imgproc.so.3.3 libopencv_flann.so.3.3 libopencv_core.so.3.3 libstdc++.so.6 libm.so.6 libgcc_s.so.1 libpthread.so.0 libc.so.6 ld-linux-armhf.so.3 libopenblas.a liblapack.a
## Error Message:
```
== FAIL: test_ndarray.test_ndarray_slice --
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/pi/deepgreen/mxnet/tests
```
[GitHub] szha commented on issue #8340: Fill optimizations
szha commented on issue #8340: Fill optimizations URL: https://github.com/apache/incubator-mxnet/pull/8340#issuecomment-337758666 As long as `full` is still on the radar it's fine. Would you make the change to properly support `full` in that PR then?
[GitHub] cjolivier01 commented on issue #8340: Fill optimizations
cjolivier01 commented on issue #8340: Fill optimizations URL: https://github.com/apache/incubator-mxnet/pull/8340#issuecomment-337758247 You can re-add it if you need it sometime, although it's better to use OpBase::SetToScalar or op_with_req (the op_with_req `Map()` override for setting a scalar is in a separate PR) because those properly handle the req.
[GitHub] ykim362 commented on issue #7931: MKL-DNN integration: request for reviews
ykim362 commented on issue #7931: MKL-DNN integration: request for reviews URL: https://github.com/apache/incubator-mxnet/pull/7931#issuecomment-337757546 @szha @piiswrong MKL-DNN doesn't support the fp64 (double) data type. Do you think this is an issue? The library team is focusing more on adding lower precisions.
[GitHub] moakra commented on issue #8248: A3C code does not learn
moakra commented on issue #8248: A3C code does not learn URL: https://github.com/apache/incubator-mxnet/issues/8248#issuecomment-337756962 I was talking about this code: https://github.com/apache/incubator-mxnet/tree/master/example/reinforcement-learning/a3c Now it works. Thank you!
[GitHub] ykim362 commented on issue #7931: MKL-DNN integration: request for reviews
ykim362 commented on issue #7931: MKL-DNN integration: request for reviews URL: https://github.com/apache/incubator-mxnet/pull/7931#issuecomment-337756563 @piiswrong We have fixed the convergence issue in ResNet. There were some problems in the Conv and Batch Norm layers. Also, we have added some more optimizations to get more speed-ups. The MKL-DNN version is now 15% faster than MKLML (MKL2017) for inference and training on average.
[GitHub] szha commented on issue #8340: Fill optimizations
szha commented on issue #8340: Fill optimizations URL: https://github.com/apache/incubator-mxnet/pull/8340#issuecomment-337756930 Agreed. I'm basically asking to keep both `fill` and `set_to`.
[GitHub] moakra closed issue #8248: A3C code does not learn
moakra closed issue #8248: A3C code does not learn URL: https://github.com/apache/incubator-mxnet/issues/8248
[GitHub] cjolivier01 commented on issue #8340: Fill optimizations
cjolivier01 commented on issue #8340: Fill optimizations URL: https://github.com/apache/incubator-mxnet/pull/8340#issuecomment-337756128 Making a fill with some runtime-determined value such as OpBase::SetToScalar is a trivial addition. For filling with a predefined constant scalar such as zeroes and ones, using an immediate such as with set_to is faster.
[GitHub] szha commented on issue #8340: Fill optimizations
szha commented on issue #8340: Fill optimizations URL: https://github.com/apache/incubator-mxnet/pull/8340#issuecomment-337753653 There's an op in NDArray which can benefit from fill https://mxnet.incubator.apache.org/versions/master/api/python/ndarray.html?highlight=full#mxnet.ndarray.full. Currently it is done in the frontend through `empty` and in-place assignment. And this op is not included in symbol (though it's doable through `ones` and multiply). It's worth considering the `full` use case in this PR since doing it with fill will be faster, though it wouldn't be compatible with the current template implementation of `set_to`.
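The "empty plus in-place assignment" frontend pattern mentioned above looks roughly like this NumPy analogue (illustrative sketch only, not the MXNet frontend source; a fused fill kernel avoids the second pass over memory):

```python
import numpy as np

# 'full' via two steps: allocate uninitialized storage,
# then fill it with one in-place assignment.
out = np.empty((2, 3), dtype=np.float32)
out[:] = 4.2
```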
[GitHub] cjolivier01 commented on issue #8340: Fill optimizations
cjolivier01 commented on issue #8340: Fill optimizations URL: https://github.com/apache/incubator-mxnet/pull/8340#issuecomment-337752595 There's more than one thing you could be referring to in this PR. There's a FillCompute() and a set_to template, which are somewhat independent. Currently, the two use-cases are zero and one for the fill value for both of these. This does not mean that "going forward" another value can't be used if the need arises.
[GitHub] szha commented on issue #8340: Fill optimizations
szha commented on issue #8340: Fill optimizations URL: https://github.com/apache/incubator-mxnet/pull/8340#issuecomment-337751362 Having the value to fill as the template parameter.