[GitHub] [incubator-mxnet] samskalicky commented on issue #15921: dynamic custom operator support
samskalicky commented on issue #15921:
URL: https://github.com/apache/incubator-mxnet/pull/15921#issuecomment-562375617

> Hi @samskalicky and @rondogency, is it ready to merge this PR after CI passes?

Yes! We're so ready to merge :) Thanks @zachgk for rerunning the unix_cpu job!

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-mxnet] samskalicky commented on issue #15921: dynamic custom operator support
samskalicky commented on issue #15921:
URL: https://github.com/apache/incubator-mxnet/pull/15921#issuecomment-543775483

The centos-cpu pipeline timed out, so I had to push again.
[GitHub] [incubator-mxnet] samskalicky commented on issue #15921: dynamic custom operator support
samskalicky commented on issue #15921:
URL: https://github.com/apache/incubator-mxnet/pull/15921#issuecomment-543632838

More flaky test failures:

```
FAIL: test_operator_gpu.test_bulking_operator_gpu
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 177, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/gpu/test_operator_gpu.py", line 2398, in test_bulking_operator_gpu
    _test_bulking(_test_bulking_in_process)
  File "/work/mxnet/tests/python/gpu/test_gluon_gpu.py", line 553, in _test_bulking
    time_per_iteration):
  File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 334, in run_in_spawned_process
    assert p.exitcode == 0, "Non-zero exit code %d from %s()." % (p.exitcode, func.__name__)
AssertionError: Non-zero exit code -6 from _test_bulking_in_process().
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=298254377 to reproduce.
--------------------- >> end captured logging << ---------------------
```
[GitHub] [incubator-mxnet] samskalicky commented on issue #15921: dynamic custom operator support
samskalicky commented on issue #15921:
URL: https://github.com/apache/incubator-mxnet/pull/15921#issuecomment-543570416

Thanks @wkcn, I ran into some flaky test failures and had to push another empty commit:

```
FAIL: test_quantization_mkldnn.test_requantize_int32_to_int8
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python3.5/dist-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/work/mxnet/tests/python/mkl/../unittest/common.py", line 177, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/mkl/../quantization/test_quantization.py", line 186, in test_requantize_int32_to_int8
    check_requantize_with_symbol((3, 4, 10, 10))
  File "/work/mxnet/tests/python/mkl/../quantization/test_quantization.py", line 181, in check_requantize_with_symbol
    assert_almost_equal(qdata_int8.asnumpy(), qdata_int8_np)
  File "/work/mxnet/python/mxnet/test_utils.py", line 624, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 1562.50 exceeds tolerance rtol=1.00e-05, atol=1.00e-20 (mismatch 0.08%).
Location of maximum error: (2, 0, 9, 8), a=63., b=64.
 ACTUAL: array([ 106,   17,  -56, ...,   88,   -9,  -38],
         [ 107, -120,  -49, ...,  -78,   81,   93],
         [-100,  -90,  -17, ...,   84,   49, -118], ...
 DESIRED: array([ 106,   17,  -56, ...,   88,   -9,  -38],
          [ 107, -120,  -49, ...,  -78,   81,   93],
          [-100,  -90,  -17, ...,   84,   49, -118], ...
-------------------- >> begin captured stdout << ---------------------
*** Maximum errors for vector of size 1200:  rtol=1e-05, atol=1e-20
  1: Error 1562.50  Location of error: (2, 0, 9, 8), a=63., b=64.

FAIL: test_operator_gpu.test_fast_lars
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python37\lib\site-packages\nose\case.py", line 198, in runTest
    self.test(*self.arg)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\../unittest\common.py", line 177, in test_new
    orig_test(*args, **kwargs)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\test_operator_gpu.py", line 328, in test_fast_lars
    check_fast_lars(w_dtype, g_dtype, shapes, ctx, tol1, tol2)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\test_operator_gpu.py", line 310, in check_fast_lars
    assert_almost_equal(ref_new_lrs.asnumpy(), mx_new_lrs.asnumpy(), atol=tol2, rtol=tol2)
  File "C:\jenkins_slave\workspace\ut-python-gpu\windows_package\python\mxnet\test_utils.py", line 624, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 2.844314 exceeds tolerance rtol=1.00e-06, atol=1.00e-06 (mismatch 1.724138%).
Location of maximum error: (33,), a=0.01896909, b=0.01897199
 ACTUAL: array([0.00013492, 0.00045911, 0.00021266, ..., 0.00068631, 0.0002449 ,
        0.00085557], dtype=float32)
 DESIRED: array([0.00013493, 0.00045911, 0.00021266, ..., 0.00068631, 0.0002449 ,
        0.00085556], dtype=float32)
-------------------- >> begin captured stdout << ---------------------
*** Maximum errors for vector of size 58:  rtol=1e-06, atol=1e-06
  1: Error 2.844314  Location of error: (33,), a=0.01896909, b=0.01897199
```
[GitHub] [incubator-mxnet] samskalicky commented on issue #15921: dynamic custom operator support
samskalicky commented on issue #15921:
URL: https://github.com/apache/incubator-mxnet/pull/15921#issuecomment-543515556

@yzhliu any idea why this is failing?

```
Traceback (most recent call last):
  File "/work/mxnet/contrib/tvmop/compile.py", line 20, in <module>
    import tvm
  File "/work/mxnet/3rdparty/tvm/python/tvm/__init__.py", line 23, in <module>
    from . import tensor
  File "/work/mxnet/3rdparty/tvm/python/tvm/tensor.py", line 20, in <module>
    from ._ffi.node import NodeBase, NodeGeneric, register_node, convert_to_node
  File "/work/mxnet/3rdparty/tvm/python/tvm/_ffi/node.py", line 24, in <module>
    from .node_generic import NodeGeneric, convert_to_node, const
  File "/work/mxnet/3rdparty/tvm/python/tvm/_ffi/node_generic.py", line 23, in <module>
    from .base import string_types
  File "/work/mxnet/3rdparty/tvm/python/tvm/_ffi/base.py", line 60, in <module>
    _LIB, _LIB_NAME = _load_lib()
  File "/work/mxnet/3rdparty/tvm/python/tvm/_ffi/base.py", line 52, in _load_lib
    lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.5/ctypes/__init__.py", line 347, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /work/build/3rdparty/tvm/libtvm.so: file too short
ninja: build stopped: subcommand failed.
```

@wkcn can you please restart the unix-cpu job and see if it was transient?
[GitHub] [incubator-mxnet] samskalicky commented on issue #15921: dynamic custom operator support
samskalicky commented on issue #15921:
URL: https://github.com/apache/incubator-mxnet/pull/15921#issuecomment-541983523

> Just talked with @szha and he found out a better way to make `MXTensor` not diverge from the de facto standard tensor format `DLTensor` -- we can have a `MXTensor` class which contains a `DLTensor` as its member. Just like [what TVM did](https://github.com/dmlc/tvm/blob/6b0359b440135b19116ded681be9bee0d7d4c985/include/tvm/runtime/ndarray.h#L242-L294) for its own tensor format.

Thanks @junrushao1994 & @szha, that's a good idea. I added the DLTensor to the list of features for the next custom op PR.
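The composition approach described above (an `MXTensor` that contains a `DLTensor` member, like TVM's `runtime::NDArray`) could be sketched roughly as follows. Note this is an illustrative sketch only: the struct definitions are simplified stand-ins for `dlpack.h` and the eventual MXNet headers, and all names besides `MXTensor`/`DLTensor` are made up for the example.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Simplified subset of DLPack's DLTensor (stand-in for dlpack.h).
struct DLDataType { uint8_t code; uint8_t bits; uint16_t lanes; };
struct DLTensor {
  void* data;
  int ndim;
  DLDataType dtype;
  int64_t* shape;
};

// Sketch: MXTensor composes a DLTensor instead of re-declaring its fields,
// so the shared DLPack layout stays untouched while MXNet-specific metadata
// (e.g. a layout tag such as "NCHW") can be added alongside it.
class MXTensor {
 public:
  MXTensor(void* data, std::vector<int64_t> shape, DLDataType dtype)
      : shape_(std::move(shape)) {
    dl_tensor_.data = data;
    dl_tensor_.ndim = static_cast<int>(shape_.size());
    dl_tensor_.dtype = dtype;
    dl_tensor_.shape = shape_.data();  // points into shape_, so copies/moves
  }                                    // of MXTensor would need care
  const DLTensor& dltensor() const { return dl_tensor_; }
  int64_t size() const {
    int64_t s = 1;
    for (int64_t d : shape_) s *= d;
    return s;
  }
 private:
  DLTensor dl_tensor_;            // the embedded standard tensor
  std::vector<int64_t> shape_;    // owns the shape storage
};
```

The design benefit is the one the comment above points at: external code that speaks DLPack can be handed `&t.dltensor()` directly, with no field-by-field translation.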
[GitHub] [incubator-mxnet] samskalicky commented on issue #15921: dynamic custom operator support
samskalicky commented on issue #15921:
URL: https://github.com/apache/incubator-mxnet/pull/15921#issuecomment-540242870

Note: the two failing CI jobs are related to numpy:

sanity:
```
* Module mxnet.numpy_op_signature
python/mxnet/numpy_op_signature.py:21:0: E0402: Attempted relative import beyond top-level package (relative-beyond-top-level)
```

unix-gpu:
```
Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1746065449 to reproduce.
ERROR
```
[GitHub] [incubator-mxnet] samskalicky commented on issue #15921: dynamic custom operator support
samskalicky commented on issue #15921:
URL: https://github.com/apache/incubator-mxnet/pull/15921#issuecomment-539344942

> Hey, thank you guys for the nice work! BTW, would you mind clearly stating why DLTensor is not adopted? I believe that would be useful for other community members for reference.

@wkcn (who implemented DLTensor support in MXNet) and I had a long discussion about this. In fact, we did investigate supporting DLTensor: https://github.com/samskalicky/incubator-mxnet/blob/custom_op/example/custom_op/test.cc#L14

One takeaway was that it would not be easy or convenient to modify the structure of DLPack. Because DLPack is used across multiple deep learning frameworks, we need to keep it consistent, so building MXNet custom operators directly on top of DLTensor would limit the future extensibility of MXNet and its custom operator support. One example would be adding a "layout" field to the tensor structure (i.e. NCHW). This is something I have heard requested, but it is [not currently something the DLPack community is willing to accept](https://github.com/dmlc/dlpack/pull/42).

The MXTensor structure in this work is compatible with DLPack/DLTensor, so any user who wants to convert from MXTensor to DLTensor can do so by simply setting the fields of a DLTensor, without copying data and with very small overhead. Just because we're not using DLTensor now does not mean we cannot support it directly in a future PR. If enough users want this feature, the work in this PR can easily be extended to pass DLTensors from MXNet to the custom operators in the external library.
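The zero-copy conversion described above (populating a DLTensor from an MXTensor's fields without moving data) could look roughly like this. This is a hedged sketch, not the PR's actual code: both structs are simplified stand-ins for the real MXNet and DLPack headers, and the field names are illustrative.

```cpp
#include <cstdint>

// Simplified stand-ins for dlpack.h definitions.
struct DLDataType { uint8_t code; uint8_t bits; uint16_t lanes; };
struct DLDevice  { int device_type; int device_id; };
struct DLTensor {
  void* data;
  DLDevice device;
  int ndim;
  DLDataType dtype;
  int64_t* shape;
  int64_t* strides;      // nullptr means compact, row-major
  uint64_t byte_offset;
};

// Simplified stand-in for the MXTensor a custom-op library would receive.
struct MXTensor {
  void* data;
  int64_t* shape;
  int ndim;
  int dtype_code;        // e.g. 2 (kDLFloat) per DLPack's convention
  int dtype_bits;
};

// Wrap the same buffer: only metadata is written; the data pointer is
// shared, so no tensor contents are copied.
DLTensor to_dltensor(const MXTensor& mx) {
  DLTensor dl;
  dl.data = mx.data;                    // same memory, zero copy
  dl.device = {1 /* kDLCPU */, 0};      // illustrative: assume CPU tensor
  dl.ndim = mx.ndim;
  dl.dtype = {static_cast<uint8_t>(mx.dtype_code),
              static_cast<uint8_t>(mx.dtype_bits), 1};
  dl.shape = mx.shape;                  // shape metadata is shared too
  dl.strides = nullptr;                 // assume dense, row-major layout
  dl.byte_offset = 0;
  return dl;
}
```

The overhead is exactly what the comment claims: a handful of field assignments, independent of tensor size, since both sides alias the same data and shape buffers.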
[GitHub] [incubator-mxnet] samskalicky commented on issue #15921: dynamic custom operator support
samskalicky commented on issue #15921:
URL: https://github.com/apache/incubator-mxnet/pull/15921#issuecomment-537741566

The centos-gpu job is failing due to Numpy issue #16358, which is unrelated to this PR. We've synced with the authors and they confirmed. Assume all CI jobs are passing at this point until they can address the issue.