[GitHub] [incubator-mxnet] perdasilva commented on issue #13103: Flaky test test_gluon_rnn.test_layer_bidirectional
perdasilva commented on issue #13103: Flaky test test_gluon_rnn.test_layer_bidirectional
URL: https://github.com/apache/incubator-mxnet/issues/13103#issuecomment-482569369

@pengzhao-intel forgot to say thank you. Thank you! =D

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
perdasilva commented on issue #13103: Flaky test test_gluon_rnn.test_layer_bidirectional
URL: https://github.com/apache/incubator-mxnet/issues/13103#issuecomment-482551962

@pengzhao-intel It passes now:
```
[DEBUG] 1 of 1: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=796240428 to reproduce.
ok

----------------------------------------------------------------------
Ran 1 test in 159.016s

OK
```
I'll close my skip-test PR and post my PR that fixes the test =)
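The "Setting test np/mx/python random seeds" line comes from the seeding decorator used by MXNet's unit tests: it seeds the random number generators before the test and logs the seed, so a failing run can be replayed by exporting `MXNET_TEST_SEED`. Below is a minimal sketch of that pattern, assuming only Python's `random` and NumPy need seeding. `with_seed_sketch` is a hypothetical name; this is not MXNet's actual `with_seed`, which per the log also seeds MXNet's own generator.

```python
import functools
import logging
import os
import random

import numpy as np


def with_seed_sketch(seed=None):
    """Hypothetical seeding decorator (NOT MXNet's with_seed).

    Seed priority: explicit `seed` argument, then the MXNET_TEST_SEED
    environment variable, then a fresh random value. The chosen seed is
    logged so a failing run can be reproduced.
    """
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            if seed is not None:
                this_seed = seed
            elif "MXNET_TEST_SEED" in os.environ:
                this_seed = int(os.environ["MXNET_TEST_SEED"])
            else:
                # Draw from the OS so the choice does not depend on earlier seeding.
                this_seed = random.SystemRandom().randrange(2**31)
            random.seed(this_seed)
            np.random.seed(this_seed)
            logging.info("Setting test np/python random seeds, use "
                         "MXNET_TEST_SEED=%d to reproduce.", this_seed)
            return test_fn(*args, **kwargs)
        return wrapper
    return decorator


# Usage: with the seed fixed, every call draws the same "random" inputs.
@with_seed_sketch(seed=796240428)
def sample_inputs():
    return np.random.rand(3)
```

Because the seed is chosen per call and logged, a flaky failure like the ones in this thread can be rerun deterministically by setting the reported `MXNET_TEST_SEED` value in the environment.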
perdasilva commented on issue #13103: Flaky test test_gluon_rnn.test_layer_bidirectional
URL: https://github.com/apache/incubator-mxnet/issues/13103#issuecomment-482495472

@haojin2 no good:
```
======================================================================
FAIL: test_gluon_rnn.test_layer_bidirectional
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/work/mxnet/tests/python/unittest/common.py", line 110, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/unittest/common.py", line 177, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/unittest/test_gluon_rnn.py", line 283, in test_layer_bidirectional
    assert_allclose(net(data).asnumpy(), ref_net(data).asnumpy(), rtol=2e-7)
  File "/usr/local/lib/python3.5/dist-packages/numpy/testing/_private/utils.py", line 1452, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python3.5/dist-packages/numpy/testing/_private/utils.py", line 789, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=2e-07, atol=0

(mismatch 0.06493506493507084%)
 x: array([0.424288, 0.560531, 0.600333, ..., 0.402131, 0.560952, 0.505039],
      dtype=float32)
 y: array([0.424288, 0.560531, 0.600333, ..., 0.402131, 0.560952, 0.505039],
      dtype=float32)
-------------------- >> begin captured logging << --------------------
tests.python.unittest.common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1305130208 to reproduce.
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 1 test in 0.030s
```
What I changed in the test code:
```
# Added import statement
from tests.python.unittest.common import with_seed

# Added with_seed decorator to the test function
@with_seed()
def test_layer_bidirectional():
    # ... (rest of the test body unchanged)
    # Updated rtol in the assertion statement, as suggested
    assert_allclose(net(data).asnumpy(), ref_net(data).asnumpy(), rtol=2e-7)
```
To test the changes, I used a g3.8xlarge instance with NVIDIA driver 418 and nvidia-docker:
```
# On host
$ docker run -ti -v `pwd`:/work/mxnet mxnetcd/build.ubuntu_cpu_static /bin/bash

# Within container
$ source tools/staticbuild/build.sh cu92mkl pip
$ exit

# On host
$ docker run -ti --runtime=nvidia -v `pwd`:/work/mxnet mxnetcd/build.ubuntu_gpu_cu92 /bin/bash

# Within container
$ export PYTHONPATH=./python/
$ MXNET_TEST_COUNT=1 nosetests --logging-level=DEBUG --verbose -s tests/python/unittest/test_gluon_rnn.py:test_layer_bidirectional
```
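In the failure above, `x` and `y` print identically, so the mismatching elements differ below the sixth decimal. Assuming those differences are a few ULPs of float32 rounding (the log does not show the exact values), `rtol=2e-7, atol=0` is only about 1.7 float32 machine epsilons. The sketch below uses NumPy's `nextafter` to show that, for an output near 0.56 (a value taken from the log purely for illustration), a 1-ULP difference passes that tolerance while a 2-ULP difference fails it. The helper `passes` is hypothetical.

```python
import numpy as np

x = np.float32(0.560531)                          # value seen in the failing log
one_ulp = np.nextafter(x, np.float32(1.0))        # next float32 above x
two_ulp = np.nextafter(one_ulp, np.float32(1.0))  # two float32 steps above x

eps = np.finfo(np.float32).eps                    # float32 machine epsilon, 2**-23
print("rtol in units of float32 eps:", 2e-7 / eps)


def passes(actual, desired, rtol=2e-7):
    """Hypothetical helper: True if assert_allclose accepts the pair with atol=0."""
    try:
        np.testing.assert_allclose(actual, desired, rtol=rtol, atol=0)
        return True
    except AssertionError:
        return False


print("1 ULP off passes:", passes(one_ulp, x))
print("2 ULP off passes:", passes(two_ulp, x))
# Both values still print the same to 6 decimals, like the log's x/y arrays.
print(f"{float(x):.6f} {float(two_ulp):.6f}")
```

With `atol=0` the allowed gap scales with the magnitude of each element, so outputs closer to 1.0 get slightly more room in absolute terms but never more than about 1.7 ULPs.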
perdasilva commented on issue #13103: Flaky test test_gluon_rnn.test_layer_bidirectional
URL: https://github.com/apache/incubator-mxnet/issues/13103#issuecomment-482446377

@haojin2 I'll give it a try and let you know how it goes. Thanks for the help!
perdasilva commented on issue #13103: Flaky test test_gluon_rnn.test_layer_bidirectional
URL: https://github.com/apache/incubator-mxnet/issues/13103#issuecomment-482445402

@szha In the two cases I've linked to, the tests ran against binaries compiled with your static-linking build tools, using the cu80mkl and cu92mkl variants.

@haojin2 I'm happy to bump the tolerances, but I wouldn't know what to bump them to =S I'm not familiar with this side of the code and don't really know what reasonable tolerance levels would be.
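On the question of what to bump the tolerances to, one heuristic (not an MXNet convention) is to measure the worst relative error between two float32 computations that are mathematically identical but accumulate in a different order, and express it in units of machine epsilon. That a different accumulation order is what separates the bidirectional layer from its reference is an assumption; the thread does not confirm the cause. A minimal sketch, where `max_rel_err_in_eps` is a hypothetical helper:

```python
import numpy as np


def max_rel_err_in_eps(actual, desired):
    """Hypothetical helper: worst |actual - desired| / |desired|, in units of
    the machine epsilon of `desired`'s (floating-point) dtype. Elements where
    desired == 0 are skipped to avoid dividing by zero."""
    desired = np.asarray(desired)
    eps = np.finfo(desired.dtype).eps
    a = np.atleast_1d(np.asarray(actual, dtype=np.float64))
    d = np.atleast_1d(desired.astype(np.float64))
    nonzero = d != 0
    if not nonzero.any():
        return 0.0
    rel = np.abs(a[nonzero] - d[nonzero]) / np.abs(d[nonzero])
    return float(rel.max() / eps)


# The same float32 row sums, accumulated in two different orders.
rng = np.random.RandomState(0)
terms = rng.rand(1000, 64).astype(np.float32)
forward = np.cumsum(terms, axis=1)[:, -1]            # left-to-right, in float32
backward = np.cumsum(terms[:, ::-1], axis=1)[:, -1]  # right-to-left, in float32

err = max_rel_err_in_eps(forward, backward)
print("worst relative error, in float32 eps:", err)
# A tolerance would then need rtol > err * eps, plus a safety margin.
```

The measured value depends on the data and the length of the reduction, so it is only a rough guide; running the real `net` and `ref_net` over many seeds and taking the worst case would be the more direct measurement.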
perdasilva commented on issue #13103: Flaky test test_gluon_rnn.test_layer_bidirectional
URL: https://github.com/apache/incubator-mxnet/issues/13103#issuecomment-482086093

I'm currently working on some CD pipelines and I'm seeing this issue crop up:

http://jenkins.mxnet-ci-dev.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-static-binary-cu80mkl-release/detail/mxnet-static-binary-cu80mkl-release/4/pipeline
http://jenkins.mxnet-ci-dev.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-static-binary-cu92mkl-release/detail/mxnet-static-binary-cu92mkl-release/11/pipeline

I'll create a quick PR to disable the test.
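One common way to disable a flaky test in a nose/unittest-based suite such as this one is to decorate it with `unittest.skip` and put the issue link in the reason. Calling the decorated function raises `unittest.SkipTest`, which unittest and nose report as a skip rather than a failure. This is a minimal sketch; the body is a placeholder, not the real test.

```python
import unittest


@unittest.skip("Flaky test: https://github.com/apache/incubator-mxnet/issues/13103")
def test_layer_bidirectional():
    # Placeholder body; with the decorator in place it never runs.
    raise AssertionError("should not run while skipped")
```

Skipping hides the failure rather than fixing it, so keeping the issue link in the reason makes the test easy to find and re-enable once the tolerance problem is resolved.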