[GitHub] larroy commented on issue #9853: Flaky test_operator.test_binary_op

2018-06-13 Thread GitBox
larroy commented on issue #9853: Flaky test_operator.test_binary_op
URL: https://github.com/apache/incubator-mxnet/issues/9853#issuecomment-397011088
 
 
   seed 1060292419
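
   A minimal sketch of replaying this seed, assuming the test harness reads MXNET_TEST_SEED from the environment, as the `export MXNET_TEST_SEED=...` reproduction command in a later comment suggests:

   ```
   import os

   # Hypothetical reproduction helper: pin the failing seed before the test
   # modules are imported, so the harness replays this RNG state.
   os.environ["MXNET_TEST_SEED"] = "1060292419"
   ```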




[GitHub] larroy commented on issue #9853: Flaky test_operator.test_binary_op

2018-06-13 Thread GitBox
larroy commented on issue #9853: Flaky test_operator.test_binary_op
URL: https://github.com/apache/incubator-mxnet/issues/9853#issuecomment-396886126
 
 
   I have likely found the root cause of this problem; flagging it here so we don't duplicate effort on this one.




[GitHub] larroy commented on issue #9853: Flaky test_operator.test_binary_op

2018-05-06 Thread GitBox
larroy commented on issue #9853: Flaky test_operator.test_binary_op
URL: https://github.com/apache/incubator-mxnet/issues/9853#issuecomment-386885128
 
 
   Reproducible 100% with:
   
   export MXNET_TEST_SEED=1688524483
   nosetests-3.4 -s -v test_operator_gpu.py:test_binary_op
   
   Debug patch used to localize the offending elements:
   
   ```
   diff --git a/tests/python/unittest/test_operator.py b/tests/python/unittest/test_operator.py
   index 5d38222..04e880c 100644
   --- a/tests/python/unittest/test_operator.py
   +++ b/tests/python/unittest/test_operator.py
   @@ -1429,6 +1429,16 @@ def check_binary_op_backward(symbol, baseline, gen_data, rtol=1e-3, atol=1e-5):
            y.forward(is_train=True)
            y.backward([mx.nd.array(out)])
            assert_allclose(y_1.asnumpy(), x_1, rtol=rtol, atol=atol)
   +        z = np.abs(y_2.asnumpy() - x_2)
   +        w = np.where(z > atol)
   +        if w[0].size > 0:
   +            print("d[0].shape: {} d[1].shape: {} baseline_grad2.shape: {}".format(d[0].shape, d[1].shape, baseline_grad2.shape))
   +            print(w)
   +            print(y_2[w])
   +            print(x_2[w])
   +            print(z[w])
   +            print(d[0][w])
   +            print(d[1][w])
   ```
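   
   The same element-wise localization pattern can be pulled out as a standalone helper. A minimal sketch; the function name and example arrays are illustrative, not part of the MXNet test suite:
   
   ```
   import numpy as np
   
   def report_mismatches(actual, expected, atol=1e-5):
       # Print only the elements where |actual - expected| exceeds atol.
       diff = np.abs(actual - expected)
       idx = np.where(diff > atol)
       if idx[0].size > 0:
           print("indices: ", idx)
           print("actual:  ", actual[idx])
           print("expected:", expected[idx])
           print("abs diff:", diff[idx])
       return idx
   
   # Example: the third element is out of tolerance.
   report_mismatches(np.array([1.0, 2.0, 3.1]), np.array([1.0, 2.0, 3.0]))
   ```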




[GitHub] larroy commented on issue #9853: Flaky test_operator.test_binary_op

2018-05-06 Thread GitBox
larroy commented on issue #9853: Flaky test_operator.test_binary_op
URL: https://github.com/apache/incubator-mxnet/issues/9853#issuecomment-386885067
 
 
   I tried increasing the tolerance, but I found one failure where the difference is much bigger than expected (0.28679015). I think we should look deeper into this:
   
   [-116.15162] <- input
   [-115.8648288] <- gradient
   [0.28679015] <- diff
   [0.8396868] <- a
   [0.0020733] <- b
   FAIL
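   
   One plausible reading of these numbers (an assumption on my part, not confirmed in the thread): for positive inputs, a % b = a - floor(a/b)*b, so the gradient with respect to b is -floor(a/b) wherever floor is locally constant. Here a/b sits almost exactly on the integer 405, and the two gradients above differ by roughly one multiple of the incoming output gradient (about 0.2868), which is what a floor() flip of 1 between float32 and float64 evaluation would produce. A sketch of that failure mode; this is not MXNet's actual kernel:
   
   ```
   import numpy as np
   
   a, b = np.float32(0.8396868), np.float32(0.0020733)
   q32 = a / b                            # quotient evaluated in float32
   q64 = np.float64(a) / np.float64(b)    # same inputs, float64 division
   print(q32, q64)                        # both within a few float32 ulps of 405
   print(-np.floor(q32), -np.floor(q64))  # gradient of (a % b) w.r.t. b
   # If rounding pushes q32 and q64 to opposite sides of 405, the two
   # gradients differ by exactly one multiple of the output gradient,
   # matching the ~0.2868 jump in the dump above.
   ```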




[GitHub] larroy commented on issue #9853: Flaky test_operator.test_binary_op

2018-05-06 Thread GitBox
larroy commented on issue #9853: Flaky test_operator.test_binary_op
URL: https://github.com/apache/incubator-mxnet/issues/9853#issuecomment-386875707
 
 
   I think the cause of this is that the mod operator does the computation in double while the test forces float32; also, the floating-point modulo operator seems to give different results on GPU vs CPU. Why is fmod in CUDA giving different results?
   
   According to Table 7 here https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#introduction-cuda-dynamic-parallelism there should be no difference in fmod.
   
   
https://github.com/apache/incubator-mxnet/blob/master/tests/python/unittest/test_operator.py#L1511
   
https://github.com/apache/incubator-mxnet/blob/master/src/operator/mshadow_op.h#L402
   
   >>> np.double(1.68) % np.double(1.30123)
   0.378769983
   >>> np.float32(1.68) % np.float32(1.30123)
   0.37877
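   
   Comparing the two results in a common dtype shows the actual numeric gap, independent of repr precision. A small sketch; 1e-5 is the atol used by the test above:
   
   ```
   import numpy as np
   
   d = np.float64(1.68) % np.float64(1.30123)   # mod computed in double
   f = np.float32(1.68) % np.float32(1.30123)   # mod computed in float32
   # Here b <= a < 2*b, so the float32 mod reduces to one exact subtraction;
   # the whole gap comes from rounding the inputs to float32, roughly 6e-9,
   # well below the test's atol of 1e-5.
   print(abs(d - np.float64(f)))
   ```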
   
   

