vexilligera commented on a change in pull request #16800: [Numpy] Add NumPy support for np.linalg.det and np.linalg.slogdet URL: https://github.com/apache/incubator-mxnet/pull/16800#discussion_r347139368
##########
File path: src/operator/tensor/la_op-inl.h
##########
@@ -921,6 +927,9 @@ struct det_backward {
   using namespace mshadow;
   using namespace mshadow::expr;
   using namespace mxnet_op;
+  if (dA.shape_.Size() == 0U) {
+    return;
+  }
   // compute inverse(A) and stores it to LU
   linalg_batch_det_backward_helper(LU, pivot, det, dA, DType(0), ctx);
   const_cast<Tensor<xpu, 3, DType>&>(dA) = broadcast_to(reshape(det * ddet, \

 Review comment:
   @reminisce I believe the cause of the flakiness on Windows is here; I was able to pinpoint the bug thanks to @haojin2. I think this may sometimes corrupt memory.

   P.S. It's not exactly line 926; it is another line of almost identical code in "slogdet_backward" in the same file. In fact, the "det" operator does not produce any error in my tests. Later I decomposed that line into three separate operator calls, and it turned out that "broadcast_to" is what triggers the flaky behavior. However, the direct source of the crash (usually an access violation on a null pointer, or an ndim overflow) is actually in "LaOpDetBackward", where the access to the "req" array is out of bounds. I still suspect the bug is at a higher level and that I must have done something silly; I'll take a closer look tomorrow.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services