mseeger commented on issue #8338: master branch cannot build on centos 7 with cuda-8.0 URL: https://github.com/apache/incubator-mxnet/issues/8338#issuecomment-343413652

Hello, I'd like to help, but I just cannot reproduce the error (assuming it is still the same as the one described above). Here are some points:

- I'd need the full error output. If some template is being expanded, I would have to know for which ops in mshadow_op.h. An obvious thing to try is to revert those ops (and only those) to their state before my changes.
- Why was the original issue closed? Something must have solved the problem, so maybe this is now a different problem? At the very least, we need error outputs for the new problem, and perhaps it should be filed as a separate issue.
- What is your hunch about what is happening here? Which .cc or .cu code is actually being built?

My hunches of what to try:

- The new math_functions-inl.h introduces a lot of functions (exp, log, ...) in the namespace mxnet::op::math. Maybe this clashes with something else you are doing? One thing to try would be to rename the namespace math => math_debug. This also requires changes in mshadow_op.h, but nowhere else (I think).
- Figure out exactly which ops in mshadow_op.h are implicated, and then revert their code to the older versions.

To explain what my changes did:

- The mshadow_op.h ops now consistently cast inputs to float for all DType != double, but leave them as double for DType = double. The computation is then done in float or double, and if DType != double, the result is cast back to DType at the end. The input is cast with static_cast<float>(a); the result is cast back with DType(result).
- The code before my changes did things differently: the forward ops were always computed in float, even for DType = double. Also, they would call ::expf(a) instead of ::expf(static_cast<float>(a)), which should be equivalent.
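The cast convention described above can be sketched as follows. This is a minimal stand-alone illustration, not the actual mshadow_op.h code; `exp_op` is a hypothetical name, and the real ops use mshadow's functor/Map machinery rather than a plain function:

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical sketch of the new cast pattern: for any DType other
// than double, widen the input to float, do the math in float, and
// cast the result back to DType with DType(result). For double,
// everything stays in double.
template <typename DType>
DType exp_op(DType a) {
  if (std::is_same<DType, double>::value) {
    // DType = double: compute in double, no narrowing cast
    return DType(::exp(static_cast<double>(a)));
  }
  // DType != double: compute in float, cast back at the end
  return DType(::expf(static_cast<float>(a)));
}
```

The point of `static_cast<float>(a)` (as opposed to passing `a` directly to `::expf`) is that the conversion is explicit and uniform across all non-double DTypes, including low-precision ones.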
More seriously, the gradient (backward) ops often did their computations in DType when only arithmetic was involved (*, +, even /), but in float when math.h functions were involved. This is plain wrong. With my changes, all gradient ops also cast to float, whether math.h functions are involved or not.

@zhreshold How would this fail in CI on Windows, etc., if it does not fail for all the other PRs that have passed CI since my changes? This makes no sense, right? If CI fails for you, please provide full error outputs. BTW: CI fails randomly for many reasons all the time, unfortunately. Make sure it fails with the errors you are reporting here (and please re-report them).
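The gradient inconsistency described above can be sketched like this (again a hypothetical stand-alone illustration with made-up names, not the real mshadow_op.h functors), using the backward of log, whose derivative is 1/a:

```cpp
#include <cassert>

// Old style: arithmetic-only gradients were computed directly in
// DType. For low-precision DTypes (e.g. half) this loses precision
// relative to the float-based forward pass.
template <typename DType>
DType old_grad_log(DType a) {
  return DType(1.0f) / a;  // division performed in DType
}

// New style: widen to float first, compute, then cast back, so
// gradients follow the same convention as the forward ops.
template <typename DType>
DType new_grad_log(DType a) {
  return DType(1.0f / static_cast<float>(a));  // division in float
}
```

For DType = float the two agree exactly; the difference only shows up for lower-precision types, where the old style rounds intermediate results to DType.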
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards,
Apache Git Services