mseeger commented on issue #8338: master branch cannot build on centos 7 with cuda-8.0 URL: https://github.com/apache/incubator-mxnet/issues/8338#issuecomment-343413652

Hello, I'd like to help, but I just cannot reproduce the error (assuming it is still the same as the one described above). Here are some points:

- I'd need the full error output. If some template is being expanded, I would have to know for which ops in mshadow_op.h. An obvious thing to try is to revert those ops (and only those) to their state before my changes.
- Why was the original issue closed? Something must have solved the problem, so maybe this is now a different problem? At the very least, we need error outputs for the new problem, and perhaps it should be filed as a separate issue.
- What is your hunch about what is happening here? Which .cc or .cu code is actually being built?

My hunches of what to try:

- The new math_functions-inl.h introduces a lot of functions (exp, log, ...) in the namespace mxnet::op::math. Maybe this clashes with something else you are doing? One thing to try would be to rename the namespace math => math_debug. This also requires changes in mshadow_op.h, but nowhere else (I think).
- Figure out exactly which ops in mshadow_op.h are implicated, and then revert their code to the older versions.

To explain what my changes did:

- The mshadow_op.h ops now consistently cast inputs to float for all DType != double, but leave them as double for DType = double. The computation is then done in float or double, and if DType != double, the result is cast back to DType at the end. The input is cast with static_cast<float>(a); the result is cast back with DType(result).
- The code before my changes did things differently: the forward ops were always computed in float, even for DType = double. Also, they would call ::expf(a) instead of ::expf(static_cast<float>(a)), which should be equivalent.
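The cast convention described above can be sketched as follows. This is a minimal stand-alone illustration, not the actual mshadow_op.h code; `exp_op` is a hypothetical name, and the real ops use mshadow's functor/Map machinery rather than a plain function:

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical sketch of the new cast pattern: for any DType other
// than double, widen the input to float, do the math in float, and
// cast the result back to DType with DType(result). For double,
// everything stays in double.
template <typename DType>
DType exp_op(DType a) {
  if (std::is_same<DType, double>::value) {
    // DType = double: compute in double, no narrowing cast
    return DType(::exp(static_cast<double>(a)));
  }
  // DType != double: compute in float, cast back at the end
  return DType(::expf(static_cast<float>(a)));
}
```

The point of `static_cast<float>(a)` (as opposed to passing `a` directly to `::expf`) is that the conversion is explicit and uniform across all non-double DTypes, including low-precision ones.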
More seriously, the gradient (backward) ops often did their computations in DType when only arithmetic was involved (*, +, even /), but in float when math.h functions were involved. This is plain wrong. With my changes, all gradient ops also cast to float, whether math.h functions are involved or not.

@zhreshold How would this fail in CI on Windows, etc., if it does not fail for all the other PRs that have passed CI since my changes? This makes no sense, right? If CI fails for you, please provide full error outputs. BTW: CI fails randomly for many reasons all the time, unfortunately. Make sure it fails with the errors you are reporting here (and please re-report them).
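The gradient inconsistency described above can be sketched like this (again a hypothetical stand-alone illustration with made-up names, not the real mshadow_op.h functors), using the backward of log, whose derivative is 1/a:

```cpp
#include <cassert>

// Old style: arithmetic-only gradients were computed directly in
// DType. For low-precision DTypes (e.g. half) this loses precision
// relative to the float-based forward pass.
template <typename DType>
DType old_grad_log(DType a) {
  return DType(1.0f) / a;  // division performed in DType
}

// New style: widen to float first, compute, then cast back, so
// gradients follow the same convention as the forward ops.
template <typename DType>
DType new_grad_log(DType a) {
  return DType(1.0f / static_cast<float>(a));  // division in float
}
```

For DType = float the two agree exactly; the difference only shows up for lower-precision types, where the old style rounds intermediate results to DType.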
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards,
Apache Git Services