apeforest commented on issue #15120: [bug] fix higher grad log 
URL: https://github.com/apache/incubator-mxnet/pull/15120#issuecomment-499608736
 
 
   @kshitij12345 I think this is due to the design of the backward computation 
graph in MXNet. In the C++ implementation, when you specify `variables=x`, 
gradients are computed only for the listed input variables.
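   
   To make this concrete, here is a minimal sketch of that contract; `nd.log` 
and the concrete array values are only for illustration:
   ```
   from mxnet import nd, autograd

   x = nd.array([1.0, 2.0, 3.0])
   x.attach_grad()

   with autograd.record():
       y = nd.log(x)

   # Gradients are computed only for the arrays passed as `variables`,
   # and are returned as a list in the same order.
   dx = autograd.grad(heads=y, variables=x)[0]
   print(dx)  # 1 / x
   ```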
   
   In your case 2:
   ```
   x_grad = autograd.grad(heads=y, variables=x, head_grads=y_grad,
                          create_graph=True, retain_graph=True)[0]
   ```
   If you then perform another backward pass on x_grad with 
`x_grad.backward(out_grad=head_grad_grads)`, y_grad is not listed as an input 
variable, so its gradient stays zero.
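   
   For reference, a self-contained sketch of this path; `nd.log` and the 
values below are illustrative stand-ins for the setup in your test:
   ```
   from mxnet import nd, autograd

   x = nd.array([1.0, 2.0, 3.0])
   x.attach_grad()
   y_grad = nd.ones_like(x)           # head gradient for the first pass
   y_grad.attach_grad()
   head_grad_grads = nd.ones_like(x)  # head gradient for the second pass

   with autograd.record():
       y = nd.log(x)
       # case 2: y_grad enters only as head_grads, not as a variable
       x_grad = autograd.grad(heads=y, variables=x, head_grads=y_grad,
                              create_graph=True, retain_graph=True)[0]

   x_grad.backward(out_grad=head_grad_grads)
   print(x.grad)       # -y_grad / x**2 * head_grad_grads, i.e. -1 / x**2 here
   print(y_grad.grad)  # all zeros: y_grad was never listed as a variable
   ```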
   
   In your case 1:
   ```
   x_grad = x_grad_mid * y_grad # Note
   x_grad.backward(out_grad=head_grad_grads)
   ```
   You implicitly made y_grad an input variable when calling backward on 
x_grad, which is why you get values in y_grad.grad.
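   
   The analogous sketch for this path (again with purely illustrative values) 
shows y_grad picking up a gradient once the multiplication is recorded:
   ```
   from mxnet import nd, autograd

   x = nd.array([1.0, 2.0, 3.0])
   x.attach_grad()
   y_grad = nd.ones_like(x)
   y_grad.attach_grad()
   head_grad_grads = nd.ones_like(x)

   with autograd.record():
       y = nd.log(x)
       x_grad_mid = autograd.grad(heads=y, variables=x,
                                  create_graph=True, retain_graph=True)[0]
       # the explicit multiplication records y_grad as an input of the graph
       x_grad = x_grad_mid * y_grad

   x_grad.backward(out_grad=head_grad_grads)
   print(x.grad)       # -y_grad / x**2 * head_grad_grads
   print(y_grad.grad)  # 1 / x * head_grad_grads: y_grad now receives a gradient
   ```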
   
   I also replaced the `backward()` method with an explicit `autograd.grad()` 
call, which should invoke the same C++ backend function, and the result 
differs depending on which variables are listed:
   
   case 1.1: if I do the following, I again get no values for y_grad, because 
the output contains the gradient of only one variable:
   ```
   out_grad = autograd.grad(heads=x_grad, variables=x,
                            head_grads=head_grad_grads,
                            create_graph=False, retain_graph=False)
   print(out_grad[0])   # value equals expected_grad_grad
   ```
   
   case 1.2: if I explicitly list y_grad as an input variable, I get the 
expected result, as in your case 1:
   ```
   out_grad = autograd.grad(heads=x_grad, variables=[x, y_grad],
                            head_grads=head_grad_grads,
                            create_graph=False, retain_graph=False)
   print(out_grad[0])   # value equals expected_grad_grad
   print(out_grad[1])   # value equals expected_heads_grad
   ```
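   
   Note that `autograd.grad` returns one gradient array per entry in 
`variables`, in the same order, which is why `out_grad[0]` here corresponds to 
`x` and `out_grad[1]` to `y_grad`.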
   
   At this point, I am not sure this is a bug, because the backward API is 
designed differently from PyTorch's. If y_grad is not specified as one of the 
input variables to compute gradients for, it will not have a gradient assigned 
even if you call `y_grad.attach_grad()` on it. This seems consistent with the 
API spec. Also, given that the gradient of `y_grad` does not hold really 
useful values, I don't feel the necessity to store it. Please let me know if 
this makes sense. Thanks a lot for your careful drawing and insightful 
discussion.
   
   
   
   
