barry-jin edited a comment on issue #20293:
URL: https://github.com/apache/incubator-mxnet/issues/20293#issuecomment-881776555


   I don't think the root cause is in CachedOp. While debugging this issue, I 
found that elemwise_add uses 
[CloneGradient](https://github.com/apache/incubator-mxnet/blob/3480ba2c6df02bb907d3a975d354efa8697c4e71/src/operator/tensor/elemwise_binary_op_basic.cc#L111),
 which copies the same ograds entry once per input. 
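
To make the aliasing concrete, here is a minimal standalone sketch of a clone-style gradient. It does not use nnvm: `Entry` and `clone_gradient` are hypothetical stand-ins for `nnvm::NodeEntry` and the gradient functor, and the sketch assumes the functor simply reuses the first upstream gradient for every input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for nnvm::NodeEntry: one output of one graph node.
struct Entry {
  int node_id;  // which node produced the value
  int index;    // which output of that node
};

inline bool operator==(const Entry& a, const Entry& b) {
  return a.node_id == b.node_id && a.index == b.index;
}

// Toy model of a clone-style gradient: every input receives the same
// upstream gradient entry, so no new node is created per input.
std::vector<Entry> clone_gradient(std::size_t input_count,
                                  const std::vector<Entry>& ograds) {
  assert(!ograds.empty());
  std::vector<Entry> ret;
  ret.reserve(input_count);
  for (std::size_t i = 0; i < input_count; ++i) {
    ret.push_back(ograds[0]);
  }
  return ret;
}
```

Calling `clone_gradient(2, {Entry{7, 0}})` returns two entries that compare equal, so the gradients of both inputs refer to the same graph output.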
   
   For cached_op, if static_alloc is on, it constructs the backward graph 
from the grad_graph outputs:
   
https://github.com/apache/incubator-mxnet/blob/3480ba2c6df02bb907d3a975d354efa8697c4e71/src/imperative/cached_op.cc#L270-L281
   In the case of elemwise_add(a, b), the grad_graph looks like this: 
   ![Screen Shot 2021-07-16 at 4 19 40 PM](https://user-images.githubusercontent.com/69359374/126018275-a3d1505f-69ee-43b3-9897-2dbb15428c7b.png)
   The gradient of b is a copy of the gradient of a. As a result, the new 
graph constructed from the grad_graph differs between case 1 (a.grad_req = 
null, b.grad_req = write) and case 2 (a.grad_req = write, b.grad_req = null). 
   
   From my point of view, the fix for this bug is to change the 
elemwise_add gradient function to this:
   ```
   .set_attr<nnvm::FGradient>("FGradient",
     [](const nnvm::ObjectPtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
       std::vector<nnvm::NodeEntry> ret;
       const size_t input_count = n->inputs.size();
       ret.reserve(input_count);
       for (size_t i = 0; i < input_count; ++i) {
         ret.emplace_back(MakeNode("ones_like", n->attrs.name + "_grad_ones",
                                   {n->inputs[i]}, nullptr, &n));
       }
       return ret;
   });
   ```
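
As a rough illustration of the structural difference, the following standalone sketch builds one gradient node per input in a toy graph. `ToyNode`, `ToyGraph`, and `per_input_gradient` are hypothetical names, not nnvm or MXNet APIs, and the sketch models only the graph shape, not the gradient values.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a graph node; not the nnvm type.
struct ToyNode {
  std::string op;
  std::string name;
  int input_id;  // id of the forward input this node reads
};

struct ToyGraph {
  std::vector<ToyNode> nodes;
  // Appends a node and returns its id (its position in `nodes`).
  int add(const std::string& op, const std::string& name, int input_id) {
    nodes.push_back(ToyNode{op, name, input_id});
    return static_cast<int>(nodes.size()) - 1;
  }
};

// Toy model of the per-input scheme: one new "ones_like" node is created
// for each forward input, so every returned gradient id is distinct.
std::vector<int> per_input_gradient(ToyGraph& graph,
                                    const std::string& node_name,
                                    const std::vector<int>& input_ids) {
  std::vector<int> ret;
  ret.reserve(input_ids.size());
  for (int input_id : input_ids) {
    ret.push_back(graph.add("ones_like", node_name + "_grad_ones", input_id));
  }
  return ret;
}
```

With two inputs, `per_input_gradient` returns two different node ids, so each input's gradient has its own node in the toy graph.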
   @KexinFeng FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


