slyforce opened a new issue #10397: grad_req in multi-task example
URL: https://github.com/apache/incubator-mxnet/issues/10397

I'm referring to the Python script in example/multi-task/example_multi_task.py.

As far as I understand, gradients need to be accumulated across the two loss functions, since a backward step is performed from two different components onto a single symbol (in the example, f3 receives gradients from both sm1 and sm2). However, this example uses the default grad_req = 'write', which implies that the gradient from one softmax's backward step would be overwritten by the next.

Why is 'write' sufficient in this case? In which cases of gradient accumulation should 'add' be used instead?
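For context, the distinction the question hinges on can be illustrated with a minimal sketch. This is not the symbolic code from example_multi_task.py; it is a hypothetical Gluon/autograd equivalent, under the assumption that the two heads and the shared layer here stand in for sm1, sm2, and f3. It contrasts a single backward pass over both losses (where autograd sums the contributions flowing into the shared layer, so 'write' suffices) with separate backward calls before one update (where 'add' is needed):

```python
import mxnet as mx
from mxnet import autograd, gluon

# Hypothetical stand-ins: a shared layer feeding two task heads
# (analogous to f3 -> sm1, sm2 in the multi-task example).
shared = gluon.nn.Dense(16)
head1 = gluon.nn.Dense(10)
head2 = gluon.nn.Dense(10)
for block in (shared, head1, head2):
    block.initialize()

x = mx.nd.random.uniform(shape=(4, 8))

# Case 1: one backward call over both losses. The default
# grad_req = 'write' is enough, because autograd sums the two heads'
# contributions to the shared layer within this single backward pass.
with autograd.record():
    h = shared(x)
    loss = head1(h).sum() + head2(h).sum()
loss.backward()

# Case 2: separate backward calls before a single parameter update.
# Under 'write' the second backward would overwrite the first, so the
# shared parameters need grad_req = 'add', and their accumulated
# gradients must be zeroed manually between updates.
for p in shared.collect_params().values():
    p.grad_req = 'add'
    p.zero_grad()
with autograd.record():
    loss1 = head1(shared(x)).sum()
loss1.backward()
with autograd.record():
    loss2 = head2(shared(x)).sum()
loss2.backward()  # shared gradients now hold the sum of both passes
```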