eric-haibin-lin commented on a change in pull request #16408: Add MXNet Ops for fast multihead attention
URL: https://github.com/apache/incubator-mxnet/pull/16408#discussion_r337861997
 
 

 ##########
 File path: src/operator/contrib/transformer-inl.h
 ##########
 @@ -34,6 +34,18 @@
 namespace mxnet {
 namespace op {
 
+struct InterleavedMatMulParam : public dmlc::Parameter<InterleavedMatMulParam> {
+  int heads;
+  bool bwd_ignore_zero_init;
+  DMLC_DECLARE_PARAMETER(InterleavedMatMulParam) {
+    DMLC_DECLARE_FIELD(heads)
+    .describe("Set number of heads");
+    DMLC_DECLARE_FIELD(bwd_ignore_zero_init)
+    .describe("Make backward pass ignore AddTo and not init to 0.")
 
 Review comment:
   I don't think this one sentence explains this flag very well. When are users 
supposed to turn this on? Would it affect users who use gradient accumulation 
for self-attention? 
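
For context on the gradient-accumulation question: in Gluon, accumulation is typically done with `grad_req='add'`, which relies on backward passes writing into the gradient buffers with AddTo semantics rather than overwriting them. Below is a minimal sketch using only standard Gluon APIs; how this pattern interacts with `bwd_ignore_zero_init` is exactly what the parameter description should spell out, and is not shown here.

```python
# Minimal sketch of gradient accumulation in MXNet Gluon (standard APIs only).
# grad_req='add' makes backward *accumulate* into the gradient buffers, so any
# kernel that skips zero-init and ignores AddTo could silently break this pattern.
import mxnet as mx
from mxnet import autograd, gluon

net = gluon.nn.Dense(4)
net.initialize()
for p in net.collect_params().values():
    p.grad_req = 'add'          # accumulate grads across several backward calls

batches = [mx.nd.random.uniform(shape=(2, 8)) for _ in range(4)]
for x in batches:
    with autograd.record():
        loss = net(x).sum()
    loss.backward()             # sums into the existing grad buffers (AddTo)

# ... apply the accumulated gradients, then reset the buffers explicitly
for p in net.collect_params().values():
    p.zero_grad()
```

If setting `bwd_ignore_zero_init` changes the write semantics of the fused backward kernel, user code like the above is what would be affected, so the describe() text should say when the flag is safe to enable.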
