cbalint13 opened a new pull request #5805:
URL: https://github.com/apache/incubator-tvm/pull/5805


   This PR adds ```nn.batch_flatten``` as a quantizable layer.
   
   **Description**
   * ```nn.batch_flatten``` is commonly used before ```nn.dense``` in the final layers.
   * This PR allows it to be included in the quantization process, avoiding the re-cast to ```float32``` (and the dequantizing ```multiply```) that otherwise precedes it.
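Why ```nn.batch_flatten``` can stay in the integer domain: it only reshapes (N, C, H, W) to (N, C*H*W) in row-major order and never changes element values, so flattening and the dequantizing scale commute. The sketch below illustrates this in plain Python, with nested lists standing in for tensors. It does not use TVM; ```batch_flatten```, ```scale``` and ```make_int8_tensor``` are hypothetical stand-ins, not Relay's implementation, and the scale 0.0625 and the (1, 50, 4, 4) shape are taken from the IR dumps in this PR.

```python
from typing import List


def make_int8_tensor(n: int, c: int, h: int, w: int) -> list:
    """Hypothetical test data: deterministic values in [-127, 127] shaped (n, c, h, w)."""
    return [[[[(((b * c + ch) * h + y) * w + x) % 255 - 127
               for x in range(w)]
              for y in range(h)]
             for ch in range(c)]
            for b in range(n)]


def batch_flatten(x: list) -> List[list]:
    """Stand-in for a batch flatten: (N, C, H, W) -> (N, C*H*W), row-major.

    Values are copied as-is, so an int8 input stays int8-valued.
    """
    return [[v for chan in sample for row in chan for v in row] for sample in x]


def scale(x, s: float):
    """Element-wise multiply by s at any nesting depth.

    Stands in for the cast-to-float32 + multiply (dequantize) seen in the Before IR.
    """
    if isinstance(x, list):
        return [scale(v, s) for v in x]
    return x * s


q = make_int8_tensor(1, 50, 4, 4)   # same shape as %21 in the IR dumps
flat = batch_flatten(q)             # shape (1, 800), still integer values

# Dequantizing before or after flattening yields exactly the same tensor.
assert batch_flatten(scale(q, 0.0625)) == scale(flat, 0.0625)
```

Both sides apply the same multiplication to the same values in the same order, so the comparison is exact; that is the property that lets the int8 tensor flow through ```nn.batch_flatten``` with any rescaling deferred to the consumer.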
   
   **Outcome**
   * Before
   ```
     %19 = nn.max_pool2d(%18, pool_size=[2, 2], strides=[2, 2], padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 50, 4, 4), int8] */;
     %20 = cast(%19, dtype="int8") /* ty=Tensor[(1, 50, 4, 4), int8] */;
     %21 = annotation.stop_fusion(%20) /* ty=Tensor[(1, 50, 4, 4), int8] */;
     %22 = cast(%21, dtype="float32") /* ty=Tensor[(1, 50, 4, 4), float32] */;
     %23 = multiply(%22, 0.0625f /* ty=float32 */) /* ty=Tensor[(1, 50, 4, 4), float32] */;
     %24 = nn.batch_flatten(%23) /* ty=Tensor[(1, 800), float32] */;
     %25 = nn.batch_flatten(%24) /* ty=Tensor[(1, 800), float32] */;
     %26 = nn.batch_flatten(%25) /* ty=Tensor[(1, 800), float32] */;
     %27 = nn.dense(%26, meta[relay.Constant][2] /* ty=Tensor[(512, 800), float32] */ /* ty=Tensor[(512, 800), float32] */, units=512) /* ty=Tensor[(1, 512), float32] */;
     %28 = nn.relu(%27) /* ty=Tensor[(1, 512), float32] */;
     %29 = nn.batch_flatten(%28) /* ty=Tensor[(1, 512), float32] */;
     %30 = nn.batch_flatten(%29) /* ty=Tensor[(1, 512), float32] */;
     nn.dense(%30, meta[relay.Constant][3] /* ty=Tensor[(10, 512), float32] */ /* ty=Tensor[(10, 512), float32] */, units=10) /* ty=Tensor[(1, 10), float32] */
   ```
   * After
   ```
     %19 = nn.max_pool2d(%18, pool_size=[2, 2], strides=[2, 2], padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 50, 4, 4), int8] */;
     %20 = cast(%19, dtype="int8") /* ty=Tensor[(1, 50, 4, 4), int8] */;
     %21 = annotation.stop_fusion(%20) /* ty=Tensor[(1, 50, 4, 4), int8] */;
     %22 = nn.batch_flatten(%21) /* ty=Tensor[(1, 800), int8] */;
     %23 = nn.batch_flatten(%22) /* ty=Tensor[(1, 800), int8] */;
     %24 = nn.batch_flatten(%23) /* ty=Tensor[(1, 800), int8] */;
     %25 = clip(%24, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 800), int8] */;
     %26 = nn.dense(%25, meta[relay.Constant][2] /* ty=Tensor[(512, 800), int8] */ /* ty=Tensor[(512, 800), int8] */, units=512, out_dtype="int32") /* ty=Tensor[(1, 512), int32] */;
     %27 = nn.relu(%26) /* ty=Tensor[(1, 512), int32] */;
     %28 = nn.batch_flatten(%27) /* ty=Tensor[(1, 512), int32] */;
     %29 = nn.batch_flatten(%28) /* ty=Tensor[(1, 512), int32] */;
     %30 = add(%29, 512 /* ty=int32 */) /* ty=Tensor[(1, 512), int32] */;
     %31 = right_shift(%30, 10 /* ty=int32 */) /* ty=Tensor[(1, 512), int32] */;
     %32 = clip(%31, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 512), int32] */;
     %33 = cast(%32, dtype="int8") /* ty=Tensor[(1, 512), int8] */;
     %34 = nn.dense(%33, meta[relay.Constant][3] /* ty=Tensor[(10, 512), int8] */ /* ty=Tensor[(10, 512), int8] */, units=10, out_dtype="int32") /* ty=Tensor[(1, 10), int32] */;
   ```
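In the After dump, the ```add``` / ```right_shift``` / ```clip``` / ```cast``` sequence (%30 to %33) brings the int32 output of the first ```nn.dense``` back to int8 for the second one. Reading the constants, 512 = 2^9 is half of 2^10, so this looks like a fixed-point rescale by 2^-10 with round-half-up. The plain-Python sketch below reproduces that arithmetic under that reading and checks it against a float reference. It illustrates the integer math only, not TVM's code, and it assumes ```right_shift``` on signed int32 is an arithmetic shift (Python's ```>>``` floors the same way).

```python
import math

SHIFT = 10                    # right_shift(%30, 10) in the After IR
ROUND = 1 << (SHIFT - 1)      # 512, the constant added in %30
QMIN, QMAX = -127, 127        # clip bounds in %32


def requantize(acc: int) -> int:
    """Integer-only rescale of an int32 accumulator by 2**-SHIFT.

    Adds half of 2**SHIFT, shifts right (flooring), then saturates to the
    clip range so the later cast to int8 cannot overflow.
    """
    rounded = (acc + ROUND) >> SHIFT          # add(%29, 512); right_shift(%30, 10)
    return max(QMIN, min(QMAX, rounded))      # clip(%31, a_min=-127, a_max=127)


def float_requantize(acc: int) -> int:
    """Float reference: multiply by 2**-SHIFT, round half up, saturate."""
    return max(QMIN, min(QMAX, math.floor(acc / (1 << SHIFT) + 0.5)))


for acc in (0, 511, 512, -513, 1_000_000):
    print(acc, requantize(acc), float_requantize(acc))
```

Adding 2^(SHIFT-1) before a flooring shift rounds ties toward positive infinity for both signs (for example -512 maps to 0 and 512 maps to 1), which matches ```math.floor(x + 0.5)``` in the float reference.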
   @vinx13, @siju-samuel, @masahi, @FrozenGene, @ZihengJiang, please help with the review.
   
   Thank you!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

