ibsidorenko commented on code in PR #13854:
URL: https://github.com/apache/tvm/pull/13854#discussion_r1092971982


##########
python/tvm/relay/qnn/op/legalizations.py:
##########
@@ -340,6 +340,62 @@ def helper_change_dtypes_to_int8(attrs, inputs, types, relay_op):
     )
 
 
+def helper_change_dtypes_to_uint8(attrs, inputs, types, relay_op):
+    """Helper function to change dtypes to uint8 x uint8.
+    Legalizes QNN dense op for Hexagon DSP. It supports fast u8 x u8 vrmpy 
instruction.
+
+    Converting from int8 to uint8 can be done in the following manner:
+
+    Original equation
+      scale * (QA - zp_a)
+      scale * (QA + 128 - 128 - zp_a)
+      scale * ( (QA + 128) - (zp_a + 128))
+
+    Replacing QA + 128 with QA' and (zp_a + 128) with zp_a'
+    We get our new quantized uint8 tensor - scale * (QA' - zp_a')
+
+    Parameters
+    ----------
+    attrs : tvm.ir.Attrs
+        Attributes of the current operator
+    inputs : list of tvm.relay.Expr
+        The args of the Relay expr to be legalized
+    types : list of types
+        List of input and output types
+
+    Returns
+    -------
+    result : tvm.relay.Expr
+        The legalized expr
+    """
+    # Collect the dtypes.
+    data_dtype = types[0].dtype
+    kernel_dtype = types[1].dtype
+
+    # Do nothing since it is already uint8.
+    if data_dtype == "uint8" and kernel_dtype == "uint8":
+        return None
+
+    # Collect the input exprs.
+    data, kernel, input_zero_point, kernel_zero_point, input_scale, kernel_scale = inputs
+
+    # Shift input if necessary.
+    if data_dtype == "int8":
+        # Compute (QA + 128) and (zp_a + 128)
+        data, input_zero_point = _shift(data, input_zero_point, "uint8")
+
+    # Shift kernel if necessary.
+    if kernel_dtype == "int8":
+        # Same shift as above, applied to the kernel and its zero point
+        kernel, kernel_zero_point = _shift(kernel, kernel_zero_point, "uint8")
+

Review Comment:
   The main goal of the int8 --> uint8 conversion is to enable use of the faster vrmpy u8**u8**i32 intrinsic instead of vrmpy u8**i8**i32 for `qnn.dense`/`qnn.conv2d`.
   Yes, you're absolutely right, we incur some overhead from this conversion. That's why I have commented out this code and skip the i8 -> u8 conversion for weights. Right now I do not see any performance improvement because of the overhead.
   I will enable this code if I manage to get better performance.
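   The shift in the diff above can be sanity-checked numerically. A minimal sketch (not TVM code; NumPy stands in for the tensor math, and the scale, zero point, and values are made up for illustration) showing that adding 128 to both the int8 quantized values and the zero point leaves the dequantized result unchanged:

   ```python
   import numpy as np

   # Hypothetical example values: scale * (QA - zp_a) is the dequantized tensor.
   scale = 0.05
   zp_a = 3
   qa = np.array([-128, -1, 0, 7, 127], dtype=np.int8)

   # Original dequantization with the int8 tensor.
   orig = scale * (qa.astype(np.int32) - zp_a)

   # Shift to uint8: QA' = QA + 128, zp_a' = zp_a + 128.
   qa_u8 = (qa.astype(np.int32) + 128).astype(np.uint8)
   zp_u8 = zp_a + 128

   # Dequantization with the shifted uint8 tensor gives the same result.
   shifted = scale * (qa_u8.astype(np.int32) - zp_u8)

   assert np.allclose(orig, shifted)
   ```

   The widening to int32 before subtracting mirrors why the rewrite is safe: the shift only relabels the quantized range [-128, 127] as [0, 255], so the difference QA' - zp_a' is identical to QA - zp_a.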



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
