adstraw commented on a change in pull request #8825:
URL: https://github.com/apache/tvm/pull/8825#discussion_r696746574



##########
File path: tutorials/optimize/opt_gemm.py
##########
@@ -227,43 +228,49 @@
 
###################################################################################################
 # Array Packing
 # -------------
-# Another important trick is array packing. This trick is to reorder the storage dimension of the
-# array to convert the continuous access pattern on certain dimension to a sequential pattern after
+# Another important trick is array packing. This trick is to reorder the storage dimension of an
+# array to convert the continuous access pattern on its dimensions to a sequential pattern after
 # flattening.
 #
 # .. image:: https://github.com/dmlc/web-data/raw/main/tvm/tutorial/array-packing.png
 #      :align: center
 #
+# NOTE: The figure above is meant for illustration purposes only.  Please ignore dimension
+# information ([16][16] and [16/4][16][4]) which does not apply to this tutorial.
 
 
 
###################################################################################################
-# Just as it is shown in the figure above, after blocking the computations, we can observe the array
-# access pattern of B (after flattening), which is regular but discontinuous. We expect that after
-# some transformation we can get continuous access pattern. We can reorder a [16][16] array to
-# a [16/4][16][4] array, so that the access pattern of B will be sequential when grabing
-# the corresponding value from the packed array.
-#
+# We can use array packing to address the access pattern for B. Observe the array access pattern of
+# B after flattening which is not sequential as we iterate over the K dimension. We can reorder B
+# with dimensions [K][N] so that it has dimensions [N/bn][K][bn] where bn is the blocking factor and
+# also the vector size for both B in the inner loop.  This reorder splits N into two dimensions ---
+# bigN (N/bn) and littleN (bn) --- and the new dimensions [N/bn][K][bn] match the indexing of B
+# from outer to inner loops (no, ko, ki, ni) resulting in a sequential access pattern for B after
+# flattening.
+
 
 # We have to re-write the algorithm slightly.
-packedB = te.compute((N / bn, K, bn), lambda x, y, z: B[y, x * bn + z], name="packedB")
+packedB = te.compute(
+    (N / bn, K, bn), lambda bigN, k, littleN: B[k, bigN * bn + littleN], name="packedB"
+)
 C = te.compute(
     (M, N),
-    lambda x, y: te.sum(A[x, k] * packedB[y // bn, k, tvm.tir.indexmod(y, bn)], axis=k),
+    lambda m, n: te.sum(A[m, k] * packedB[n // bn, k, tvm.tir.indexmod(n, bn)], axis=k),
     name="C",
 )
 
 s = te.create_schedule(C.op)
 
-xo, yo, xi, yi = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
+mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
 (k,) = s[C].op.reduce_axis
-ko, ki = s[C].split(k, factor=4)
+ko, ki = s[C].split(k, factor=kfactor)
 
-s[C].reorder(xo, yo, ko, xi, ki, yi)
-s[C].vectorize(yi)
+s[C].reorder(mo, no, ko, mi, ki, ni)
+s[C].vectorize(ni)
 
-x, y, z = s[packedB].op.axis
-s[packedB].vectorize(z)
-s[packedB].parallel(x)
+bigN, k, littleN = s[packedB].op.axis
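
For readers following the diff, the packing transform it describes ([K][N] into [N/bn][K][bn]) can be illustrated with a small standalone NumPy sketch. This is not part of the tutorial or the PR; the sizes (K, N, bn) are illustrative, and the loop nest simply mirrors the `packedB` compute rule from the diff:

```python
import numpy as np

# Illustrative sizes: a K x N matrix B packed with blocking factor bn.
K, N, bn = 8, 8, 4
B = np.arange(K * N).reshape(K, N)

# Pack B[K][N] into packedB[N/bn][K][bn], mirroring the te.compute rule:
# packedB[bigN, k, littleN] = B[k, bigN * bn + littleN]
packedB = np.empty((N // bn, K, bn), dtype=B.dtype)
for bigN in range(N // bn):
    for k in range(K):
        for littleN in range(bn):
            packedB[bigN, k, littleN] = B[k, bigN * bn + littleN]

# The same transform expressed as a single reshape + transpose.
assert np.array_equal(packedB, B.reshape(K, N // bn, bn).transpose(1, 0, 2))

# After packing, iterating (k, littleN) for a fixed bigN walks packedB's flat
# storage sequentially; the same elements in B are strided apart by N.
flat = packedB.reshape(-1)
assert np.array_equal(flat[: K * bn], packedB[0].reshape(-1))
```

The reshape/transpose line shows why the packed layout is sequential after flattening: row-major order over [bigN][k][littleN] visits exactly the elements the inner loops (ko, ki, ni) touch, in order.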

Review comment:
       I can either delete the line `(k,) = s[C].op.reduce_axis` and just use the definition of `k` on line 101. Or, I can change it to something like `(kaxis,) = s[C].op.reduce_axis` and `ko, ki = s[CC].split(kaxis, factor=kfactor)`. I am going with the "delete" method for now, but I am happy to code it the other way.



