I'm working on tensorizing kernels whose input sizes are not divisible by the dimensions of the tensor intrinsic. An example of this situation would be a 30x30x30 GEMM tensorized with a 16x16x16 intrinsic. This currently results in the following lowered IR before tensorization, and the following error message when trying to tensorize:
```
produce C {
  for (i.outer, 0, 2) {
    for (j.outer, 0, 2) {
      for (i.inner.init, 0, 16) {
        for (j.inner.init, 0, 16) {
          if (likely((((i.outer*16) + i.inner.init) < 30))) {
            if (likely((((j.outer*16) + j.inner.init) < 30))) {
              C[((((i.outer*480) + (i.inner.init*30)) + (j.outer*16)) + j.inner.init)] = (int8)0
            }
          }
        }
      }
      for (k.outer, 0, 2) {
        for (i.inner, 0, 16) {
          for (j.inner, 0, 16) {
            for (k.inner, 0, 16) {
              if (likely((((i.outer*16) + i.inner) < 30))) {
                if (likely((((j.outer*16) + j.inner) < 30))) {
                  if (likely((((k.outer*16) + k.inner) < 30))) {
                    if (likely((((i.outer*16) + i.inner) < 30))) {
                      if (likely((((j.outer*16) + j.inner) < 30))) {
                        if (likely((((k.outer*16) + k.inner) < 30))) {
                          C[((((i.outer*480) + (i.inner*30)) + (j.outer*16)) + j.inner)] = (C[((((i.outer*480) + (i.inner*30)) + (j.outer*16)) + j.inner)] + (A[((((i.outer*480) + (i.inner*30)) + (k.outer*16)) + k.inner)]*B[((((k.outer*480) + (k.inner*30)) + (j.outer*16)) + j.inner)]))
                        }
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```
```
Traceback (most recent call last):
  File "padding-spike.py", line 197, in <module>
    print(tvm.lower(*op, simple_mode=True))
  File "/home/jsteward/work/tvm/python/tvm/build_module.py", line 382, in lower
    stmt = form_body(sch)
  File "/home/jsteward/work/tvm/python/tvm/build_module.py", line 333, in form_body
    stmt = schedule.ScheduleOps(sch, bounds)
  File "/home/jsteward/work/tvm/python/tvm/_ffi/_ctypes/function.py", line 207, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (7) /home/jsteward/work/tvm/build/libtvm.so(TVMFuncCall+0x5f) [0x7f6418c18a7f]
[bt] (6) /home/jsteward/work/tvm/build/libtvm.so(+0x40341b) [0x7f641840241b]
[bt] (5) /home/jsteward/work/tvm/build/libtvm.so(tvm::schedule::ScheduleOps(tvm::Schedule, tvm::Map<tvm::IterVar, tvm::Range, void, void>, bool)+0x1fa1) [0x7f64187a5f41]
[bt] (4) /home/jsteward/work/tvm/build/libtvm.so(tvm::schedule::MakePipeline(tvm::Stage const&, std::unordered_map<tvm::IterVar, tvm::Range, std::hash<tvm::IterVar>, std::equal_to<tvm::IterVar>, std::allocator<std::pair<tvm::IterVar const, tvm::Range> > > const&, tvm::Stmt, bool)+0x5a) [0x7f64187a367a]
[bt] (3) /home/jsteward/work/tvm/build/libtvm.so(tvm::ComputeOpNode::BuildProvide(tvm::Stage const&, std::unordered_map<tvm::IterVar, tvm::Range, std::hash<tvm::IterVar>, std::equal_to<tvm::IterVar>, std::allocator<std::pair<tvm::IterVar const, tvm::Range> > > const&, bool) const+0x165) [0x7f64185cca05]
[bt] (2) /home/jsteward/work/tvm/build/libtvm.so(tvm::MakeTensorize(tvm::ComputeOpNode const*, tvm::Stage const&, std::unordered_map<tvm::IterVar, tvm::Range, std::hash<tvm::IterVar>, std::equal_to<tvm::IterVar>, std::allocator<std::pair<tvm::IterVar const, tvm::Range> > > const&, bool)+0x263) [0x7f6418602d73]
[bt] (1) /home/jsteward/work/tvm/build/libtvm.so(tvm::VerifyTensorizeLoopNest(tvm::ComputeOpNode const*, tvm::Stage const&, tvm::ComputeLoopNest const&, unsigned long)+0xbd6) [0x7f64185fefc6]
[bt] (0) /home/jsteward/work/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x43) [0x7f64183b9473]
  File "../src/op/tensorize.cc", line 147
TVMError: Tensorize failed, split condition likely(((i.inner + (i.outer*16)) < 30)) relies on var defined inside tensorize scope
```
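For reference, the schedule that produces the IR above looks roughly like this (a minimal sketch using the older `tvm.*` API; `intrin_gemm` stands in for my actual intrinsic declaration):
```python
import tvm

N, factor = 30, 16

A = tvm.placeholder((N, N), name='A', dtype='int8')
B = tvm.placeholder((N, N), name='B', dtype='int8')
k = tvm.reduce_axis((0, N), name='k')
C = tvm.compute((N, N),
                lambda i, j: tvm.sum(A[i, k] * B[k, j], axis=k),
                name='C')

s = tvm.create_schedule(C.op)
i, j = s[C].op.axis
(kk,) = s[C].op.reduce_axis

# Split every axis by the intrinsic dimension; 30 % 16 != 0, so the splits
# introduce the likely() guards seen in the lowered IR above.
io, ii = s[C].split(i, factor=factor)
jo, ji = s[C].split(j, factor=factor)
ko, ki = s[C].split(kk, factor=factor)
s[C].reorder(io, jo, ko, ii, ji, ki)

# Tensorizing the 16x16x16 inner tile is what triggers the error above:
# s[C].tensorize(ii, intrin_gemm)
print(tvm.lower(s, [A, B, C], simple_mode=True))
```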
It seems that `src/op/tensorize.cc` mandates that the init and main predicates must not reference variables defined inside the tensorization scope (in this case `i.inner`):
```cpp
for (const Expr& pred : n.main_predicates) {
  if (ir::ExprUseVar(pred, banned)) {
    LOG(FATAL) << "Tensorize failed, split condition "
               << pred << " relies on var defined inside tensorize scope";
  }
}
for (const Expr& pred : n.init_predicates) {
  if (ir::ExprUseVar(pred, banned)) {
    LOG(FATAL) << "Tensorize failed, split condition "
               << pred << " relies on var defined inside tensorize scope";
  }
}
```
The problem is that my tensor intrinsic can handle the padding (which is needed in this case) in hardware, in much the same way that VTA uses its DMA engine to insert padding. I think my intrinsic would handle this case correctly if the `likely` clauses were simply removed, but there doesn't seem to be an obvious way to do that. How can I achieve this? Thanks in advance! For context, a rough sketch of the intrinsic declaration follows.
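This is only a sketch: `hw_gemm` and `hw_gemm_reset` are placeholders for the extern calls into my accelerator (which copes with partial tiles on its own), and the buffer and stride names are made up for illustration.
```python
def intrin_gemm(m=16, n=16, k=16):
    a = tvm.placeholder((m, k), name='a', dtype='int8')
    b = tvm.placeholder((k, n), name='b', dtype='int8')
    kk = tvm.reduce_axis((0, k), name='kk')
    c = tvm.compute((m, n), lambda i, j: tvm.sum(a[i, kk] * b[kk, j], axis=kk), name='c')

    # Strided buffers so the intrinsic can be matched against a tile of the larger matrices.
    Ab = tvm.decl_buffer(a.shape, a.dtype, name='Ab', offset_factor=1, strides=[tvm.var('sa'), 1])
    Bb = tvm.decl_buffer(b.shape, b.dtype, name='Bb', offset_factor=1, strides=[tvm.var('sb'), 1])
    Cb = tvm.decl_buffer(c.shape, c.dtype, name='Cb', offset_factor=1, strides=[tvm.var('sc'), 1])

    def intrin_func(ins, outs):
        aa, bb = ins
        cc = outs[0]

        def body():
            ib = tvm.ir_builder.create()
            ib.emit(tvm.call_extern('int32', 'hw_gemm',
                                    cc.access_ptr('w'), aa.access_ptr('r'), bb.access_ptr('r')))
            return ib.get()

        def reset():
            ib = tvm.ir_builder.create()
            ib.emit(tvm.call_extern('int32', 'hw_gemm_reset', cc.access_ptr('w')))
            return ib.get()

        # full compute, reduction init, reduction update
        return body(), reset(), body()

    return tvm.decl_tensor_intrin(c.op, intrin_func, binds={a: Ab, b: Bb, c: Cb})
```
The declaration itself only describes a full 16x16x16 tile; the point is that the hardware call tolerates the ragged edge tiles, which is why dropping the `likely` guards should be safe for this intrinsic.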
---
[Visit
Topic](https://discuss.tvm.ai/t/tensorize-with-non-divisible-split-factor/6504/1)
to respond.