The GitHub Actions job "Nightly Docker Update" on tvm.git/main has failed.
Run started by GitHub user areusch (triggered by areusch).

Head commit for run:
141431ce1a61e628bc507dfcf09243bb4bdeab0b / Asuka <[email protected]>
[TIR][Schedule] Fix mma tensorize error (#18528)

When forcing the use of MMA with MultiLevelTilingTensorCore, or when
applying tensorization directly via a standalone reproduction script, the
required shared memory size is significantly overestimated compared to
the actual usage. At the same time, the accumulated MMA result is also
incorrect. These issues stem from two root causes:

1. In `MmaToGlobal::Rewrite`, an extra threadIdx.x dimension is
introduced when calling `InsertCacheStage`, which confuses the memory
analysis and leads to inflated shared memory estimates.
2. In `get_mma_sync_intrin`, the offset computation for fragment C in
`get_index_C` is incorrect, resulting in erroneous accumulation results.
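To make root cause 1 concrete, here is a rough back-of-the-envelope sketch, assuming the memory analysis sizes a cache buffer as the product of its dimension extents. The 16x16 float32 tile, the warp width of 32, and the helper name `footprint_bytes` are illustrative assumptions, not details taken from the fix:

```python
# Hypothetical illustration: how one spurious thread-bound dimension
# inflates a shared-memory footprint estimate. Shapes and dtype are
# assumptions, not taken from the TVM change.
from math import prod


def footprint_bytes(shape, dtype_bytes):
    """Bytes needed for a dense buffer with the given shape."""
    return prod(shape) * dtype_bytes


# A 16x16 float32 accumulator tile staged through shared memory.
tile_shape = (16, 16)
# The same tile with an extra leading dimension sized by threadIdx.x
# (32 lanes in a warp), as if every thread owned a private copy.
inflated_shape = (32, 16, 16)

actual = footprint_bytes(tile_shape, 4)
inflated = footprint_bytes(inflated_shape, 4)
print(actual, inflated, inflated // actual)  # 1024 32768 32
```

Under these assumptions, the estimate grows by exactly the extent of the spurious dimension, and each thread would also write the full tile, which matches the "store repeatedly" symptom described in the fix.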

This PR addresses both issues to ensure accurate shared memory
estimation and correct tensor core accumulation behavior.

**How**
This PR includes the following fixes:

1. Skip the threadIdx.x dimension in `InsertCacheStage` when it is not
required, preventing both the spurious shared memory overestimation and
the repeated stores.
2. Correct the offset calculation for fragment C in `get_index_C` to
ensure accurate accumulation results during tensor core execution.
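For fix 2, the sketch below shows what a correct fragment C mapping has to satisfy, using the accumulator layout that the PTX ISA documents for `mma.sync.aligned.m16n8k16` with `.f32` accumulators. It is not TVM's `get_index_C`; the helper names and the row-major output buffer are hypothetical, and the layout a given TVM intrinsic targets may differ:

```python
# Sketch of the per-lane accumulator (fragment C/D) layout for
# mma.sync.aligned.m16n8k16 with .f32 accumulators, per the PTX ISA.
# NOT TVM's get_index_C; all names here are hypothetical.

WARP_SIZE = 32
M, N = 16, 8        # accumulator tile shape for m16n8k16
REGS_PER_LANE = 4   # c0..c3 held by each lane


def fragment_c_coord(lane_id, reg_idx):
    """Map (lane, register index) to (row, col) in the 16x8 tile."""
    group_id = lane_id >> 2          # lanes 0-3 -> 0, 4-7 -> 1, ...
    thread_in_group = lane_id % 4
    row = group_id + (8 if reg_idx >= 2 else 0)  # c2, c3 live 8 rows down
    col = thread_in_group * 2 + (reg_idx & 1)
    return row, col


def fragment_c_offset(lane_id, reg_idx, ldc=N):
    """Row-major offset of that element in a buffer with leading dim ldc."""
    row, col = fragment_c_coord(lane_id, reg_idx)
    return row * ldc + col


# Every tile element must be owned by exactly one (lane, reg) pair;
# a wrong mapping drops or double-counts partial sums.
offsets = [fragment_c_offset(l, r)
           for l in range(WARP_SIZE) for r in range(REGS_PER_LANE)]
assert sorted(offsets) == list(range(M * N))
print(fragment_c_coord(5, 3))  # (9, 3)
```

The final assertion is a cheap invariant worth checking for any fragment index function: a bijection between (lane, register) pairs and tile elements is necessary, though not sufficient, for correct accumulation.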

**Result**
With these fixes, the reproduction script produces results that match
those of PyTorch.

**Env**
NVIDIA A100-SXM4-80GB

Report URL: https://github.com/apache/tvm/actions/runs/19947895983

With regards,
GitHub Actions via GitBox
