The GitHub Actions job "Nightly Docker Update" on tvm.git/main has failed. Run started by GitHub user areusch (triggered by areusch).

Head commit for run:
141431ce1a61e628bc507dfcf09243bb4bdeab0b / Asuka <[email protected]>
[TIR][Schedule] Fix mma tensorize error (#18528)

When forcing the use of MMA with MultiLevelTilingTensorCore, or directly applying tensorization via the script below, the required shared memory size is significantly overestimated compared to the actual usage. At the same time, the accumulated result of the MMA is incorrect.

This issue stems from two root causes:

1. In `MmaToGlobal::Rewrite`, an extra threadIdx.x dimension is introduced when calling `InsertCacheStage`, which confuses the memory analysis and leads to inflated shared memory estimates.
2. In `get_mma_sync_intrin`, the offset computation for fragment C in `get_index_C` is incorrect, resulting in erroneous accumulation results.

This PR addresses both issues to ensure accurate shared memory estimation and correct tensor core accumulation behavior.

**How**

This PR includes the following fixes:

1. Skip the threadIdx.x dimension in `InsertCacheStage` when it is not required, to prevent spurious shared memory overestimation and repeated stores.
2. Correct the offset calculation for fragment C in `get_index_C` to ensure accurate accumulation results during tensor core execution.

**Result**

The above script produces results that match those of PyTorch.

**Env**

NVIDIA A100-SXM4-80GB

Report URL: https://github.com/apache/tvm/actions/runs/19947895983

With regards,
GitHub Actions via GitBox
