[tvm] branch unity-staging updated (18c19fb830 -> a425bc7a39)

tqchen Sat, 01 Apr 2023 09:38:10 -0700

This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a change to branch unity-staging
in repository https://gitbox.apache.org/repos/asf/tvm.git



    omit 18c19fb830 [Unity][Frontend] FX exp and strided_slice fix (#14338)
    omit 6c6985940c [Unity][BYOC] Update testcases to follow recent changes 
(#14339)
    omit 06fe80be71 [Unity] Remove Python interface of RemoveUnusedFunction 
(#14336)
    omit a0bd29917c [Unity][Pass] Reuse prior infra to implement more complete 
DCE (#14334)
    omit 9aae685bf6 [Unity][Op] Fix Strided Slice Shape Inference (#14324)
    omit 0b47f0bfe3 [Unity][Transform] DefaultSchedule pass (#14266)
    omit 1d35ef2135 [Unity][Lint] Fix cpplint casting (#14333)
    omit 795568e148 [Unity][Transform] Automatic Mixed Precision (#14242)
    omit 3ca23b3378 [Unity][Transform] Simple Dead Code Elimination (#14262)
    omit fdf86e4c3e [Unity][Transform] Automatic Layout Conversion (#14257)
    omit 5529830bf5 [Unity][TOPI] fp16 LayerNorm & GroupNorm (#14264)
    omit e2b1d93591 [Unity][Contrib] Introduce several features of cutlass 
profiler (#14275)
    omit 850e549b32 [Unity][Transform] Enhance RewriteDataflowReshape transform 
(#14265)
    omit 981a822bd3 [Unity][BYOC] Improve expressiveness of the pattern check 
function in FuseOpsByPattern (#14310)
    omit 10b834f887 [Unity][BYOC] Support matmul + residual block fusion in 
CUTLASS BYOC (#14317)
    omit f5ee09795f [Unity] Support pattern-based rewriting (#14312)
    omit 001f17814c [Unity][Web] WebGPU explicit max buffer size (#14321)
    omit b10498b51c [Unity][Op] Enable special dimension value 0 in reshape 
(#14311)
    omit 765375f187 [Unity][Pass] Add a pass to alter the TIR implementation of 
an operator (#14215)
    omit 15de2c2df7 [Unity][DEBUG] Add Instrument (#14302)
    omit c1783b83a7 [Unity][Op] Cumsum (#14297)
    omit c7f40bd1d9 [Unity] Fix StructInfo Infer for `vm.alloc_tensor` (#14283)
    omit 8c34de2d7f [Unity] Mark tests that need python3.8 compact.
    omit 581889aa6c [TVMScript][Unity] Improve PyLint Compatibility (#14276)
    omit e6f3db185a [Unity][ci] Use CPU-SMALL instances (#14256)
    omit 7f89e22406 [Unity] Introduce call_dps_packed (#14183)
    omit 72c9510ae4 [Unity] Consider target context for Relay to Relax 
conversion (#14269)
    omit 1456f99a26 [Unity][Frontend] Import `tanh` and fix `layer_norm` 
(#14247)
    omit 6ae1c52610 [Unity][BYOC] Add conv2d and residual block patterns for 
Relax cutlass BYOC (#14252)
    omit 24b8e7bef5 [Unity] Allow user defined func attrs in emit_te (#14255)
    omit f5b6ac8fb4 [Unity][Op] Add repeat, tile, conv2d_transpose, avg_pool2d 
(#14238)
    omit 9b757c9f39 [Unity][Op][Tweak] Improve `StructInfo` inference for 
`shape_of` (#14243)
    omit 4c90f052f6 [Unity][WEB] Improve ndarray cache (#14236)
    omit 81c38c5e1b [Unity][WEB] Update text prompts for syntactical 
correctness (#14237)
    omit b270be88fe [Unity][TVMScript] Fix prim_func lost issue in 
relax.emit_te (#14189)
    omit 198caa55d1 [Unity][TVMScript] Enable Context-Aware Parsing (#14234)
    omit f2804d15f7 [Unity][Bugfix] Do not include `PrimFunc`s in the 
dependency graph when checking for recursion (#14228)
    omit 13c8c673ba [Unity][Transform] SimplifyNormInference (#14221)
    omit 6b75a40036 [Unity] Improve implementation of FuseOps (#14229)
    omit afdf218125 [Unity] ensure memory.alloc_tensor/storage roundtrippable 
(#14226)
    omit 2531c7eaf5 [Unity][WEB] Simplify WebGPU Codegen per spec (#14225)
    omit 84c20b3abe [Unity][Transform] Memory plan across the IRModule (#14220)
    omit 6cb1fe7a94 [Unity][BYOC] Add dynamic shape support to CUTLASS matmul 
(#14216)
    omit 841f8a0c03 [Unity][Frontend] from_fx keeps parameters in order (#14214)
    omit 1ffc31777e [Unity][WEB] Improve webgpu codegen options to skip 
readonly (#14213)
    omit c4225052e9 [Unity][Frontend] FX translator supports unwrapping unit 
return tuple (#14212)
    omit 399d9daf71 [Unity][Frontend] Attach imported model weights, deprecate 
ImporterOutput (#14211)
    omit b7193cf056 [Unity] Introduce Default GPU Schedule Pass (#14182)
    omit 044080ff93 [Unity][Frontend] FX translator support torch.baddbmm 
(#14202)
    omit 3b7db40860 [Unity][TIR][Pass] ForceNarrowIndexToInt32 (#14203)
    omit fb6b1ea299 [Unity][Fix] FX translating dtype (#14201)
    omit ec6e26827b [Unity][Frontend] FX translator returning weights with 
`keep_params_as_input` (#14197)
    omit 439ec78118 [Unity][Frontend] FX translator supporting more ops (#14196)
    omit 6731783749 [Unity][Op] Legalize `round`, `floor`, `ceil`, `sign` 
(#14198)
    omit 50c1e7a147 [Unity][Op] Argmax and argmin (#14195)
    omit 1927d7d4aa [Unity][Op] Group normalization (#14194)
    omit cd88b0ab49 [Unity][Transform] LiftTransformParams handling multiple 
functions (#14192)
    omit 281cc206cc [Unity][WEBGPU] Codegen improvements and WebRuntime (#14187)
    omit 8a46c21e33 [Unity][OP] Add an operator for fused multi head attention 
(#14150)
    omit 1cc9bb014e [Unity][Analysis] Restore Python bindings for var analyses 
(#14180)
    omit be532f28f2 [Unity][Op] Full support of Relax op `power` (#14171)
    omit 1bbe881241 [Unity][BYOC] Add batch matmul support to Relax CUTLASS 
BYOC (#14166)
    omit c56b17f4f6 [Unity][Analysis] Analysis for detecting recursion in Relax 
(#14149)
    omit 7cad6ef8d8 [Unity] Add bind_constants option to FuseOpsByPattern 
(#14151)
    omit ef0f4481cf [Unity][BYOC] Use Relax legalize + CPU build for reference 
in tests (#14162)
    omit 1cdc3d336b [Unity][Analysis] Checking function return struct info in 
well-formed check (#14155)
    omit 7e96a3aeed [Unity][Pass] Support Symbolic Shape Deduction during 
BindParam (#14154)
    omit 82bfc57772 [Unity][Debugging] AST printer (#14152)
    omit 43c5f29813 [Unity][Pass] Enhance constant folding to fold relax ops by 
evaluating them. (#14146)
    omit 019ef59f2e [Unity][Legalize] Fix Scalar Constant Legalization (#14127)
    omit 3bdd8013c1 [Unity] Add callback to FuseOpsByPattern to check match 
result is accepted (#14109)
    omit a16021ace5 [Unity][BYOC] Assign group to unused bindings and ignroe 
PrimFunc (#14139)
    omit 1004bdf02a [Unity][TVMScript] emit_te sugar (#14123)
    omit 5939a6e8c8 [Unity][BYOC] Add transposed matmul support to Relax 
CUTLASS BYOC (#14128)
    omit fa47ee995f [Unity] Add Global info (#14132)
    omit 133b4acaeb [Unity][WEB] Relax vm on web runtime (#14131)
    omit cf9beab753 [Unity][BlockBuilder] Add `name_hint` argument for `emit` 
and `emit_output` (#14126)
    omit 4e5e81a1b8 [Unity][Fix] Fix bug in MergeCompositeFunctions (#14117)
    omit fbf56475d2 [Unity] Update tests again to adapt to latest TVMScript 
syntax (#14115)
    omit b168949441 [Unity][BYOC]Add relax backend pattern registry (#14106)
    omit a31c856de7 [Unity] Remove attributes of relax.print, assert and unique 
(#14101)
    omit 9df23f7c67 [Unity][Layout] Add layout transformation analysis for 
PrimFunc (#14066)
    omit 70c8debc7a [Unity] Relax Recursive function (#14092)
    omit 03799a50cb [Unity] Lower `shape_of` to a builtin (#14093)
    omit 0eff29a505 [Unity] Fix typo in the comment (#14096)
    omit f15b80a561 [Unity][Relax] Set Shape Function to Be Host Function 
(#14090)
    omit 99f6d67dd0 [Unity] Refactor Relax Build JIT UX (#14088)
    omit 22c7b75834 [Unity][Fix][Pass] FoldConstant with DCE in dataflow block 
(#14087)
    omit 5aecfe4121 [Unity][Analysis] TIR pattern kind analysis for 
multi-buffer write block (#14075)
    omit 3a0f4c5eca [Unity][Op] `log_softmax` and `cross_entropy_with_logits` 
(#14083)
    omit f3ee944a58 [Unity][BYOC] Add DNNL backend (#14082)
    omit fe7e0651ec [Unity][BYOC] Add CUTLASS backend (#14081)
    omit d5fa61fd46 [Unity] Add testcases for `expr_args_converter` (#14080)
    omit 4b3794c24a [Unity][Pass] Canonicalize Bindings (#14079)
    omit 6ba5cac678 [Unity][BYOC][Pass] RunCodegen and TensorRT  (#14078)
    omit 51b1ce1ec7 [Unity][Transform] Add LiftTransformParams pass (#14069)
    omit 466a004d6c [Unity][Frontend] Annotate number of non-static input of FX 
function (#14067)
    omit 0bd303c7c1 [Unity][BYOC] Add pass to merge composite functions to 
offload large subgraphs (#14062)
    omit 7be4441569 [Unity][Pass] Remove Unused Function (#14061)
    omit 828edeb5ea [Unity][Fix][Pass] Fix FuseOps for lack graph edges (#14058)
    omit ff7d4950e0 [Unity] Relax op: collapse sum (#14059)
    omit 591b800bfa [Unity][BYOC] Add pattern-based partitioning pass (#14054)
    omit 2bd1581596 [Unity][VM] Add per-op profiling support  (#14053)
    omit 81169f6576 [Unity][TVMScript] Overload `__neg__` for relax expr 
(#14045)
    omit ac5bf3a76a [Unity][Pass] FuseOps FuseTIR fixes (#14044)
    omit f428a4ae23 [Unity] Statement rewriter for DataflowBlock (#14043)
    omit 02cefd91a5 [Unity] Relax dataflow pattern language (matching) (#14041)
    omit d3494933fe [Unity] Update tests to adapt to latest TVMScript syntax 
(#14039)
    omit 837a557210 [Unity] Disallow inline prim_func in relax IR (#14040)
    omit 96a9b6e4d8 [Unity][Pass] Block-level static memory planning (#14038)
    omit 85477ac489 [Unity] Initial PyTorch Frontend (#14037)
    omit 130e362430 [Unity][Op] Add ShapeExpr Tests for Reshape Op (#14035)
    omit 814eb921c2 [Unity][Pass] Operator legalization (#14029)
    omit 5e2f2b9d43 [Unity][TVMScript] Move tir/relax import in script out of 
__init__.py (#14033)
    omit b5d2304029 [Unity][Pass] Wellformed Analysis (#14032)
    omit 251a062bf1 [Unity][BlockBuilder] CallTE convert PrimValue args  
(#14028)
    omit 0d91c33103 [Unity][Pass] Normalize Pass (#14031)
    omit 388941ad6b [Unity] Relay -> Relax translator  (#14026)
    omit 87659ea3ea [Unity][Pass][TuningAPI] Introduce TuningAPI and 
MetaSchedule pass (#14014)
    omit 63e2402358 [Unity][Pass] BindParams pass, FoldConstant pass (#14016)
    omit 0e2bb802bb [Unity][VM] Supporting "compiled" exec mode. (#14015)
    omit e78d523e74 [Unity][Pass] LambdaLift pass (#14012)
    omit 20adb37493 [Unity][Pass] Operator Fusion Passes (#14001)
    omit 62daae4457 [Unity] NestedMsg Support utility (#13995)
    omit 4ab73eabc3 [Unity] Relax op: manipulation (#13989)
    omit 7a8765d819 [Unity] Relax op: search (#13992)
    omit 9694c673bb [Unity] Relax op: linear algebra (#13988)
    omit 4dd591b800 [Unity] Relax op: creation (#13984)
    omit a96f2006a6 [Unity] Relax op: neural networks (#13993)
    omit a6a2e84ca9 [Unity] Relax op: statistical (#13991)
    omit 5385d6d635 [Unity] Relax op: arithmetic, comparison (#13983)
    omit 42409202db [Unity] Relax op: image (#13994)
    omit c7a57aecd6 [Unity] Relax op: set (#13990)
    omit c8a153314d [Unity] Relax op: datatype (#13986)
    omit de164d2524 [Unity] Relax op: index (#13987)
    omit 75eecf7dd9 [Unity][TVMScript] Use explicit `R.shape` in TVMScript 
(#13979)
    omit fadbb3f256 [Unity] e2e Relax minimum build flow (#13961)
    omit 2c158714cf [Unity] Relax VM shape lowering pass (#13956)
    omit ea6cc94c8d [Unity] Relax VM codegen (#13954)
    omit e38a3360f5 [Unity] Relax TVMScript Printer (#13944)
    omit 2001903486 [Unity] Relax TVMScript Parser. (#13932)
    omit d1ad4e6543 [Unity] Relax BlockBuilder and ExprMutator (#13926)
    omit 6915444b2d [Unity] Basic StructInfo Analysis and Expr construction 
(#13916)
    omit fb90fd1a46 [Unity][CI] Unity specific jenkins setup (do not upstream 
to main) (#13910)
    omit 4e659d1f26 [Unity][IR] First-class StructInfo (#13907)
    omit 2c7f480f4f [Unity] Relax expressions and types (#13901)
    omit b8e4110467 [Unity] Relax VM (#13878)
     add 36b30974a9 [MetaSchedule] Introducing MemHammer (#14164)
     add 7f6da09052 [TIR] Fix Datatype in Lower TVM Builtin (#14347)
     add 4819300803 [CI][Lint] Update black (#14346)
     add 50b3ae4877 [TIR] [Analysis] Expose IsOutputBlock to python (#14352)
     add d4ca123afc [BugFix] Support rewrite_once when the number of callbacks 
> 1 (#14344)
     add 5abcf72147 [COMMUNITY] janetsc -> Reviewer (#14359)
     add 46fb2ff35f Hexagon compilation on MacOS system (#14308)
     add 0c2dd47286 [CI] Update GPU image for CUDA 11.7  (#14363)
     add c7970ddd79 [TensorIR] New schedule primitive `set_dtype` (#14316)
     add 91428158f2 [microTVM]Add MLPerfTiny test harness  (#14309)
     add 10a12bacb8 [CI][EZ] Upgrade CI Lint Image (#14373)
     add b56d7f56ab [TIR][Utility] More flexible tir::Substitute arguments 
(#14251)
     add 3b274aa6c7 [Hexagon] Allow scalar tensors to have null shape during 
allocation (#14376)
     add 3f56a95b87 [TVMScript] Use new variable frame in If/Then/Else (#14250)
     add e5ae4347dd [CUDA][Schedule] Better Layout Transform Schedules (#14167)
     add b987556375 [TIR] Remove LoadNode and StoreNode (#14381)
     add 67597025e7 [TVMScript][Fix] Fix `bool` printing for roundtrip (#14390)
     add ad6fbec066 [TIR] Improved error message in InjectSoftwarePipeline 
(#14391)
     add b09e72b54b [TIR] Legalize dtype of constants in IndexMap (#14385)
     add 4a2a3b5669 [TIR] Improved MakePackedAPI error message (#14387)
     add c5075dc30f [TIR] not estimating the flops when there is a default 
estimated flops as attr (#14379)
     add 0d0d2f0bd3 [CI][microTVM] Enable USE_MICRO for mac and windows CI 
builds (#14393)
     add 6c34361369 [Hexagon] Adapt some intrinsics for high vector lanes 
(#14345)
     add 6e70e79162 [microNPU] Upgrade Vela to v3.7.0 (#14374)
     add 30bf013e78 [TIR][Schedule] Add unittest for read_write_at (#14395)
     add da8335378a [TVMC][microNPU] tvmc option for printing which operators 
are offloaded to Ethos-U (#13212)
     add a0edf24c60 [TIR] Refactor BF16Legalize (#14405)
     add 14ddb37d14 [MetaSchedule][Hexagon] Improve vectorization for 
standalone elementwise op (#14408)
     add b3a5e18f6f [TVMScript] Improved error message for unexpected top frame 
(#14399)
     add 0ded2132e6 [skip ci] Replace magic_wand model with micro_speech 
(#14414)
     add 0e28541149 [microTVM] Update poetry to fix security issues (#14429)
     add 9f6ce7cbf9 [relay][frontend][pytorch]Fix a bug in the 
_get_pytorch_value_type function (#14421)
     add 5cca18bb07 [Frontend] Add ONNX importer for QLinearSoftmax (#14425)
     add 4011280b16 [OpenCL][Textures] Always use SSA for texture loading  
(#14397)
     add 79027f92ac [TIR] Remove special-casing of T.address_of in the storage 
rewrite pass (#14430)
     add fafe39ddab [Analysis] Improve error message in VerifyWellFormed 
(#14389)
     add 1d1dbebc73 [microTVM]Fix more security issues with pyproject (#14434)
     add cbe068cfac [TIR] Update LowerTVMBuiltin to use Optional<T> (#14400)
     add 8e2382eea5 [Bugfix] Conv3Dtranspose default kernel layout should be 
IODHW (#14340)
     add ffc1fc0116 [TVMC] Allow selecting a subset of tasks to be used in 
`tvmc tune` (#12525)
     add 7b34a6e0c6 [Runtime] Introduce runtime module property (#14406)
     add 776cf5b3b1 [Typo] Fix name of iter var type 4 (#14436)
     add 683e7a4555 [TOPI] Add instance_norm operator (#14410)
     add 221215bf60 [ETHOSN] Remove requantize dependency on resize (#14422)
     add 41fb9f41d4 [CL] Update Compute Library from v22.11 to v23.02.1 (#14426)
     add 70399da0a2 [TFLite] Support for BATCH_MATMUL tflite operator (#14423)
     add 7831a79f7f [Hexagon] Fix deprecated call for data layout size in bits 
(#14438)
     add b724c87f76 [MetaSchedule][ARM] Enable ARM CPU intrinsic for 
MetaSchedule (#14209)
     add 98007f90d8 [Relay] Move pad value extraction past null pointer check 
(#14445)
     add 49e6695586 [CI] Add llvm-15 and mlir-15 to Docker setup (#14303)
     new a27451755f [Unity] Relax VM (#13878)
     new 0117a28d22 [Unity] Relax expressions and types (#13901)
     new 2bb2e4bf75 [Unity][IR] First-class StructInfo (#13907)
     new f6b68ab7fd [Unity][CI] Unity specific jenkins setup (do not upstream 
to main) (#13910)
     new a7086616d7 [Unity] Basic StructInfo Analysis and Expr construction 
(#13916)
     new 23a7cd1a21 [Unity] Relax BlockBuilder and ExprMutator (#13926)
     new 63de0dacbd [Unity] Relax TVMScript Parser. (#13932)
     new a2d032494f [Unity] Relax TVMScript Printer (#13944)
     new 7f1e1f5528 [Unity] Relax VM codegen (#13954)
     new afe71010ef [Unity] Relax VM shape lowering pass (#13956)
     new dbedbb25ba [Unity] e2e Relax minimum build flow (#13961)
     new 4051a69cec [Unity][TVMScript] Use explicit `R.shape` in TVMScript 
(#13979)
     new caddedb418 [Unity] Relax op: index (#13987)
     new 4dfa36202b [Unity] Relax op: datatype (#13986)
     new 9a9e4a7823 [Unity] Relax op: set (#13990)
     new a9a561b472 [Unity] Relax op: image (#13994)
     new c534c9c7b3 [Unity] Relax op: arithmetic, comparison (#13983)
     new ec110c6023 [Unity] Relax op: statistical (#13991)
     new 5b3239ad4d [Unity] Relax op: neural networks (#13993)
     new 444d420450 [Unity] Relax op: creation (#13984)
     new bf6e2a9ef6 [Unity] Relax op: linear algebra (#13988)
     new 044f3bbc41 [Unity] Relax op: search (#13992)
     new f64e91c6da [Unity] Relax op: manipulation (#13989)
     new 26b4439cf1 [Unity] NestedMsg Support utility (#13995)
     new 18ade5f8ba [Unity][Pass] Operator Fusion Passes (#14001)
     new 7de9c82626 [Unity][Pass] LambdaLift pass (#14012)
     new 5a6579e1b0 [Unity][VM] Supporting "compiled" exec mode. (#14015)
     new f81e198ed4 [Unity][Pass] BindParams pass, FoldConstant pass (#14016)
     new 792d7c5eda [Unity][Pass][TuningAPI] Introduce TuningAPI and 
MetaSchedule pass (#14014)
     new 44b636f9be [Unity] Relay -> Relax translator  (#14026)
     new d8a6d1d826 [Unity][Pass] Normalize Pass (#14031)
     new 2cc122cd24 [Unity][BlockBuilder] CallTE convert PrimValue args  
(#14028)
     new a50cdd06e3 [Unity][Pass] Wellformed Analysis (#14032)
     new bd8fb78ac4 [Unity][TVMScript] Move tir/relax import in script out of 
__init__.py (#14033)
     new db588383bf [Unity][Pass] Operator legalization (#14029)
     new 317634bc19 [Unity][Op] Add ShapeExpr Tests for Reshape Op (#14035)
     new 8d575f2a73 [Unity] Initial PyTorch Frontend (#14037)
     new 9879fbbd0b [Unity][Pass] Block-level static memory planning (#14038)
     new db1bf6b039 [Unity] Disallow inline prim_func in relax IR (#14040)
     new c45b1a6990 [Unity] Update tests to adapt to latest TVMScript syntax 
(#14039)
     new 0525e05aaf [Unity] Relax dataflow pattern language (matching) (#14041)
     new 969047780a [Unity] Statement rewriter for DataflowBlock (#14043)
     new 80c474fbf1 [Unity][Pass] FuseOps FuseTIR fixes (#14044)
     new 8bad813c99 [Unity][TVMScript] Overload `__neg__` for relax expr 
(#14045)
     new b23e18c228 [Unity][VM] Add per-op profiling support  (#14053)
     new 9b1948d0ba [Unity][BYOC] Add pattern-based partitioning pass (#14054)
     new 3097f6648f [Unity] Relax op: collapse sum (#14059)
     new daa3184b29 [Unity][Fix][Pass] Fix FuseOps for lack graph edges (#14058)
     new 5f15d3a5fb [Unity][Pass] Remove Unused Function (#14061)
     new e6fdfc6075 [Unity][BYOC] Add pass to merge composite functions to 
offload large subgraphs (#14062)
     new 575fee9bb3 [Unity][Frontend] Annotate number of non-static input of FX 
function (#14067)
     new ac49e71881 [Unity][Transform] Add LiftTransformParams pass (#14069)
     new 183e4e1d84 [Unity][BYOC][Pass] RunCodegen and TensorRT  (#14078)
     new abdfe98d85 [Unity][Pass] Canonicalize Bindings (#14079)
     new 418eaf0b6b [Unity] Add testcases for `expr_args_converter` (#14080)
     new 1774d2229c [Unity][BYOC] Add CUTLASS backend (#14081)
     new 394f1261a5 [Unity][BYOC] Add DNNL backend (#14082)
     new cb7e29f7de [Unity][Op] `log_softmax` and `cross_entropy_with_logits` 
(#14083)
     new b5e6048361 [Unity][Analysis] TIR pattern kind analysis for 
multi-buffer write block (#14075)
     new fa0f49a6a7 [Unity][Fix][Pass] FoldConstant with DCE in dataflow block 
(#14087)
     new c728978f51 [Unity] Refactor Relax Build JIT UX (#14088)
     new 111dd1f6f5 [Unity][Relax] Set Shape Function to Be Host Function 
(#14090)
     new 3e139b0a93 [Unity] Fix typo in the comment (#14096)
     new 1ea40509c9 [Unity] Lower `shape_of` to a builtin (#14093)
     new 35331cdea2 [Unity] Relax Recursive function (#14092)
     new dd00671ae3 [Unity][Layout] Add layout transformation analysis for 
PrimFunc (#14066)
     new 7ac87251d0 [Unity] Remove attributes of relax.print, assert and unique 
(#14101)
     new c973eae56c [Unity][BYOC]Add relax backend pattern registry (#14106)
     new de2e70778e [Unity] Update tests again to adapt to latest TVMScript 
syntax (#14115)
     new 81a6438bc7 [Unity][Fix] Fix bug in MergeCompositeFunctions (#14117)
     new 631e483330 [Unity][BlockBuilder] Add `name_hint` argument for `emit` 
and `emit_output` (#14126)
     new 17d8625a73 [Unity][WEB] Relax vm on web runtime (#14131)
     new e85a1909db [Unity] Add Global info (#14132)
     new 8c1d87a46c [Unity][BYOC] Add transposed matmul support to Relax 
CUTLASS BYOC (#14128)
     new a5fbbd573f [Unity][TVMScript] emit_te sugar (#14123)
     new 5f5638c05a [Unity][BYOC] Assign group to unused bindings and ignroe 
PrimFunc (#14139)
     new 4892b763b9 [Unity] Add callback to FuseOpsByPattern to check match 
result is accepted (#14109)
     new 993c37d3c2 [Unity][Legalize] Fix Scalar Constant Legalization (#14127)
     new 016b2800a1 [Unity][Pass] Enhance constant folding to fold relax ops by 
evaluating them. (#14146)
     new 832c1ba04c [Unity][Debugging] AST printer (#14152)
     new 78af3acde3 [Unity][Pass] Support Symbolic Shape Deduction during 
BindParam (#14154)
     new 1d60a6a337 [Unity][Analysis] Checking function return struct info in 
well-formed check (#14155)
     new 96d85b2da5 [Unity][BYOC] Use Relax legalize + CPU build for reference 
in tests (#14162)
     new 70e925c8de [Unity] Add bind_constants option to FuseOpsByPattern 
(#14151)
     new 5f4a11a284 [Unity][Analysis] Analysis for detecting recursion in Relax 
(#14149)
     new d50be1cdf6 [Unity][BYOC] Add batch matmul support to Relax CUTLASS 
BYOC (#14166)
     new fb3e269c71 [Unity][Op] Full support of Relax op `power` (#14171)
     new 031e380c47 [Unity][Analysis] Restore Python bindings for var analyses 
(#14180)
     new 6c3a97c71c [Unity][OP] Add an operator for fused multi head attention 
(#14150)
     new 9ade1be9f7 [Unity][WEBGPU] Codegen improvements and WebRuntime (#14187)
     new d68bfb97ee [Unity][Transform] LiftTransformParams handling multiple 
functions (#14192)
     new 32049d825b [Unity][Op] Group normalization (#14194)
     new 694da73413 [Unity][Op] Argmax and argmin (#14195)
     new 012dacec71 [Unity][Op] Legalize `round`, `floor`, `ceil`, `sign` 
(#14198)
     new 5bafde482d [Unity][Frontend] FX translator supporting more ops (#14196)
     new 1896823417 [Unity][Frontend] FX translator returning weights with 
`keep_params_as_input` (#14197)
     new 2c75602cb4 [Unity][Fix] FX translating dtype (#14201)
     new 1978e44971 [Unity][TIR][Pass] ForceNarrowIndexToInt32 (#14203)
     new 03e413ae43 [Unity][Frontend] FX translator support torch.baddbmm 
(#14202)
     new f7ccc3bc59 [Unity] Introduce Default GPU Schedule Pass (#14182)
     new 4920cd26df [Unity][Frontend] Attach imported model weights, deprecate 
ImporterOutput (#14211)
     new 58e224f8b1 [Unity][Frontend] FX translator supports unwrapping unit 
return tuple (#14212)
     new 45a54f3a38 [Unity][WEB] Improve webgpu codegen options to skip 
readonly (#14213)
     new 7a4bdcde3c [Unity][Frontend] from_fx keeps parameters in order (#14214)
     new 6de29c50a2 [Unity][BYOC] Add dynamic shape support to CUTLASS matmul 
(#14216)
     new 4c39c31767 [Unity][Transform] Memory plan across the IRModule (#14220)
     new a3f40a7635 [Unity][WEB] Simplify WebGPU Codegen per spec (#14225)
     new a6d9601595 [Unity] ensure memory.alloc_tensor/storage roundtrippable 
(#14226)
     new 544b0821ae [Unity] Improve implementation of FuseOps (#14229)
     new 80fce8db81 [Unity][Transform] SimplifyNormInference (#14221)
     new 6ca3325a73 [Unity][Bugfix] Do not include `PrimFunc`s in the 
dependency graph when checking for recursion (#14228)
     new 2a32d64ef1 [Unity][TVMScript] Enable Context-Aware Parsing (#14234)
     new fba4b6bc50 [Unity][TVMScript] Fix prim_func lost issue in 
relax.emit_te (#14189)
     new 556b542611 [Unity][WEB] Update text prompts for syntactical 
correctness (#14237)
     new 3bddee1524 [Unity][WEB] Improve ndarray cache (#14236)
     new ac82cf8b0c [Unity][Op][Tweak] Improve `StructInfo` inference for 
`shape_of` (#14243)
     new 3cb9e263b9 [Unity][Op] Add repeat, tile, conv2d_transpose, avg_pool2d 
(#14238)
     new 77695deec6 [Unity] Allow user defined func attrs in emit_te (#14255)
     new 71899e5529 [Unity][BYOC] Add conv2d and residual block patterns for 
Relax cutlass BYOC (#14252)
     new 08e2a69efc [Unity][Frontend] Import `tanh` and fix `layer_norm` 
(#14247)
     new 96cd5b5b4e [Unity] Consider target context for Relay to Relax 
conversion (#14269)
     new d268b13cac [Unity] Introduce call_dps_packed (#14183)
     new 97b429a256 [Unity][ci] Use CPU-SMALL instances (#14256)
     new 2ce4af3e0c [TVMScript][Unity] Improve PyLint Compatibility (#14276)
     new df7f510da8 [Unity] Mark tests that need python3.8 compact.
     new d394b6a89f [Unity] Fix StructInfo Infer for `vm.alloc_tensor` (#14283)
     new 0f49776de3 [Unity][Op] Cumsum (#14297)
     new 1a582b9d79 [Unity][DEBUG] Add Instrument (#14302)
     new a9ca0cf0ab [Unity][Pass] Add a pass to alter the TIR implementation of 
an operator (#14215)
     new 0f6463fccb [Unity][Op] Enable special dimension value 0 in reshape 
(#14311)
     new e61576ba4b [Unity][Web] WebGPU explicit max buffer size (#14321)
     new 1a7244135f [Unity] Support pattern-based rewriting (#14312)
     new db7fdfd5fa [Unity][BYOC] Support matmul + residual block fusion in 
CUTLASS BYOC (#14317)
     new 0145fe97a4 [Unity][BYOC] Improve expressiveness of the pattern check 
function in FuseOpsByPattern (#14310)
     new 9cba9bfd7a [Unity][Transform] Enhance RewriteDataflowReshape transform 
(#14265)
     new 3e66b205d2 [Unity][Contrib] Introduce several features of cutlass 
profiler (#14275)
     new 30817d1aef [Unity][TOPI] fp16 LayerNorm & GroupNorm (#14264)
     new cdb435ccff [Unity][Transform] Automatic Layout Conversion (#14257)
     new 3b731b2eee [Unity][Transform] Simple Dead Code Elimination (#14262)
     new 3497cca0b5 [Unity][Transform] Automatic Mixed Precision (#14242)
     new 24e0fc7c69 [Unity][Lint] Fix cpplint casting (#14333)
     new aa1932492b [Unity][Transform] DefaultSchedule pass (#14266)
     new dd742a826a [Unity][Op] Fix Strided Slice Shape Inference (#14324)
     new ccb9074907 [Unity][Pass] Reuse prior infra to implement more complete 
DCE (#14334)
     new de8c12ab3c [Unity] Remove Python interface of RemoveUnusedFunction 
(#14336)
     new 029a5e8793 [Unity][BYOC] Update testcases to follow recent changes 
(#14339)
     new 602fd10694 [Unity][Frontend] FX exp and strided_slice fix (#14338)
     new d623140045 [Unity] Support model kwargs in dynamo_capture_subgraph 
(#14349)
     new a5d659099d [Unity][BYOC] Check leaked intermediate variables in 
cutlass patterns (#14350)
     new d108639bce [Unity][Transform] AMP out_dtype=float16 testcases (#14358)
     new 9b8e003d50 [Unity][Fix] Fix block memory plan to handle bool (#14357)
     new 3d7af30df7 [Unity][Transform] Introduce data-dependent operation of 
reshape and its constant folding (#14282)
     new fc8bbbd6b4 [Unity][Transform] Fix AMP tests (#14360)
     new 84dc90d76b [Unity] Add support to append relay op attrs in translator 
(#14356)
     new f38171b0cd [Unity][WEB] Support async pipeline creation (#14362)
     new 77496c33f3 [Unity][Pass] Fix FuseOps error if there is no output of a 
given group (#14354)
     new bc391d3429 [Unity][Fix] Infer Layout must support negative axes 
(#14365)
     new 5afb3ea5c5 [Unity] Add More Ops For FX Translator (#14348)
     new b5b8e206d6 [Unity][TVMScript] Update GlobalVar `checked_type_` when 
`emit_te` (#14367)
     new e32164a805 [Unity][Fix] Allow scalar layout initialization (#14370)
     new 0908a43466 [Unity] Also include output dtype in simt MathInstruction 
(#14372)
     new 5a2f1ba2c6 [Unity][VM] Add CUDA graph vm builtins (#14371)
     new 634cfad0dc [Unity] Add missing #include <array> (#14383)
     new 95b6f680b7 [Unity][Transform] SplitCallTIRByPattern and CUTLASS 
backend (#14274)
     new f6919620c1 [Unity] Support simple dynamic-shape-aware fusion (#14396)
     new 414514c1bf [Unity][Op] Add stop_lift_params (#14368)
     new 25608f40c6 [Unity][TVMScript] Fix Shape Var occurrence in Tensor 
annotation (#14404)
     new 219ed08e12 [Unity][Transform] Common Subexpression Elimination (#14361)
     new f45f11a9e5 [Unity][QNN][Hexagon]Support Relax Constants in the QNN 
TOPI operations (#14386)
     new c69c75407f [Unity][Op] Conv1d (#14388)
     new ef4057a433 [Unity] Fix getting shapes for cutlass BYOC kernels (#14411)
     new 30db3de0e7 [Unity][Op] Expose scale in `R.nn.attention` and add its 
legalize op (#14412)
     new d93eb5c091 [Unity][Hexagon] Enable Relax VM for Hexagon (#14415)
     new c5335d96f9 [Unity][Fix] Copy over module attrs in FuseTIR (#14418)
     new ab3299c054 [Unity] Handle extern func calls in static memory planning 
(#14419)
     new ac90c7af01 [Unity] Include constant shapes in the profiler result 
(#14428)
     new 784733a425 [Unity][Fix] Annotate TIR op pattern could have no stores. 
(#14420)
     new 9efc5b83a7 [Unity] Minor updates to DataFlowBlockRewrite (#14431)
     new dc7ba6c46c [Unity] Remove non-deterministic behavior from graph 
pattern matching  (#14417)
     new 41230a981f [Unity][Graph matching] Automatically add `used-by` 
constraints for `is_op` pattern (#14439)
     new cecc5c3ade [Unity][Op][Docs] Update comment for `call_tir_dyn` (#14441)
     new 646d50dc27 [Unity][Graph matching] Clean up undo stack for parent and 
child nodes properly (#14440)
     new a425bc7a39 [Unity] Pattern-based rewriting for dataflow block (#14446)

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (18c19fb830)
            \
             N -- N -- N   refs/heads/unity-staging (a425bc7a39)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.
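
The diagram above can be modelled as a small reachability computation. The following is a minimal sketch, not how the notification script itself works: the commit names (`A`, `B`, `O1`..`O3`, `N1`..`N3`), the `parents` map, and the helper functions are all hypothetical and only mirror the shape of the picture. It treats the O revisions as those reachable from the old tip but not the new one, and the N revisions as the reverse.

```python
def ancestors(tip, parents):
    """Return every commit reachable from `tip`, including `tip` itself."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def classify_update(old_tip, new_tip, parents):
    """Split history into dropped (O), introduced (N), and shared commits."""
    old_reach = ancestors(old_tip, parents)
    new_reach = ancestors(new_tip, parents)
    return {
        "dropped": old_reach - new_reach,      # the O revisions
        "introduced": new_reach - old_reach,   # the N revisions
        "shared": old_reach & new_reach,       # common base B and earlier
    }


# Hypothetical toy history shaped like the diagram:
#   A -- B -- O1 -- O2 -- O3   (old tip)
#         \
#          N1 -- N2 -- N3      (new tip)
parents = {
    "A": [],
    "B": ["A"],
    "O1": ["B"], "O2": ["O1"], "O3": ["O2"],
    "N1": ["B"], "N2": ["N1"], "N3": ["N2"],
}

result = classify_update("O3", "N3", parents)
print("dropped:   ", sorted(result["dropped"]))
print("introduced:", sorted(result["introduced"]))
print("shared:    ", sorted(result["shared"]))
```

In a real clone the same two sets are what `git rev-list <new>..<old>` and `git rev-list <old>..<new>` enumerate. In this message the N side covers both the "add" and the "new" entries, since both become reachable from the branch only after the push.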

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 183 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
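
To check the markers against the counts stated above, the revision listing can be tallied mechanically. The sketch below is illustrative only: the regular expression, the `parse_revisions` helper, and the assumption that a wrapped subject continues on the next non-blank line are mine, inferred from how this message is laid out. The sample text is a short excerpt of the listing, not the full list.

```python
import re
from collections import Counter

# Marker, abbreviated hex hash, then the commit subject.
ENTRY = re.compile(r"^\s*(omit|discard|add|new)\s+([0-9a-f]{7,40})\s+(.*)$")


def parse_revisions(text):
    """Return (marker, sha, subject) tuples, joining wrapped subject lines."""
    entries = []
    can_continue = False
    for raw in text.splitlines():
        match = ENTRY.match(raw)
        if match:
            marker, sha, subject = match.groups()
            entries.append([marker, sha, subject.strip()])
            can_continue = True
        elif raw.strip() and can_continue:
            # A non-blank line right after an entry is a wrapped subject.
            entries[-1][2] += " " + raw.strip()
        else:
            # A blank line (or unrelated text) ends the current entry.
            can_continue = False
    return [tuple(entry) for entry in entries]


sample = """\
    omit 18c19fb830 [Unity][Frontend] FX exp and strided_slice fix (#14338)
    omit 6c6985940c [Unity][BYOC] Update testcases to follow recent changes
(#14339)
     add 36b30974a9 [MetaSchedule] Introducing MemHammer (#14164)
     new a27451755f [Unity] Relax VM (#13878)
     new a425bc7a39 [Unity] Pattern-based rewriting for dataflow block (#14446)

This update added new revisions after undoing existing revisions.
"""

entries = parse_revisions(sample)
counts = Counter(marker for marker, _, _ in entries)
for marker in ("omit", "discard", "add", "new"):
    print(f"{marker:8s}{counts[marker]}")
```

Run over the full listing rather than this excerpt, the `new` tally can be compared with the 183 revisions the message reports.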


Summary of changes:
 3rdparty/mlperftiny/api/submitter_implemented.h    |    2 +-
 CONTRIBUTORS.md                                    |    1 +
 apps/microtvm/cmsisnn/requirements.txt             |   10 +-
 apps/microtvm/ethosu/requirements.txt              |   10 +-
 apps/microtvm/poetry.lock                          | 3537 ++++++++++----------
 apps/microtvm/pyproject.toml                       |   56 +-
 ci/jenkins/docker-images.ini                       |    4 +-
 cmake/modules/CUDA.cmake                           |    4 +
 conda/recipe/bld.bat                               |    1 +
 conda/recipe/build.sh                              |    1 +
 docker/Dockerfile.ci_cpu                           |    3 +
 docker/Dockerfile.ci_lint                          |    2 +-
 .../ubuntu_download_arm_compute_lib_binaries.sh    |    2 +-
 docker/install/ubuntu_install_llvm_from_source.sh  |    1 +
 docker/install/ubuntu_install_vela.sh              |    2 +-
 gallery/how_to/work_with_microtvm/micro_ethosu.py  |    4 +-
 gallery/how_to/work_with_microtvm/micro_tvmc.sh    |   14 +-
 gallery/tutorial/tvmc_command_line_driver.py       |    5 +
 include/tvm/meta_schedule/schedule_rule.h          |    2 +
 include/tvm/relax/attrs/nn.h                       |   53 +
 include/tvm/relax/binding_rewrite.h                |    4 +-
 include/tvm/relax/dataflow_matcher.h               |   19 +-
 include/tvm/relax/dataflow_pattern.h               |   32 +-
 include/tvm/relax/tir_pattern.h                    |   75 +
 include/tvm/relax/transform.h                      |   38 +-
 include/tvm/relay/attrs/nn.h                       |   12 +-
 include/tvm/runtime/container/array.h              |   49 +
 include/tvm/runtime/module.h                       |   51 +-
 include/tvm/runtime/vm/executable.h                |    3 +
 include/tvm/runtime/vm/vm.h                        |    3 +
 include/tvm/tir/expr.h                             |   60 -
 include/tvm/tir/expr_functor.h                     |    4 -
 include/tvm/tir/schedule/schedule.h                |   17 +-
 include/tvm/tir/stmt.h                             |   97 +-
 include/tvm/tir/stmt_functor.h                     |  148 +-
 include/tvm/tir/transform.h                        |   16 +-
 include/tvm/topi/elemwise.h                        |    6 +-
 include/tvm/topi/nn/instance_norm.h                |   63 +
 include/tvm/topi/transform.h                       |   10 +-
 python/gen_requirements.py                         |    2 +-
 python/tvm/contrib/cutlass/attention_operation.py  |   14 +-
 python/tvm/contrib/cutlass/build.py                |   12 +-
 python/tvm/contrib/cutlass/gemm_operation.py       |    3 +-
 python/tvm/contrib/cutlass/gen_tensor_op.py        |   12 +-
 python/tvm/contrib/cutlass/library.py              |    4 +-
 python/tvm/contrib/hexagon/build.py                |   65 +-
 python/tvm/contrib/hexagon/session.py              |   38 +-
 python/tvm/contrib/hexagon/tools.py                |  198 ++
 python/tvm/driver/tvmc/autotuner.py                |  132 +-
 python/tvm/driver/tvmc/compiler.py                 |  173 +
 python/tvm/ir/json_compact.py                      |    2 -
 python/tvm/meta_schedule/schedule/cuda/__init__.py |    2 +
 .../schedule/cuda/layout_transform.py              |  583 ++++
 python/tvm/relax/__init__.py                       |    2 +
 python/tvm/relax/backend/contrib/cutlass.py        |   30 +-
 python/tvm/relax/{dpl => backend_tir}/__init__.py  |    7 +-
 .../relax/{dpl => backend_tir/contrib}/__init__.py |    5 +-
 python/tvm/relax/backend_tir/contrib/cutlass.py    |  720 ++++
 python/tvm/relax/backend_tir/pattern.py            |  576 ++++
 python/tvm/relax/dpl/__init__.py                   |    1 +
 python/tvm/relax/dpl/pattern.py                    |   58 +-
 python/tvm/relax/dpl/rewrite.py                    |  115 +
 python/tvm/relax/frontend/torch/dynamo.py          |    4 +-
 python/tvm/relax/frontend/torch/fx_translator.py   |  160 +-
 python/tvm/relax/op/base.py                        |   14 +
 python/tvm/relax/op/binary.py                      |   37 +
 python/tvm/relax/op/builtin/builtin.py             |   18 +
 python/tvm/relax/op/nn/nn.py                       |  112 +-
 python/tvm/relax/op/vm/vm.py                       |    5 +-
 python/tvm/relax/testing/relay_translator.py       |   14 +
 python/tvm/relax/transform/legalize_ops/binary.py  |    3 +
 .../tvm/relax/transform/legalize_ops/manipulate.py |    6 +-
 python/tvm/relax/transform/legalize_ops/nn.py      |   96 +
 python/tvm/relax/transform/transform.py            |   59 +-
 .../tvm/relay/analysis/operations_distribution.py  |  102 +
 .../tvm/relay/backend/contrib/ethosu/tir/passes.py |    2 +-
 .../backend/contrib/ethosu/tir_to_cs_translator.py |    1 -
 python/tvm/relay/frontend/keras.py                 |   11 +-
 python/tvm/relay/frontend/onnx.py                  |   21 +
 python/tvm/relay/frontend/pytorch.py               |    5 +-
 python/tvm/relay/frontend/tensorflow_ops.py        |    5 +-
 python/tvm/relay/frontend/tflite.py                |  147 +
 python/tvm/relay/op/_transform.py                  |    2 +-
 python/tvm/relay/op/contrib/ethosn.py              |    6 +-
 python/tvm/relay/op/nn/_nn.py                      |   45 +
 python/tvm/relay/op/nn/nn.py                       |    2 +-
 python/tvm/relay/op/strategy/cuda.py               |   11 +
 python/tvm/relay/op/strategy/generic.py            |   36 +-
 python/tvm/relay/transform/suffixes.py             |  105 +
 python/tvm/runtime/module.py                       |   50 +-
 python/tvm/script/ir_builder/relax/ir.py           |    8 +
 python/tvm/script/ir_builder/tir/ir.py             |    2 -
 python/tvm/script/parser/relax/entry.py            |   32 +-
 python/tvm/script/parser/tir/parser.py             |    6 +-
 python/tvm/tir/__init__.py                         |    3 +-
 python/tvm/tir/analysis/analysis.py                |    3 +-
 python/tvm/tir/expr.py                             |   32 +-
 python/tvm/tir/schedule/analysis.py                |   19 +
 python/tvm/tir/schedule/schedule.py                |  103 +-
 python/tvm/tir/stmt.py                             |   38 +-
 python/tvm/tir/tensor_intrin/arm_cpu.py            |   99 +-
 python/tvm/tir/transform/transform.py              |   56 +-
 python/tvm/topi/hexagon/qnn/nn.py                  |   45 +-
 python/tvm/topi/hexagon/tensor_intrin.py           |  309 +-
 python/tvm/topi/nn/__init__.py                     |    1 +
 python/tvm/topi/nn/conv3d_transpose.py             |   11 +-
 python/tvm/topi/nn/instance_norm.py                |   47 +
 python/tvm/topi/testing/__init__.py                |    1 +
 python/tvm/topi/testing/instance_norm_python.py    |   53 +
 python/tvm/topi/transform.py                       |   17 +-
 src/contrib/hybrid/codegen_hybrid.cc               |    6 -
 src/contrib/hybrid/codegen_hybrid.h                |    2 -
 src/driver/driver_api.cc                           |    4 +-
 .../feature_extractor/per_store_feature.cc         |    1 +
 .../postproc/disallow_async_strided_mem_copy.cc    |    2 +-
 .../postproc/rewrite_parallel_vectorize_unroll.cc  |   81 +-
 src/meta_schedule/postproc/verify_gpu_code.cc      |    3 +-
 src/meta_schedule/schedule_rule/schedule_rule.cc   |   90 +
 .../space_generator/space_generator.cc             |   19 +
 src/relax/analysis/tir_op_pattern_kind.cc          |    8 +
 src/relax/analysis/udchain.cc                      |   26 +-
 src/relax/analysis/well_formed.cc                  |   12 +
 src/relax/backend/vm/codegen_vm.cc                 |   11 +
 src/relax/backend/vm/vm_builtin_lower.cc           |   20 +-
 src/relax/ir/binding_rewrite.cc                    |  195 +-
 src/relax/ir/dataflow_matcher.cc                   |  271 +-
 src/relax/ir/dataflow_pattern.cc                   |    8 +-
 src/relax/ir/expr_functor.cc                       |    5 +-
 src/relax/{op/nn/attention.h => ir/tir_pattern.cc} |   26 +-
 src/relax/op/nn/attention.cc                       |   14 +-
 src/relax/op/nn/attention.h                        |    2 +-
 src/relax/op/nn/convolution.cc                     |  155 +
 src/relax/op/nn/convolution.h                      |    5 +
 src/relax/op/op.cc                                 |   43 +
 src/relax/op/op_common.h                           |   20 +
 src/relax/op/tensor/binary.cc                      |    5 +
 src/relax/op/tensor/binary.h                       |    8 +
 src/relax/op/tensor/manipulate.cc                  |    7 +-
 src/relax/op/tensor/statistical.cc                 |    2 +-
 ...orm_inference.cc => decompose_composite_ops.cc} |   51 +-
 src/relax/transform/eliminate_common_subexpr.cc    |  209 ++
 src/relax/transform/fold_constant.cc               |   70 +-
 src/relax/transform/fuse_ops.cc                    |  106 +-
 src/relax/transform/fuse_tir.cc                    |    6 +-
 src/relax/transform/infer_layout_utils.cc          |    4 +-
 src/relax/transform/lift_transform_params.cc       |   13 +
 src/relax/transform/split_call_tir_by_pattern.cc   |  782 +++++
 src/relax/transform/static_plan_block_memory.cc    |    8 +-
 src/relay/backend/aot_executor_codegen.cc          |    3 +
 src/relay/backend/build_module.cc                  |    3 +
 .../backend/contrib/cmsisnn/extract_constants.cc   |    1 +
 src/relay/backend/contrib/cmsisnn/fuse_pads.cc     |    3 +-
 .../backend/contrib/cmsisnn/generate_constants.cc  |   12 +-
 .../contrib/cmsisnn/scalar_to_tensor_constant.cc   |    5 +-
 src/relay/backend/contrib/ethosn/ethosn_api.cc     |   45 +-
 src/relay/backend/contrib/ethosu/source_module.cc  |    3 +-
 src/relay/backend/graph_executor_codegen.cc        |    3 +
 src/relay/backend/vm/compiler.h                    |    3 +
 src/relay/ir/dataflow_matcher.cc                   |   37 +-
 src/relay/op/nn/convolution.cc                     |   18 +-
 src/relay/printer/model_library_format_printer.cc  |    3 +
 src/relay/printer/text_printer.h                   |    2 -
 src/relay/printer/tir_text_printer.cc              |   19 -
 src/relay/printer/tvmscript_printer.cc             |   26 -
 src/relay/transforms/annotate_target.cc            |    1 +
 src/relay/transforms/fold_explicit_padding.cc      |    2 +-
 src/runtime/aot_executor/aot_executor.h            |    3 +
 src/runtime/aot_executor/aot_executor_factory.h    |    3 +
 src/runtime/const_loader_module.cc                 |    3 +
 src/runtime/contrib/coreml/coreml_runtime.h        |    5 +
 src/runtime/contrib/dnnl/dnnl_json_runtime.cc      |    7 +-
 src/runtime/contrib/ethosn/ethosn_runtime.h        |    5 +
 src/runtime/contrib/json/json_runtime.h            |    5 +
 src/runtime/contrib/libtorch/libtorch_runtime.cc   |    4 +
 src/runtime/contrib/onnx/onnx_module.cc            |    3 +
 src/runtime/contrib/tensorrt/tensorrt_runtime.cc   |    5 +
 src/runtime/contrib/tflite/tflite_runtime.h        |    3 +
 src/runtime/contrib/vitis_ai/vitis_ai_runtime.h    |    5 +
 src/runtime/cuda/cuda_module.cc                    |    5 +
 src/runtime/graph_executor/graph_executor.h        |    3 +
 .../graph_executor/graph_executor_factory.h        |    3 +
 src/runtime/hexagon/hexagon_device_api.cc          |    2 +-
 src/runtime/hexagon/hexagon_module.h               |    5 +
 src/runtime/library_module.cc                      |    5 +
 src/runtime/metadata.cc                            |    3 +
 src/runtime/module.cc                              |    6 +-
 src/runtime/opencl/opencl_common.h                 |    5 +
 src/runtime/relax_vm/builtin.cc                    |   34 +
 src/runtime/relax_vm/cuda/cuda_graph_builtin.cc    |  191 ++
 src/runtime/relax_vm/vm.cc                         |   51 +-
 src/runtime/rpc/rpc_module.cc                      |    2 +
 src/runtime/static_library.cc                      |    4 +-
 src/script/ir_builder/ir/ir.cc                     |   14 +-
 src/script/ir_builder/tir/utils.h                  |   26 +-
 src/script/printer/legacy_repr.cc                  |   27 -
 src/script/printer/tir/expr.cc                     |    6 -
 src/script/printer/tir/ir.cc                       |    3 +-
 src/script/printer/tir/stmt.cc                     |    7 -
 src/target/codegen.cc                              |    1 -
 src/target/llvm/codegen_hexagon.cc                 |    4 +-
 src/target/llvm/codegen_llvm.cc                    |   12 +-
 src/target/llvm/codegen_llvm.h                     |    2 -
 src/target/llvm/llvm_module.cc                     |    8 +-
 src/target/source/codegen_c.cc                     |    8 -
 src/target/source/codegen_c.h                      |    2 -
 src/target/source/codegen_opencl.cc                |   52 +-
 src/target/source/codegen_opencl.h                 |    6 -
 src/target/source/codegen_webgpu.cc                |    2 +
 src/target/source/source_module.cc                 |    6 +-
 src/target/stackvm/codegen_stackvm.cc              |    8 -
 src/target/stackvm/codegen_stackvm.h               |    2 -
 src/target/target_kind.cc                          |    8 +-
 src/te/autodiff/jacobian.cc                        |    1 -
 src/te/operation/create_primfunc.cc                |    2 +-
 src/te/operation/cross_thread_reduction.cc         |    1 +
 src/te/operation/hybrid_op.cc                      |    4 +-
 src/te/operation/op_utils.cc                       |   16 -
 src/te/operation/op_utils.h                        |   16 -
 src/tir/analysis/block_access_region_detector.cc   |   10 -
 src/tir/analysis/buffer_access_lca_detector.cc     |    9 -
 src/tir/analysis/device_constraint_utils.cc        |   18 -
 src/tir/analysis/estimate_flops.cc                 |   11 +-
 src/tir/analysis/side_effect.cc                    |    5 -
 src/tir/analysis/var_touch.cc                      |    8 -
 src/tir/analysis/var_use_def_analysis.cc           |    8 -
 src/tir/analysis/var_use_def_analysis.h            |    4 -
 src/tir/analysis/verify_gpu_code.cc                |    8 -
 src/tir/analysis/verify_memory.cc                  |    8 -
 src/tir/analysis/verify_well_formed.cc             |   35 +-
 src/tir/ir/expr.cc                                 |   70 +-
 src/tir/ir/expr_functor.cc                         |    8 -
 src/tir/ir/index_map.cc                            |    2 +-
 src/tir/ir/stmt.cc                                 |   53 -
 src/tir/ir/stmt_functor.cc                         |   27 -
 src/tir/op/op.cc                                   |    2 +
 src/tir/schedule/analysis/analysis.cc              |    5 +
 src/tir/schedule/analysis/reducer.cc               |   18 -
 src/tir/schedule/concrete_schedule.cc              |   32 +
 src/tir/schedule/concrete_schedule.h               |    6 +
 src/tir/schedule/primitive.h                       |   21 +
 src/tir/schedule/primitive/block_annotate.cc       |  117 +
 src/tir/schedule/primitive/blockize_tensorize.cc   |    2 +-
 src/tir/schedule/primitive/cache_index.cc          |    8 +-
 src/tir/schedule/primitive/cache_read_write.cc     |   16 +-
 src/tir/schedule/primitive/compute_inline.cc       |    8 -
 .../schedule/primitive/layout_transformation.cc    |   36 +-
 src/tir/schedule/primitive/read_write_at.cc        |  421 +++
 src/tir/schedule/primitive/reduction.cc            |    8 +-
 src/tir/schedule/schedule.cc                       |    6 +
 src/tir/schedule/traced_schedule.cc                |   39 +
 src/tir/schedule/traced_schedule.h                 |    6 +
 src/tir/schedule/transform.cc                      |   10 +
 src/tir/schedule/transform.h                       |   12 +-
 src/tir/transforms/arg_binder.cc                   |    2 +-
 src/tir/transforms/bf16_legalize.cc                |  696 ++--
 src/tir/transforms/bound_checker.cc                |    8 -
 src/tir/transforms/common_subexpr_elim.cc          |    5 +-
 src/tir/transforms/compact_buffer_region.cc        |    8 -
 src/tir/transforms/coproc_sync.cc                  |    6 -
 src/tir/transforms/inject_copy_intrin.cc           |    2 +-
 src/tir/transforms/inject_double_buffer.cc         |    8 -
 src/tir/transforms/inject_software_pipeline.cc     |   16 +-
 src/tir/transforms/inject_virtual_thread.cc        |   18 +-
 src/tir/transforms/install_debug_spans.h           |    1 -
 src/tir/transforms/ir_utils.cc                     |    8 -
 src/tir/transforms/lower_cross_thread_reduction.cc |    2 +-
 src/tir/transforms/lower_custom_datatypes.cc       |    8 -
 src/tir/transforms/lower_match_buffer.cc           |   14 -
 src/tir/transforms/lower_thread_allreduce.cc       |    8 -
 src/tir/transforms/lower_tvm_builtin.cc            |  113 +-
 src/tir/transforms/lower_warp_memory.cc            |   12 -
 src/tir/transforms/make_packed_api.cc              |   10 +-
 .../manifest_shared_memory_local_stage.cc          |    2 +-
 src/tir/transforms/memhammer_coalesce.cc           |  234 ++
 src/tir/transforms/memhammer_intermediate_stage.cc |  444 +++
 src/tir/transforms/memhammer_lower_auto_copy.cc    |  779 +++++
 src/tir/transforms/memhammer_rewrite_rule.h        |  242 ++
 src/tir/transforms/memhammer_tensorcore_rewrite.cc |  350 ++
 .../merge_dynamic_shared_memory_allocations.cc     |   16 -
 src/tir/transforms/narrow_datatype.cc              |   10 +-
 src/tir/transforms/renew_defs.cc                   |    8 -
 src/tir/transforms/rewrite_unsafe_select.cc        |    3 -
 src/tir/transforms/simplify.cc                     |    4 -
 src/tir/transforms/split_host_device.cc            |    2 +-
 src/tir/transforms/storage_access.cc               |    8 -
 src/tir/transforms/storage_access.h                |    3 -
 src/tir/transforms/storage_flatten.cc              |   16 -
 src/tir/transforms/storage_rewrite.cc              |   55 +-
 src/tir/transforms/thread_storage_sync.cc          |    7 -
 src/tir/transforms/unroll_loop.cc                  |    4 -
 src/tir/transforms/update_pointer_storage_scope.cc |    8 -
 src/tir/transforms/update_pointer_storage_scope.h  |    2 -
 src/tir/transforms/vectorize_loop.cc               |   19 +-
 src/tir/usmp/analysis/extract_buffer_info.cc       |    7 +-
 src/tir/usmp/transform/create_io_allocates.cc      |    6 -
 src/topi/nn.cc                                     |    6 +
 src/topi/transform.cc                              |    2 +-
 .../hexagon/hexagon_device_api_tests.cc            |    3 +
 tests/cpp/nested_msg_test.cc                       |    1 +
 tests/lint/check_file_type.py                      |    1 +
 tests/lint/rat-excludes                            |    1 +
 tests/micro/arduino/test_utils.py                  |    2 +-
 tests/micro/common/test_autotune.py                |    2 +-
 tests/micro/common/test_mlperftiny.py              |  130 +-
 tests/micro/common/test_tvmc.py                    |    2 +-
 tests/python/contrib/test_dnnl.py                  |    2 +-
 tests/python/contrib/test_ethosn/test_resize.py    |   24 +-
 tests/python/contrib/test_ethosu/infra.py          |   11 +-
 .../test_pass_operations_distribution.py           |  173 +
 .../test_hexagon/test_fixed_point_multiply.py      |  138 +-
 .../contrib/test_hexagon/test_relax_integration.py |  236 ++
 .../test_hexagon/test_wo_qnn_canonicalization.py   |   70 +
 tests/python/driver/tvmc/test_autotuner.py         |  102 +-
 tests/python/driver/tvmc/test_compiler.py          |  351 ++
 tests/python/frontend/onnx/test_forward.py         |   33 +
 tests/python/frontend/tensorflow/test_forward.py   |   12 +-
 tests/python/frontend/tflite/test_forward.py       |   74 +-
 tests/python/integration/test_reduce.py            |    4 +-
 tests/python/relax/test_analysis_well_formed.py    |   13 +
 tests/python/relax/test_codegen_cutlass.py         |  180 +-
 tests/python/relax/test_codegen_tir_cutlass.py     |  709 ++++
 tests/python/relax/test_dataflow_pattern.py        |  231 +-
 tests/python/relax/test_frontend_dynamo.py         |  173 +-
 tests/python/relax/test_frontend_from_fx.py        |  251 +-
 tests/python/relax/test_op_binary.py               |    2 +
 tests/python/relax/test_op_manipulate.py           |   25 +-
 tests/python/relax/test_op_misc.py                 |    8 +
 tests/python/relax/test_op_nn_convolution.py       |  378 ++-
 tests/python/relax/test_relay_translator.py        |   12 +
 .../test_transform_annotate_tir_op_pattern.py      |   19 +
 .../python/relax/test_transform_convert_layout.py  |   54 +
 tests/python/relax/test_transform_cse.py           |  186 +
 ...y => test_transform_decompose_composite_ops.py} |   25 +-
 tests/python/relax/test_transform_fold_constant.py |   37 +
 tests/python/relax/test_transform_fuse_ops.py      |  169 +
 tests/python/relax/test_transform_fuse_tir.py      |    4 +-
 .../relax/test_transform_legalize_ops_binary.py    |  280 ++
 .../test_transform_legalize_ops_manipulate.py      |  195 ++
 .../python/relax/test_transform_legalize_ops_nn.py |  340 ++
 .../relax/test_transform_lift_transform_params.py  |   48 +
 .../test_transform_static_plan_block_memory.py     |   92 +
 .../relax/test_transform_to_mixed_precision.py     |  325 +-
 tests/python/relax/test_tvmscript_parser.py        |   18 +-
 .../relax/test_tvmscript_parser_op_arith_cmp.py    |    2 +
 tests/python/relax/test_tvmscript_parser_op_nn.py  |   30 +-
 tests/python/relax/test_vm_cuda_graph.py           |  108 +
 .../relay/opencl_texture/test_injection_texture.py |   85 +
 tests/python/relay/test_dataflow_pattern.py        |   79 +-
 .../python/topi/python/test_topi_instance_norm.py  |   64 +
 .../test_meta_schedule_post_order_apply.py         |   73 +
 ...e_postproc_rewrite_parallel_vectorize_unroll.py |   44 +
 .../test_meta_schedule_relay_integration.py        |    3 +
 ...meta_schedule_schedule_cuda_layout_transform.py |  466 +++
 .../unittest/test_micro_model_library_format.py    |    2 +-
 .../unittest/test_runtime_module_property.py       |   62 +
 tests/python/unittest/test_target_codegen_llvm.py  |    6 +-
 .../python/unittest/test_target_codegen_opencl.py  |    6 +-
 .../unittest/test_target_texture_codegen_opencl.py |  375 +++
 .../test_tir_analysis_estimate_tir_flops.py        |   30 +
 tests/python/unittest/test_tir_nodes.py            |    2 +-
 .../python/unittest/test_tir_schedule_analysis.py  |   21 +
 .../unittest/test_tir_schedule_compute_inline.py   |    2 +-
 .../unittest/test_tir_schedule_read_write_at.py    |  221 ++
 .../python/unittest/test_tir_schedule_set_dtype.py |  125 +
 .../unittest/test_tir_schedule_transform_layout.py |   36 +
 .../unittest/test_tir_transform_bf16_legalize.py   |  257 +-
 .../test_tir_transform_lower_tvm_builtin.py        |   19 +-
 ...test_tir_transform_memhammer_lower_auto_copy.py | 1062 ++++++
 .../unittest/test_tir_transform_storage_rewrite.py |   61 +-
 .../unittest/test_tvmscript_ir_builder_tir.py      |    2 +-
 .../python/unittest/test_tvmscript_printer_tir.py  |    2 +-
 tests/python/unittest/test_tvmscript_roundtrip.py  |   16 +
 tests/scripts/request_hook/request_hook.py         |    3 +-
 web/apps/browser/rpc_server.html                   |    6 +-
 web/emcc/webgpu_runtime.cc                         |   32 +-
 web/src/rpc_server.ts                              |   12 +-
 web/src/runtime.ts                                 |  109 +-
 web/src/webgpu.ts                                  |  210 +-
 378 files changed, 21006 insertions(+), 4254 deletions(-)
 create mode 100644 include/tvm/relax/tir_pattern.h
 create mode 100644 include/tvm/topi/nn/instance_norm.h
 create mode 100644 python/tvm/meta_schedule/schedule/cuda/layout_transform.py
 copy python/tvm/relax/{dpl => backend_tir}/__init__.py (89%)
 copy python/tvm/relax/{dpl => backend_tir/contrib}/__init__.py (88%)
 create mode 100644 python/tvm/relax/backend_tir/contrib/cutlass.py
 create mode 100644 python/tvm/relax/backend_tir/pattern.py
 create mode 100644 python/tvm/relax/dpl/rewrite.py
 create mode 100644 python/tvm/relay/analysis/operations_distribution.py
 create mode 100644 python/tvm/relay/transform/suffixes.py
 create mode 100644 python/tvm/topi/nn/instance_norm.py
 create mode 100644 python/tvm/topi/testing/instance_norm_python.py
 copy src/relax/{op/nn/attention.h => ir/tir_pattern.cc} (68%)
 rename src/relax/transform/{simplify_norm_inference.cc => decompose_composite_ops.cc} (69%)
 create mode 100644 src/relax/transform/eliminate_common_subexpr.cc
 create mode 100644 src/relax/transform/split_call_tir_by_pattern.cc
 create mode 100644 src/runtime/relax_vm/cuda/cuda_graph_builtin.cc
 create mode 100644 src/tir/schedule/primitive/read_write_at.cc
 create mode 100644 src/tir/transforms/memhammer_coalesce.cc
 create mode 100644 src/tir/transforms/memhammer_intermediate_stage.cc
 create mode 100644 src/tir/transforms/memhammer_lower_auto_copy.cc
 create mode 100644 src/tir/transforms/memhammer_rewrite_rule.h
 create mode 100644 src/tir/transforms/memhammer_tensorcore_rewrite.cc
 create mode 100644 tests/python/contrib/test_ethosu/test_pass_operations_distribution.py
 create mode 100644 tests/python/contrib/test_hexagon/test_relax_integration.py
 create mode 100644 tests/python/relax/test_codegen_tir_cutlass.py
 create mode 100644 tests/python/relax/test_transform_cse.py
 rename tests/python/relax/{test_transform_simpilify_norm_inference.py => test_transform_decompose_composite_ops.py} (86%)
 create mode 100644 tests/python/relax/test_vm_cuda_graph.py
 create mode 100644 tests/python/relay/opencl_texture/test_injection_texture.py
 create mode 100644 tests/python/topi/python/test_topi_instance_norm.py
 create mode 100644 tests/python/unittest/test_meta_schedule_schedule_cuda_layout_transform.py
 create mode 100644 tests/python/unittest/test_runtime_module_property.py
 create mode 100644 tests/python/unittest/test_tir_schedule_read_write_at.py
 create mode 100644 tests/python/unittest/test_tir_schedule_set_dtype.py
 create mode 100644 tests/python/unittest/test_tir_transform_memhammer_lower_auto_copy.py
